replace _CCCL_ALWAYS_INLINE with _CCCL_FORCEINLINE #2439

Merged on Oct 2, 2024 (5 commits)
1 change: 0 additions & 1 deletion .clang-format
@@ -25,7 +25,6 @@ AlwaysBreakTemplateDeclarations: Yes
 AttributeMacros: [
   '_CCCL_ALIGNAS_TYPE',
   '_CCCL_ALIGNAS',
-  '_CCCL_ALWAYS_INLINE',
   '_CCCL_CONSTEXPR_CXX14',
   '_CCCL_CONSTEXPR_CXX17',
   '_CCCL_CONSTEXPR_CXX20',
2 changes: 1 addition & 1 deletion cudax/include/cuda/experimental/__async/config.cuh
@@ -34,7 +34,7 @@ namespace cuda::experimental::__async
 # define _CUDAX_ARTIFICIAL
 #endif

-#define _CUDAX_ALWAYS_INLINE _CCCL_ALWAYS_INLINE _CUDAX_ARTIFICIAL _LIBCUDACXX_NODEBUG inline
+#define _CUDAX_ALWAYS_INLINE _CCCL_FORCEINLINE _CUDAX_ARTIFICIAL _LIBCUDACXX_NODEBUG

 // GCC struggles with guaranteed copy elision of immovable types.
 #if defined(_CCCL_COMPILER_GCC)
2 changes: 0 additions & 2 deletions libcudacxx/include/cuda/std/__cccl/execution_space.h
@@ -27,12 +27,10 @@
 # define _CCCL_HOST __host__
 # define _CCCL_DEVICE __device__
 # define _CCCL_HOST_DEVICE __host__ __device__
-# define _CCCL_FORCEINLINE __forceinline__
 #else // ^^^ _CCCL_CUDA_COMPILATION ^^^ / vvv !_CCCL_CUDA_COMPILATION vvv
 # define _CCCL_HOST
 # define _CCCL_DEVICE
 # define _CCCL_HOST_DEVICE
-# define _CCCL_FORCEINLINE inline
 #endif // !_CCCL_CUDA_COMPILATION

 #if !defined(_CCCL_EXEC_CHECK_DISABLE)
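
After this hunk, the execution-space header only deals with `__host__`/`__device__` annotations and no longer defines an inlining macro. A minimal sketch of that separation, assuming a hypothetical `DEMO_HOST_DEVICE` macro and using `__CUDACC__` (defined by NVCC) in place of `_CCCL_CUDA_COMPILATION`:

```cpp
// Hypothetical stand-in for _CCCL_HOST_DEVICE; not the CCCL definition.
#if defined(__CUDACC__) // set by NVCC when compiling CUDA sources
#  define DEMO_HOST_DEVICE __host__ __device__
#else
#  define DEMO_HOST_DEVICE // expands to nothing for a plain host compiler
#endif

// The annotation says where the function may run; it makes no
// statement about inlining, which is handled by a separate macro.
DEMO_HOST_DEVICE inline int square(int x)
{
  return x * x;
}
```

With a host-only compiler the macro vanishes and `square(7)` is an ordinary inline function call.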
6 changes: 3 additions & 3 deletions libcudacxx/include/cuda/std/__cccl/visibility.h
@@ -59,15 +59,15 @@
 #endif // !_CCCL_COMPILER_NVRTC

 #if defined(_CCCL_COMPILER_MSVC)
-# define _CCCL_ALWAYS_INLINE __forceinline
+# define _CCCL_FORCEINLINE __forceinline
 #else // ^^^ _CCCL_COMPILER_MSVC ^^^ / vvv _CCCL_COMPILER_MSVC vvv
-# define _CCCL_ALWAYS_INLINE __attribute__((__always_inline__))
+# define _CCCL_FORCEINLINE __inline__ __attribute__((__always_inline__))
 #endif // !_CCCL_COMPILER_MSVC

 #if __has_attribute(exclude_from_explicit_instantiation)
 # define _CCCL_EXCLUDE_FROM_EXPLICIT_INSTANTIATION __attribute__((exclude_from_explicit_instantiation))
 #else // ^^^ exclude_from_explicit_instantiation ^^^ / vvv !exclude_from_explicit_instantiation vvv
-// NVCC complains mightily about being unable to inline functions if we use _CCCL_ALWAYS_INLINE here
+// NVCC complains mightily about being unable to inline functions if we use _CCCL_FORCEINLINE here
 # define _CCCL_EXCLUDE_FROM_EXPLICIT_INSTANTIATION
 #endif // !exclude_from_explicit_instantiation
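
The pattern in this hunk can be sketched outside of CCCL as follows. This is an illustration under stated assumptions, not the library's code: `DEMO_FORCEINLINE`, `add_one`, and `call_add_one` are hypothetical names, and the compiler check uses `_MSC_VER` directly rather than `_CCCL_COMPILER_MSVC`. What it shows is that the non-MSVC branch now carries `__inline__` itself, so a declaration using the macro does not need a separate `inline` keyword.

```cpp
// Hypothetical stand-in for _CCCL_FORCEINLINE; not the CCCL definition.
#if defined(_MSC_VER)
#  define DEMO_FORCEINLINE __forceinline
#else
// GCC can warn that an always_inline function "might not be inlinable"
// when it is not also declared inline, hence the bundled __inline__.
#  define DEMO_FORCEINLINE __inline__ __attribute__((__always_inline__))
#endif

// No extra `inline` here: the macro already supplies it.
DEMO_FORCEINLINE int add_one(int x)
{
  return x + 1;
}

// An ordinary caller; the compiler is asked to inline add_one into it.
int call_add_one(int x)
{
  return add_one(x);
}
```

Bundling the keyword into the macro is presumably also why the trailing `inline` could be dropped from `_CUDAX_ALWAYS_INLINE` in this PR.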

2 changes: 1 addition & 1 deletion thrust/thrust/detail/function.h
@@ -39,7 +39,7 @@ struct wrapped_function

   _CCCL_EXEC_CHECK_DISABLE
   template <typename... Ts>
-  _CCCL_FORCEINLINE _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
+  inline _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
Collaborator Author

@miscco at least locally, this change avoids the gcc optimizer issue.

Collaborator

That is awesome. I will test whether we can avoid that hack in the other tests too. I will file a separate PR for that, though.

Collaborator Author

It fixed almost all of the failures, though; just two remain. I can track those down when I get back from vacation.

   {
     return static_cast<Result>(m_f(thrust::raw_reference_cast(::cuda::std::forward<Ts>(args))...));
   }
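
For context, the call operator touched by this hunk forwards its arguments to the stored functor and casts the result. A simplified, host-only sketch of that shape is below. It is not Thrust's implementation: `demo_wrapped_function` and `demo_halve` are hypothetical names, `std::forward` replaces `::cuda::std::forward`, and the `raw_reference_cast` step and the `_CCCL_HOST_DEVICE` annotation are left out.

```cpp
#include <utility>

// Simplified stand-in for the wrapper in thrust/detail/function.h.
template <typename Function, typename Result>
struct demo_wrapped_function
{
  Function m_f;

  // Plain `inline` leaves the inlining decision to the optimizer
  // instead of forcing it, which is the substance of the change above.
  template <typename... Ts>
  inline Result operator()(Ts&&... args) const
  {
    return static_cast<Result>(m_f(std::forward<Ts>(args)...));
  }
};

// Hypothetical functor whose return type differs from Result.
struct demo_halve
{
  double operator()(int x) const
  {
    return x / 2.0;
  }
};
```

Calling `demo_wrapped_function<demo_halve, int>{demo_halve{}}(9)` evaluates `9 / 2.0` and truncates the result to `int` through the `static_cast`.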