replace _CCCL_ALWAYS_INLINE with _CCCL_FORCEINLINE #2439

Merged 5 commits on Oct 2, 2024
Changes from 2 commits
1 change: 0 additions & 1 deletion .clang-format
@@ -25,7 +25,6 @@ AlwaysBreakTemplateDeclarations: Yes
AttributeMacros: [
'_CCCL_ALIGNAS_TYPE',
'_CCCL_ALIGNAS',
-'_CCCL_ALWAYS_INLINE',
'_CCCL_CONSTEXPR_CXX14',
'_CCCL_CONSTEXPR_CXX17',
'_CCCL_CONSTEXPR_CXX20',
2 changes: 1 addition & 1 deletion cudax/include/cuda/experimental/__async/config.cuh
@@ -34,7 +34,7 @@ namespace cuda::experimental::__async
# define _CUDAX_ARTIFICIAL
#endif

-#define _CUDAX_ALWAYS_INLINE _CCCL_ALWAYS_INLINE _CUDAX_ARTIFICIAL _LIBCUDACXX_NODEBUG inline
+#define _CUDAX_ALWAYS_INLINE _CCCL_FORCEINLINE _CUDAX_ARTIFICIAL _LIBCUDACXX_NODEBUG

// GCC struggles with guaranteed copy elision of immovable types.
#if defined(_CCCL_COMPILER_GCC)
2 changes: 0 additions & 2 deletions libcudacxx/include/cuda/std/__cccl/execution_space.h
@@ -27,12 +27,10 @@
# define _CCCL_HOST __host__
# define _CCCL_DEVICE __device__
# define _CCCL_HOST_DEVICE __host__ __device__
-# define _CCCL_FORCEINLINE __forceinline__
#else // ^^^ _CCCL_CUDA_COMPILATION ^^^ / vvv !_CCCL_CUDA_COMPILATION vvv
# define _CCCL_HOST
# define _CCCL_DEVICE
# define _CCCL_HOST_DEVICE
-# define _CCCL_FORCEINLINE inline
#endif // !_CCCL_CUDA_COMPILATION

#if !defined(_CCCL_EXEC_CHECK_DISABLE)
6 changes: 3 additions & 3 deletions libcudacxx/include/cuda/std/__cccl/visibility.h
@@ -59,15 +59,15 @@
#endif // !_CCCL_COMPILER_NVRTC

#if defined(_CCCL_COMPILER_MSVC)
-# define _CCCL_ALWAYS_INLINE __forceinline
+# define _CCCL_FORCEINLINE __forceinline
#else // ^^^ _CCCL_COMPILER_MSVC ^^^ / vvv _CCCL_COMPILER_MSVC vvv
-# define _CCCL_ALWAYS_INLINE __attribute__((__always_inline__))
+# define _CCCL_FORCEINLINE __inline__ __attribute__((__always_inline__))
#endif // !_CCCL_COMPILER_MSVC

#if __has_attribute(exclude_from_explicit_instantiation)
# define _CCCL_EXCLUDE_FROM_EXPLICIT_INSTANTIATION __attribute__((exclude_from_explicit_instantiation))
#else // ^^^ exclude_from_explicit_instantiation ^^^ / vvv !exclude_from_explicit_instantiation vvv
-// NVCC complains mightily about being unable to inline functions if we use _CCCL_ALWAYS_INLINE here
+// NVCC complains mightily about being unable to inline functions if we use _CCCL_FORCEINLINE here
# define _CCCL_EXCLUDE_FROM_EXPLICIT_INSTANTIATION
#endif // !exclude_from_explicit_instantiation
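
The hunk above folds the `inline` keyword into the force-inline macro itself, which is why the trailing `inline` could be dropped from `_CUDAX_ALWAYS_INLINE`. A minimal standalone sketch of that pattern follows. The macro name `EXAMPLE_FORCEINLINE` and the function `add_one` are hypothetical and not part of CCCL; the sketch assumes a compiler that is either MSVC or accepts GNU-style attributes (GCC, Clang).

```cpp
#include <cassert>

// Hypothetical macro for illustration; it mirrors the shape of the
// definition in the hunk above but is not a CCCL macro.
#if defined(_MSC_VER)
#  define EXAMPLE_FORCEINLINE __forceinline
#else
// The inline keyword is part of the macro, so call sites do not repeat it.
// GCC can warn when always_inline is applied to a function that is not
// also declared inline.
#  define EXAMPLE_FORCEINLINE __inline__ __attribute__((__always_inline__))
#endif

// Defined at namespace scope; the inline keyword inside the macro keeps
// this safe to place in a header included from several translation units.
EXAMPLE_FORCEINLINE int add_one(int x)
{
  return x + 1;
}

// Small usage helper so the sketch exercises the macro.
inline int example_usage()
{
  const int result = add_one(41);
  assert(result == 42);
  return result;
}
```

With a single macro that always carries `inline`, a composite macro only has to append its own extra attributes, as the new `_CUDAX_ALWAYS_INLINE` definition does.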

12 changes: 10 additions & 2 deletions thrust/testing/transform_input_output_iterator.cu
@@ -7,8 +7,16 @@

#include <unittest/unittest.h>

+// There is a unfortunate miscompilation of the gcc-11 vectorizer leading to OOB writes
+// Adding this attribute suffices that this miscompilation does not appear anymore
+#if defined(_CCCL_COMPILER_GCC) && __GNUC__ >= 11
+# define THRUST_DISABLE_BROKEN_GCC_VECTORIZER __attribute__((optimize("no-tree-vectorize")))
+#else
+# define THRUST_DISABLE_BROKEN_GCC_VECTORIZER
+#endif
+
template <class Vector>
-void TestTransformInputOutputIterator()
+THRUST_DISABLE_BROKEN_GCC_VECTORIZER void TestTransformInputOutputIterator()
Collaborator Author

This fixes our tests, but won't gcc still be miscompiling Thrust for users?

miscco (Collaborator), Sep 20, 2024

That is nothing we can change. I want to note that this is exceptionally fickle and depends on exact sizes and optimization settings, so I don't see anything we can do there.

{
using T = typename Vector::value_type;

@@ -52,7 +60,7 @@ void TestTransformInputOutputIterator()
DECLARE_VECTOR_UNITTEST(TestTransformInputOutputIterator);

template <class Vector>
-void TestMakeTransformInputOutputIterator()
+THRUST_DISABLE_BROKEN_GCC_VECTORIZER void TestMakeTransformInputOutputIterator()
{
using T = typename Vector::value_type;

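
The test change above turns off GCC's tree vectorizer for individual function templates instead of for the whole translation unit. A minimal sketch of the same per-function technique outside Thrust follows. The macro `EXAMPLE_NO_VECTORIZE` and the function `negate_all` are hypothetical; the compiler check on `__GNUC__` and `__clang__` is an assumption that stands in for `_CCCL_COMPILER_GCC`, and on any other compiler the macro expands to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro for illustration; it follows the shape of
// THRUST_DISABLE_BROKEN_GCC_VECTORIZER from the diff above.
// Clang also defines __GNUC__, so it is excluded explicitly.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#  define EXAMPLE_NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define EXAMPLE_NO_VECTORIZE
#endif

// The attribute sits between the template header and the return type,
// the same position used in the test file.
template <class T>
EXAMPLE_NO_VECTORIZE void negate_all(std::vector<T>& values)
{
  // A simple element-wise loop of the kind the vectorizer would
  // otherwise be allowed to transform.
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    values[i] = -values[i];
  }
}
```

As the review thread notes, this only shields the annotated functions; user code that triggers the same compiler behavior is unaffected. The GCC documentation also describes the `optimize` attribute as intended for debugging rather than production use, which fits its placement in a test file here.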