Main #25

miscco · 2024-10-22T10:10:26Z

It's a pain to retrieve the address of a device symbol.

This adds a little helper that makes it less painful.

)

* Improve binary function objects and replace thrust implementation
* simplify use of ::cuda::std binary_function_objects
* Replace _CCCL_CONSTEXPR_CXX14 with constexpr in all libcudacxx binary function objects that are imported in thrust.
* Determine partial sum type without ::result_type
* Ignore _LIBCUDACXX_DEPRECATED_IN_CXX11 for doxygen

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

* Use `common_type` for complex `pow`

Previously we would rely on our internal `__promote` function. However, that could have surprising results, e.g. `pow(complex<float>, int)` would return `complex<double>`.

With C++23, this situation got clarified and we should use `common_type` to determine the return type.
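To make the difference concrete, here is a minimal host-side sketch of what a `common_type`-based return type looks like. The alias `pow_result_t` is a hypothetical name introduced only for illustration; it is not the libcu++ implementation, and it uses `std::complex` rather than `cuda::std::complex` so that it compiles without CUDA.

```cpp
#include <complex>
#include <type_traits>

namespace sketch {

// Hypothetical alias (not part of libcu++): the result type of
// pow(complex<T>, U) when it is derived from std::common_type.
template <class T, class U>
using pow_result_t = std::complex<typename std::common_type<T, U>::type>;

// common_type<float, int> is float, so the result stays complex<float>
// instead of being promoted to complex<double>.
static_assert(std::is_same<pow_result_t<float, int>, std::complex<float>>::value,
              "float base with int exponent stays complex<float>");

// Mixing in a double still widens the result, as expected.
static_assert(std::is_same<pow_result_t<float, double>, std::complex<double>>::value,
              "float base with double exponent widens to complex<double>");

} // namespace sketch
```

The static assertions only check the type computation; they say nothing about how the value itself is calculated.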

* Drop `cuda::get_property` CPO

  It serves no purpose as it only ever forwards via ADL and also breaks older nvcc.
* Ensure that we test memory resources
* Implement `cuda::uninitialized_buffer`

  `cuda::uninitialized_buffer` provides an allocation of `N` elements of type `T`, utilizing a `cuda::mr::resource` to allocate the storage. `cuda::uninitialized_buffer` takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
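The ownership model described above can be sketched on the host as follows. Everything in this block is an assumption made for illustration: `counting_resource` is a made-up stand-in for a memory resource, and `sketch::uninitialized_buffer` is a simplified model, not the `cuda::uninitialized_buffer` interface. It requires C++17 for aligned `operator new`.

```cpp
#include <cstddef>
#include <new>

namespace sketch {

// Hypothetical host-only resource that counts live allocations.
struct counting_resource {
  std::size_t live = 0;

  void* allocate(std::size_t bytes, std::size_t alignment) {
    ++live;
    return ::operator new(bytes, std::align_val_t(alignment));
  }

  void deallocate(void* ptr, std::size_t /*bytes*/, std::size_t alignment) noexcept {
    --live;
    ::operator delete(ptr, std::align_val_t(alignment));
  }
};

// Simplified model: owns raw, uninitialized storage for `count` objects of
// type T and hands it back to the resource on destruction.
template <class T, class Resource>
class uninitialized_buffer {
  Resource* resource_; // non-owning: the resource must outlive the buffer
  std::size_t count_;
  T* data_;

public:
  uninitialized_buffer(Resource& resource, std::size_t count)
      : resource_(&resource),
        count_(count),
        data_(static_cast<T*>(resource.allocate(count * sizeof(T), alignof(T)))) {}

  uninitialized_buffer(const uninitialized_buffer&) = delete;
  uninitialized_buffer& operator=(const uninitialized_buffer&) = delete;

  ~uninitialized_buffer() {
    resource_->deallocate(data_, count_ * sizeof(T), alignof(T));
  }

  T* data() noexcept { return data_; }
  std::size_t size() const noexcept { return count_; }
};

} // namespace sketch
```

Because the buffer only stores a pointer to the resource, destroying the resource first would leave the destructor calling into a dead object, which is why the lifetime requirement is placed on the user.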

* Clarify compatibility wrt. template specializations

We do not want users to specialize arbitrary templates in CCCL unless otherwise stated. This PR makes this clear in the README.md.

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* Make `cuda::std::tuple` trivially copyable

This is similar to the situation with `cuda::std::pair`.

We have a lot of users that rely on types being trivially copyable, so that they can utilize memcpy and friends.

Previously, `cuda::std::tuple` did not satisfy this because it needs to handle reference types.

Given that we already specialize `__tuple_leaf` depending on whether the class is empty or not, we can simply add a third specialization that handles the trivially copyable types and one that synthesizes assignment.

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
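The idea of splitting the leaf by kind of element can be illustrated with a much smaller model. The `leaf` template below is hypothetical and only distinguishes reference from non-reference elements; the real `__tuple_leaf` has more cases (such as the empty-class one mentioned above) and a different interface.

```cpp
#include <type_traits>

namespace sketch {

// Value leaf: all special members are implicitly defaulted, so the leaf is
// trivially copyable whenever T is.
template <class T, bool IsReference = std::is_reference<T>::value>
struct leaf {
  T value;
};

// Reference leaf: a reference member cannot be rebound, so assignment is
// synthesized by hand and assigns through the reference. The user-provided
// assignment operator makes this specialization non-trivially copyable.
template <class T>
struct leaf<T, true> {
  T value;

  explicit leaf(T ref) : value(ref) {}
  leaf(const leaf&) = default;

  leaf& operator=(const leaf& other) {
    value = other.value;
    return *this;
  }
};

static_assert(std::is_trivially_copyable<leaf<int>>::value,
              "value leaves keep trivial copyability");
static_assert(!std::is_trivially_copyable<leaf<int&>>::value,
              "reference leaves need a hand-written assignment");

} // namespace sketch
```

Keeping the hand-written assignment confined to the reference case is what allows a tuple made only of trivially copyable values to remain trivially copyable.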

```
In function bool cuda::std::__4::__dispatch_memmove(_Up*, _Tp*, size_t)
...
error: *(unsigned char*)(&privatized_decode_op[0]) may be used uninitialized [-Werror=maybe-uninitialized]
...
*(unsigned char*)(&privatized_decode_op[0]) was declared here
 1528 | PrivatizedDecodeOpT privatized_decode_op[NUM_ACTIVE_CHANNELS]{};
```

* Fix flakey heterogeneous tests by ensuring only *one* writer exists in parallel between H/D
* Fixup copy paste mistake
* Make host atomics simpler by removing the ugly alignment type
* Fix deadlocks introduced into barrier/semaphore tests
* Revert removing hacky atomic wrapping stuff
* Fix unused warning bug in GCC-6

```
Linking CXX executable bin/cub.cpp14.catch2_test.lid_0
FAILED: bin/cub.cpp14.catch2_test.lid_0
...
/usr/bin/ld: cub/test/CMakeFiles/cub.cpp14.test.warp_scan_api.dir/catch2_test_warp_scan_api.cu.o: in function `InclusiveScanKernel(int*)':
/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:473: multiple definition of `InclusiveScanKernel(int*)'; cub/test/CMakeFiles/cub.cpp14.test.block_scan_api.dir/catch2_test_block_scan_api.cu.o:/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:468: first defined here
collect2: error: ld returned 1 exit status
```

… it in places where it was missing (#2192)

* Change __scoped_device to use driver API
* Switch to use driver API based dev setter
* Remove constexpr from operator device()
* Fix comments and includes
* Fallback to non-versioned get entry point pre 12.5

  We need to use the versioned version to get the correct cuStreamGetCtx. There is a v2 version of it in 12.5; fortunately, the versioned get entry point is available there too.
* Fix unused local variable
* Fix warnings in ensure_current_device test
* Move ensure current device out of detail
* Add LIBCUDACXX_ENABLE_EXCEPTIONS to tests cmake
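The scoped device setter mentioned in these commits follows the usual RAII pattern: switch the current device on construction and restore the previous one on destruction. The sketch below models only that pattern. `current_device()` is a made-up stand-in for driver state, and `sketch::ensure_current_device` is an illustrative class, not the cudax type or its driver API calls.

```cpp
namespace sketch {

// Stand-in for the driver's notion of the current device (assumption: real
// code would query and set this through the CUDA driver API).
inline int& current_device() {
  static int device = 0;
  return device;
}

// RAII guard: makes `target` current for the guard's lifetime and restores
// the previously current device afterwards.
class ensure_current_device {
  int previous_;

public:
  explicit ensure_current_device(int target) : previous_(current_device()) {
    current_device() = target;
  }

  ensure_current_device(const ensure_current_device&) = delete;
  ensure_current_device& operator=(const ensure_current_device&) = delete;

  ~ensure_current_device() { current_device() = previous_; }
};

} // namespace sketch
```

Wrapping an allocation call in such a guard is one way to ensure that it happens on the intended device without permanently changing the caller's device.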

* add `_LIBCUDACXX_REQUIRES_EXPR` to the concepts emulation macros
* work around nvcc pre-12.2 bug and mollify nvrtc
* silence warning about an always true condition
* simplify macro substitution with the help of an alias template
* fix the short-circuiting behavior of the `async_resource` concept pre-c++20
* replace C-style casts with C++-style `static_cast`s
* add missing `_LIBCUDACXX_HIDE_FROM_ABI` function annotations
* restore short-circuiting in the `resource` concept

Fixes #653.

Adds three new workflows that can be manually triggered at various points in the release process:

- `release-create-new`: Begin the release process for a new version.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
    - `branch_point`: Optional; if the `branch/{major}.{minor}.x` branch does not exist, create it from this SHA. If the release branch already exists, this is ignored. If not provided, the release branch is created from the current `main` branch.
  - Actions:
    - Creates release branch if needed.
    - Bumps version numbers with `update-version.sh` in topic branch.
    - Creates pull request to merge the topic branch into `main`.
    - Marks the pull request for backporting to the release branch.
- `release-update-rc`: Validate and tag a new release candidate.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
  - Actions:
    - Uses the HEAD SHA of the release branch for testing/tagging.
    - Determines the next rc for this version by inspecting existing tags.
    - Runs the `pull_request` workflow to validate the release candidate. This can be modified in the future to run a special rc acceptance workflow.
    - Tags the release candidate if the workflow passes.
- `release-finalize`: Tag a final release.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
  - Actions:
    - Determines the most recent release candidate.
    - Tags the latest release candidate as the final release.

[skip-matrix][skip-rapids][skip-vdc][skip-docs]
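The "determine the next rc by inspecting existing tags" step can be sketched as a small function. The tag format `v<version>-rc<N>` and the function name `next_rc` are assumptions made for this illustration; the workflow itself may name tags differently and is not implemented in C++.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Returns the next release-candidate number for `version`, given all
// existing tags. Assumed tag format: "v<version>-rc<N>" (hypothetical).
inline int next_rc(const std::vector<std::string>& tags, const std::string& version) {
  const std::string prefix = "v" + version + "-rc";
  int next = 0;
  for (const std::string& tag : tags) {
    // Skip tags that belong to another version or are not rc tags.
    if (tag.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }
    const std::string suffix = tag.substr(prefix.size());
    // Skip tags whose rc suffix is empty or not purely numeric.
    if (suffix.empty() || suffix.find_first_not_of("0123456789") != std::string::npos) {
      continue;
    }
    const int rc = std::stoi(suffix);
    if (rc + 1 > next) {
      next = rc + 1;
    }
  }
  return next;
}

} // namespace sketch
```

With no matching tags the function returns 0, so the first candidate of a version would be `rc0` under this assumed scheme.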

* ensure cupy arrays can be used with cuda.parallel too
* copy the same comment to all places wherever applicable
* ensure all needed imports are shown in the example
* add CuPy (+CUDA 12.x) as a test dependency

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* Revert "Tweaks to slack notifs." This reverts commit 0d475ef.
* Revert "Add slack notifications." This reverts commit 9551a53.
* Revert "Use annotated tags." This reverts commit 5cd738c.
* Revert "Log archive sizes." This reverts commit 3ebd804.
* Revert "Fix typo." This reverts commit 47fda9a.
* Revert "Add more useful display name to install preset." This reverts commit cdd367e.
* Revert "Add overview doc for release workflows." This reverts commit a6af3bf.
* Revert "Document workflow ref requirements/usage." This reverts commit 448da79.
* Revert "Cleanup version handling in create-new." This reverts commit 75c2500.
* Revert "Upload source packages, too." This reverts commit dde8e2e.
* Revert "Verify the repo version in release-finalize." This reverts commit c05951a.
* Revert "Simply release branch creation." This reverts commit 4c7eacb.
* Revert "Refactor for consistency." This reverts commit ed46ca8.
* Revert "Infer version in RC workflow from repo/git state." This reverts commit ea07539.
* Revert "Add json file with version info that we can parse in CI." This reverts commit 0beede5.
* Revert "Release workflow updates:" This reverts commit 2873882.
* Revert "Add `install` preset." This reverts commit 14d95a4.
* Revert "Add reusable workflow for updating version in branch with a PR" This reverts commit 70a2872.
* Revert "Add release automation workflows." This reverts commit 43eb66a.

* Drop `_LIBCUDACXX_THREAD_ABI_VISIBILITY`

  It's always defined as `_LIBCUDACXX_HIDE_FROM_ABI`
* Drop `_LIBCUDACXX_NO_THREAD_SAFETY_ANALYSIS`

  It's never defined outside of `__FreeBSD__`
* Drop `thread_if`
* Drop `__libcpp_thread_favorite_barrier_index`
* Drop `_LIBCUDACXX_HAS_NO_THREAD_CONTENTION_TABLE`

  It is always defined
* Drop `_LIBCUDACXX_HAS_NO_PLATFORM_WAIT`

  It is always defined and only used once
* Drop `_LIBCUDACXX_BUILDING_THREAD_LIBRARY_EXTERNAL`
* Move macro definition out of function declaration
* Move threading_support
* Split into the different threading mechanisms
* Disentangle `_LIBCUDACXX_HAS_THREAD_API_EXTERNAL` with other backends
* Fix missing qualifiers and attributes
* Silence an ICC warning about `__libcpp_thread_id_equal`
* Drop more unused functions from pthread
* Move to `__thread` subfolder

bernhardmgruber and others added 30 commits July 30, 2024 09:54

Cleanup CUB block/thread load and exchange (#1946)

6dfc8dd

Replace _LIBCUDACXX_CPO_ACCESSIBILITY with _CCCL_GLOBAL_CONSTANT (…

d92ef23

…#1881)

Add script to update RAPIDS version. (#2082)

d4f928e

* Add script to update RAPIDS version.
* Update to 24.10.

Update bad links (#2080)

ce95739

* fix broken links
* revert repo.toml
* linkchecker fixes
* fix .cuh errors
* lint

Fix line break issues that break doxygen code examples (#2103)

c0cfbd0

Add internal wrapper for cuda driver APIs (#2070)

7a3dae7

* Add a header to interact with driver APIs
* Add a test for the driver API interaction
* Format
* Fix formatting

rename device to device_ref, add immovable device as a place to…

a2a3824

… cache properties (#2110)

Use the float flavors of the cmath functions in the extended floating…

bddcd20

… point fallbacks (#2106) Fixes #2078

Ensure that we avoid ABI Version conflics (#2137)

2600135

Ensure that cuda_memory_resource allocates memory on the proper dev…

39b926a

…ice (#2073)

* Ensure that `cuda_memory_resource` allocates memory on the proper device
* Move `__ensure_current_device` to own header

Implement a cudax::get_stream CPO (#2135)

fadb135

Fix missing copy of docs artifacts (#2162)

cc0b3d1

Also fix typo in the link

Update CODEOWNERS

cbe01b0

[CUDAX] Add a global constexpr cudax::devices range for all devices…

a8ca75c

… in the system (#2100)

* add `cuda::devices` vector

The number of CUDA devices can be determined by calling `cuda::devices.size()`. `cuda::devices` is a range of `cuda::device` objects.

fix use of cudaStream_t as if it were a stream wrapper (#2190)

d0254e4

Fix uninitialized_buffer self assignment (#2170)

a903dc6

Fix trivial_copy_device_to_device execution space (#2164)

9459e4a

* Fix trivial_copy_device_to_device execution space
* Typo
* Format
* Extra empty line

Clarify libcu++ use by non-CUDA compilers (#1969)

c65a965

Fixes: #1968

Warn when using C++14 in CUB and Thrust (#2166)

e519f25

Fixes: #2165

Fix the clang-format path in the devcotnainers (#2194)

fe27d99

In the devcontainers `clang-format` is now installed into `/usr/bin/clang-format`

Mount a build directory for CCCL projects if WSL is detected (#2035)

d1e7c1c

Co-authored-by: Michael Schellenberger Costa <[email protected]>

ericniebler and others added 29 commits October 16, 2024 18:43

Add install preset.

14d95a4

Release workflow updates:

2873882

- Generate tarballs
- Create draft release
- Get ver info from GITHUB_REF (must be RC tag)

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Add json file with version info that we can parse in CI.

0beede5

Infer version in RC workflow from repo/git state.

ea07539

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Refactor for consistency.

ed46ca8

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Simply release branch creation.

4c7eacb

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Verify the repo version in release-finalize.

c05951a

Upload source packages, too.

dde8e2e

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Cleanup version handling in create-new.

75c2500

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Document workflow ref requirements/usage.

448da79

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Add overview doc for release workflows.

a6af3bf

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Add more useful display name to install preset.

cdd367e

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Fix typo.

47fda9a

Log archive sizes.

3ebd804

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Use annotated tags.

5cd738c

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Add slack notifications.

9551a53

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Tweaks to slack notifs.

0d475ef

[skip-matrix][skip-rapids][skip-vdc][skip-docs]

Add reusable workflow for updating version in branch with a PR

70a2872

assert that cuda::std::declval is noexcept (#2588)

09a879c

add __is_callable_v variable template when possible (#2598)

084cd53

CCCL_TOPLEVEL_PROJECT always needs to be defined (#2597)

4ffa680

The `cccl_generate_install_rules` requires two arguments. When `CCCL_TOPLEVEL_PROJECT` doesn't exist all the existing calls will have `HEADERS_INCLUDE` or `NO_HEADERS` become the second argument.

Follow-up of #1954 for cudax documentation (#2603)

babe437

Strip prefix paths from cudax documentation

examples/cudax/CMakeLists.txt should not be executable (#2594)

e576578

[CUDAX] Peer access control on async_memory_pool and async_memory_res…

03b7994

…ource (#2587)

miscco closed this Oct 22, 2024
