[Backport 2.7]: PRs #3201, #3523, #3547, #3580 (#3536) (#3600)
* Backport PRs #3201, #3523, #3547, #3580 to the 2.8.x branch. (#3536)

* [FEA]: Introduce Python module with CCCL headers (#3201)

* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_cuda_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.
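The bullet above locates the bundled headers by making the include directory an importable package and asking Python where it was imported from. A minimal sketch of that idea using `importlib` (the helper name is illustrative, not the actual cuda_cccl code; a stdlib package stands in for `cuda._include`):

```python
# Sketch: find a package's on-disk directory, which is how an
# importable `cuda/_include/__init__.py` lets callers discover
# the header path without hard-coding it.
import importlib.util
from pathlib import Path

def find_include_path(package_name: str) -> Path:
    """Return the directory a package would be imported from."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"cannot locate package {package_name!r}")
    return Path(next(iter(spec.submodule_search_locations)))

# Demonstrate with a stdlib package:
print(find_include_path("email").name)  # prints: email
```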

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace the `:=` (walrus) operator, which requires Python 3.8+

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under #3201 (comment))

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* #3201 (comment)

* #3201 (comment)

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* #3201 (comment)

Accidental discovery: the cuda.cooperative unit tests pass even without any CCCL headers installed.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d6.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml
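Declaring package data in `pyproject.toml` instead of `MANIFEST.in` looks roughly like this (a sketch; the actual key and glob patterns in the PR may differ):

```toml
# pyproject.toml -- replaces MANIFEST.in for shipping data files.
[tool.setuptools.package-data]
# Bundle the CCCL headers with the package (patterns are illustrative).
"cuda.cccl.include" = ["**/*.h", "**/*.cuh", "**/*.inl"]
```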

* Use pathlib in cuda/cccl/include_paths.py
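The `include_paths` helper described above can be sketched as a small pathlib-based dataclass (a hypothetical shape; the field names and function signature are assumptions, not the real cuda.cccl API):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass(frozen=True)
class IncludePaths:
    cccl: Path                  # root of the bundled CCCL headers
    libcudacxx: Optional[Path]  # <root>/libcudacxx if present
    cub: Optional[Path]
    thrust: Optional[Path]

def get_include_paths(root: Path) -> IncludePaths:
    """Collect the include directories rooted at the installed header dir."""
    def sub(name: str) -> Optional[Path]:
        p = root / name
        return p if p.is_dir() else None

    return IncludePaths(
        cccl=root,
        libcudacxx=sub("libcudacxx"),
        cub=sub("cub"),
        thrust=sub("thrust"),
    )
```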

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

For completeness: the other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a21.

Error message: #3201 (comment)

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.
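The de-duplicated test loop can be sketched like this (directory names and commands are illustrative, not the exact ci/test_python.sh contents):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run the same install/test sequence for each Python project
# instead of repeating the commands once per directory.
for project in cuda_cccl cuda_cooperative cuda_parallel; do
  echo "=== ${project} ==="
  # (cd "python/${project}" && pip install -e . && pytest tests/)
done
```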

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd.

* Implement suggestion by @shwina (#3201 (review))

* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

* cuda.parallel: invoke pytest directly rather than via `python -m pytest` (#3523)

Co-authored-by: Ashwin Srinath <[email protected]>

* Copy file from PR #3547 (bugfix/drop_pipe_in_lit by @wmaxey)

* Revert "cuda.parallel: invoke pytest directly rather than via `python -m pytest` (#3523)"

This reverts commit a2e21cb.

* Replace pipes.quote with shlex.quote in lit config (#3547)

* Replace pipes.quote with shlex.quote
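`pipes.quote` was deprecated by PEP 594 and removed in Python 3.13; `shlex.quote` is the drop-in replacement. A quick illustration:

```python
import shlex

# shlex.quote wraps a string in single quotes whenever it contains
# shell metacharacters, and leaves safe strings untouched.
path = "file with spaces; rm -rf /"
print(shlex.quote(path))   # prints: 'file with spaces; rm -rf /'
print(shlex.quote("safe")) # prints: safe
```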

* Drop TBB run on windows to unblock CI

* Update ci/matrix.yaml

Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>

* Remove nvks runners from testing pool. (#3580)

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Wesley Maxey <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: Allison Piper <[email protected]>

* Suppress execution checks for vocabulary types (#3578)

* Suppress execution checks for optional
* Suppress execution checks for `expected`
* Suppress execution checks for `pair`
* Suppress execution checks for `variant`

* Remove some jobs

* Disable samples in old CI

* Fix compiler detection

* Disable tests for unsupported standard modes

* Fix compiler detection

* Fix compiler detection more

* Fix matrix

* Also suppress for swap

* Fix formatting

* Use the internal function for MSVC

* Try adding import?

* Revert all changes to python module

* Fix formatting

* Update `upload-pages-artifact`

* Update RAPIDS to 25.02. (#2967)

* Update RAPIDS to 25.02.

* Remove RAFT BUILD_ANN_BENCH option.

* Rename KvikIO to kvikio.

* Add back cugraph-ops until it's completely purged from RAPIDS upstream dependencies.

* Update devcontainers.

* Use the 24.10 image for cccl CI

* Drop cugraph-ops

* Also drop cugraph-gnn for now

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Wesley Maxey <[email protected]>
Co-authored-by: Allison Piper <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
7 people authored Feb 5, 2025
1 parent cbc6b9b commit 3976fa9
Showing 40 changed files with 1,807 additions and 67 deletions.
2 changes: 1 addition & 1 deletion .github/actions/docs-build/action.yml
@@ -54,4 +54,4 @@ runs:
# Upload docs as pages artifacts
- name: Upload artifact
if: ${{ inputs.upload_pages_artifact == 'true' }}
-uses: actions/upload-pages-artifact@v2
+uses: actions/upload-pages-artifact@v3
35 changes: 17 additions & 18 deletions .github/workflows/build-rapids.yml
@@ -36,10 +36,9 @@ jobs:
fail-fast: false
matrix:
include:
-- { cuda: '12.5', libs: 'rmm KvikIO cudf cudf_kafka cuspatial', }
-- { cuda: '12.5', libs: 'rmm ucxx raft cuvs', }
-- { cuda: '12.5', libs: 'rmm ucxx raft cumlprims_mg cuml', }
-- { cuda: '12.5', libs: 'rmm ucxx raft cugraph-ops wholegraph cugraph' }
+- { cuda: '12.5', libs: 'rmm kvikio cudf cudf_kafka cuspatial' }
+- { cuda: '12.5', libs: 'rmm ucxx raft cuvs cumlprims_mg cuml' }
+- { cuda: '12.5', libs: 'rmm ucxx raft cugraph'}
permissions:
id-token: write
contents: read
@@ -61,20 +60,20 @@ jobs:
CI: true
RAPIDS_LIBS: ${{ matrix.libs }}
# Uncomment any of these to customize the git repo and branch for a RAPIDS lib:
-# RAPIDS_cmake_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cudf_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cudf_kafka_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cugraph_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cugraph_ops_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cuml_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cumlprims_mg_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cuspatial_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_cuvs_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_KvikIO_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_raft_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_rmm_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
-# RAPIDS_ucxx_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-0.40"}'
-# RAPIDS_wholegraph_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-24.10"}'
+# RAPIDS_cmake_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cudf_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cudf_kafka_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cugraph_ops_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cugraph_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cugraph_gnn_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cuml_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cumlprims_mg_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cuspatial_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_cuvs_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_kvikio_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_raft_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_rmm_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-25.02"}'
+# RAPIDS_ucxx_GIT_REPO: '{"upstream": "rapidsai", "tag": "branch-0.42"}'
run: |
cat <<"EOF" > "$RUNNER_TEMP/ci-entrypoint.sh"
#! /usr/bin/env bash
25 changes: 14 additions & 11 deletions ci/matrix.yaml
@@ -14,10 +14,13 @@ workflows:
- {jobs: ['build'], std: 'all', ctk: '11.1', cxx: ['gcc6', 'gcc7', 'gcc8', 'gcc9', 'clang9', 'msvc2017']}
- {jobs: ['build'], std: 'all', ctk: '11.8', cxx: ['gcc11'], sm: '60;70;80;90'}
# Current CTK
-- {jobs: ['build'], std: 'all', cxx: ['gcc7', 'gcc8', 'gcc9', 'gcc10', 'gcc11', 'gcc12']}
-- {jobs: ['build'], std: 'all', cxx: ['clang9', 'clang10', 'clang11', 'clang12', 'clang13', 'clang14', 'clang15', 'clang16']}
-- {jobs: ['build'], std: 'all', cxx: ['intel', 'msvc2019']}
-- {jobs: ['test'], std: 'all', cxx: ['gcc13', 'clang17', 'msvc2022']}
+- {jobs: ['build'], std: [11, 17], cxx: ['gcc7', 'gcc8', 'gcc9']}
+- {jobs: ['build'], std: [14, 20], cxx: ['gcc10', 'gcc11', 'gcc12']}
+- {jobs: ['build'], std: [11, 14], cxx: ['clang10', 'clang11']}
+- {jobs: ['build'], std: [17], cxx: ['clang12', 'clang13', 'clang14', 'clang15']}
+- {jobs: ['build'], std: 'all', cxx: ['clang9', 'clang16']}
+- {jobs: ['build'], std: 17, cxx: ['intel', 'msvc2019']}
+- {jobs: ['test'], std: 'all', cxx: ['gcc', 'clang', 'msvc2022']}
# Modded builds:
- {jobs: ['build'], std: 'all', cxx: ['gcc', 'clang'], cpu: 'arm64'}
- {jobs: ['build'], std: 'all', cxx: ['gcc'], sm: '90a'}
@@ -219,13 +222,13 @@ projects:

# testing -> Runner with GPU is in a nv-gh-runners testing pool
gpus:
-v100: { sm: 70 } # 32 GB, 40 runners
-t4: { sm: 75, testing: true } # 16 GB, 8 runners
-rtx2080: { sm: 75, testing: true } # 8 GB, 8 runners
-rtxa6000: { sm: 86, testing: true } # 48 GB, 12 runners
-l4: { sm: 89, testing: true } # 24 GB, 48 runners
-rtx4090: { sm: 89, testing: true } # 24 GB, 10 runners
-h100: { sm: 90 } # 80 GB, 16 runners
+v100: { sm: 70 } # 32 GB, 40 runners
+t4: { sm: 75 } # 16 GB, 10 runners
+rtx2080: { sm: 75 } # 8 GB, 12 runners
+rtxa6000: { sm: 86 } # 48 GB, 12 runners
+l4: { sm: 89 } # 24 GB, 48 runners
+rtx4090: { sm: 89 } # 24 GB, 10 runners
+h100: { sm: 90 } # 80 GB, 16 runners

# Tags are used to define a `matrix job` in the workflow section.
#
32 changes: 23 additions & 9 deletions ci/rapids/cuda12.5-conda/devcontainer.json
@@ -1,13 +1,15 @@
{
-"image": "rapidsai/devcontainers:24.10-cpp-mambaforge-ubuntu22.04",
+"image": "rapidsai/devcontainers:25.02-cpp-mambaforge-ubuntu22.04",
"runArgs": [
"--rm",
"--name",
-"${localEnv:USER:anon}-${localWorkspaceFolderBasename}-rapids-24.10-cuda12.5-conda"
+"${localEnv:USER:anon}-${localWorkspaceFolderBasename}-rapids-25.02-cuda12.5-conda"
],
-"hostRequirements": {"gpu": "optional"},
+"hostRequirements": {
+"gpu": "optional"
+},
"features": {
-"ghcr.io/rapidsai/devcontainers/features/rapids-build-utils:24.10": {}
+"ghcr.io/rapidsai/devcontainers/features/rapids-build-utils:25.2": {}
},
"overrideFeatureInstallOrder": [
"ghcr.io/rapidsai/devcontainers/features/rapids-build-utils"
@@ -37,13 +39,25 @@
"RAPIDS_cumlprims_mg_GIT_REPO": "${localEnv:RAPIDS_cumlprims_mg_GIT_REPO}",
"RAPIDS_cuml_GIT_REPO": "${localEnv:RAPIDS_cuml_GIT_REPO}",
"RAPIDS_cugraph_ops_GIT_REPO": "${localEnv:RAPIDS_cugraph_ops_GIT_REPO}",
-"RAPIDS_wholegraph_GIT_REPO": "${localEnv:RAPIDS_wholegraph_GIT_REPO}",
"RAPIDS_cugraph_GIT_REPO": "${localEnv:RAPIDS_cugraph_GIT_REPO}",
+"RAPIDS_cugraph_gnn_GIT_REPO": "${localEnv:RAPIDS_cugraph_gnn_GIT_REPO}",
"RAPIDS_cuspatial_GIT_REPO": "${localEnv:RAPIDS_cuspatial_GIT_REPO}"
},
-"initializeCommand": ["/bin/bash", "-c", "mkdir -m 0755 -p ${localWorkspaceFolder}/.{aws,cache,config} ${localWorkspaceFolder}/ci/rapids/.{conda,log/devcontainer-utils} ${localWorkspaceFolder}/ci/rapids/.repos/{rmm,kvikio,ucxx,cudf,raft,cuvs,cuml,wholegraph,cugraph,cuspatial}"],
-"postCreateCommand": ["/bin/bash", "-c", "if [ ${CI:-false} = 'false' ]; then . /home/coder/cccl/ci/rapids/post-create-command.sh; fi"],
-"postAttachCommand": ["/bin/bash", "-c", "if [ ${CODESPACES:-false} = 'true' ]; then . devcontainer-utils-post-attach-command; fi"],
+"initializeCommand": [
+"/bin/bash",
+"-c",
+"mkdir -m 0755 -p ${localWorkspaceFolder}/.{aws,cache,config} ${localWorkspaceFolder}/ci/rapids/.{conda,log/devcontainer-utils} ${localWorkspaceFolder}/ci/rapids/.repos/{rmm,kvikio,ucxx,cudf,raft,cuvs,cuml,cugraph,cugraph-gnn,cuspatial}"
+],
+"postCreateCommand": [
+"/bin/bash",
+"-c",
+"if [ ${CI:-false} = 'false' ]; then . /home/coder/cccl/ci/rapids/post-create-command.sh; fi"
+],
+"postAttachCommand": [
+"/bin/bash",
+"-c",
+"if [ ${CODESPACES:-false} = 'true' ]; then . devcontainer-utils-post-attach-command; fi"
+],
"workspaceFolder": "/home/coder/${localWorkspaceFolderBasename}",
"workspaceMount": "source=${localWorkspaceFolder},target=/home/coder/${localWorkspaceFolderBasename},type=bind,consistency=consistent",
"mounts": [
@@ -57,8 +71,8 @@
"source=${localWorkspaceFolder}/ci/rapids/.repos/raft,target=/home/coder/raft,type=bind,consistency=consistent",
"source=${localWorkspaceFolder}/ci/rapids/.repos/cuvs,target=/home/coder/cuvs,type=bind,consistency=consistent",
"source=${localWorkspaceFolder}/ci/rapids/.repos/cuml,target=/home/coder/cuml,type=bind,consistency=consistent",
-"source=${localWorkspaceFolder}/ci/rapids/.repos/wholegraph,target=/home/coder/wholegraph,type=bind,consistency=consistent",
"source=${localWorkspaceFolder}/ci/rapids/.repos/cugraph,target=/home/coder/cugraph,type=bind,consistency=consistent",
+"source=${localWorkspaceFolder}/ci/rapids/.repos/cugraph-gnn,target=/home/coder/cugraph-gnn,type=bind,consistency=consistent",
"source=${localWorkspaceFolder}/ci/rapids/.repos/cuspatial,target=/home/coder/cuspatial,type=bind,consistency=consistent",
"source=${localWorkspaceFolder}/ci/rapids/.conda,target=/home/coder/.conda,type=bind,consistency=consistent",
"source=${localWorkspaceFolder}/ci/rapids/.log/devcontainer-utils,target=/var/log/devcontainer-utils,type=bind,consistency=consistent"
2 changes: 1 addition & 1 deletion ci/rapids/post-create-command.sh
@@ -69,7 +69,7 @@ _create_rapids_cmake_override_json() {
| tee ~/rapids-cmake-override-versions.json;

# Define default CMake args for each repo
-local -a cmake_args=(BUILD_TESTS BUILD_BENCHMARKS BUILD_ANN_BENCH BUILD_PRIMS_BENCH BUILD_CUGRAPH_MG_TESTS);
+local -a cmake_args=(BUILD_TESTS BUILD_BENCHMARKS BUILD_PRIMS_BENCH BUILD_CUGRAPH_MG_TESTS);
# Enable tests
cmake_args=("${cmake_args[@]/#/"-D"}");
cmake_args=("${cmake_args[@]/%/"=${RAPIDS_ENABLE_TESTS:-ON}"}");
1 change: 1 addition & 0 deletions ci/update_version.sh
@@ -103,6 +103,7 @@ update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_MAJOR \([0-9]\+\))" "
update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_MINOR \([0-9]\+\))" "set(cudax_VERSION_MINOR $minor)"
update_file "$CUDAX_CMAKE_VERSION_FILE" "set(cudax_VERSION_PATCH \([0-9]\+\))" "set(cudax_VERSION_PATCH $patch)"

+update_file "$CUDA_CCCL_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$major.$minor.$patch\""
update_file "$CUDA_COOPERATIVE_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$pymajor.$pyminor.$major.$minor.$patch\""
update_file "$CUDA_PARALLEL_VERSION_FILE" "^__version__ = \"\([0-9.]\+\)\"" "__version__ = \"$pymajor.$pyminor.$major.$minor.$patch\""

2 changes: 1 addition & 1 deletion cudax/CMakeLists.txt
@@ -25,7 +25,7 @@ endif()

option(cudax_ENABLE_HEADER_TESTING "Test that CUDA Experimental's public headers compile." ON)
option(cudax_ENABLE_TESTING "Build CUDA Experimental's tests." ON)
-option(cudax_ENABLE_SAMPLES "Build CUDA Experimental's samples." ON)
+option(cudax_ENABLE_SAMPLES "Build CUDA Experimental's samples." OFF)

include(cmake/cudaxBuildCompilerTargets.cmake)
include(cmake/cudaxBuildTargetList.cmake)
21 changes: 12 additions & 9 deletions libcudacxx/include/cuda/std/__expected/bad_expected_access.h
@@ -51,14 +51,6 @@ class bad_expected_access;
template <>
class bad_expected_access<void> : public ::std::exception
{
-protected:
-_CCCL_HIDE_FROM_ABI bad_expected_access() noexcept = default;
-_CCCL_HIDE_FROM_ABI bad_expected_access(const bad_expected_access&) = default;
-_CCCL_HIDE_FROM_ABI bad_expected_access(bad_expected_access&&) = default;
-_CCCL_HIDE_FROM_ABI bad_expected_access& operator=(const bad_expected_access&) = default;
-_CCCL_HIDE_FROM_ABI bad_expected_access& operator=(bad_expected_access&&) = default;
-~bad_expected_access() noexcept override = default;
-
public:
// The way this has been designed (by using a class template below) means that we'll already
// have a profusion of these vtables in TUs, and the dynamic linker will already have a bunch
@@ -74,10 +66,21 @@ template <class _Err>
class bad_expected_access : public bad_expected_access<void>
{
public:
-explicit bad_expected_access(_Err __e)
+# if defined(_CCCL_CUDA_COMPILER_CLANG) // Clang needs this or it breaks with device only types
+_CCCL_HOST_DEVICE
+# endif // _CCCL_CUDA_COMPILER_CLANG
+_CCCL_HIDE_FROM_ABI explicit bad_expected_access(_Err __e)
: __unex_(_CUDA_VSTD::move(__e))
{}

+# if defined(_CCCL_CUDA_COMPILER_CLANG) // Clang needs this or it breaks with device only types
+_CCCL_HOST_DEVICE
+# endif // _CCCL_CUDA_COMPILER_CLANG
+_CCCL_HIDE_FROM_ABI ~bad_expected_access() noexcept
+{
+__unex_.~_Err();
+}

_LIBCUDACXX_HIDE_FROM_ABI _Err& error() & noexcept
{
return __unex_;
20 changes: 20 additions & 0 deletions libcudacxx/include/cuda/std/__expected/expected.h
@@ -1077,6 +1077,7 @@ class expected : private __expected_move_assign<_Tp, _Err>
}

// [expected.object.eq], equality operators
+_CCCL_EXEC_CHECK_DISABLE
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const expected& __x, const expected& __y)
{
if (__x.__has_val_ != __y.has_value())
@@ -1097,12 +1098,14 @@ class expected : private __expected_move_assign<_Tp, _Err>
}

# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const expected& __x, const expected& __y)
{
return !(__x == __y);
}
# endif // _CCCL_STD_VER < 2020

+_CCCL_EXEC_CHECK_DISABLE
_LIBCUDACXX_TEMPLATE(class _T2, class _E2)
_LIBCUDACXX_REQUIRES((!_CCCL_TRAIT(is_void, _T2)))
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const expected& __x, const expected<_T2, _E2>& __y)
@@ -1125,6 +1128,7 @@ class expected : private __expected_move_assign<_Tp, _Err>
}

# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
_LIBCUDACXX_TEMPLATE(class _T2, class _E2)
_LIBCUDACXX_REQUIRES((!_CCCL_TRAIT(is_void, _T2)))
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const expected& __x, const expected<_T2, _E2>& __y)
@@ -1133,25 +1137,29 @@ class expected : private __expected_move_assign<_Tp, _Err>
}
# endif // _CCCL_STD_VER < 2020

+_CCCL_EXEC_CHECK_DISABLE
_LIBCUDACXX_TEMPLATE(class _T2)
_LIBCUDACXX_REQUIRES((!__expected::__is_expected_nonvoid<_T2>) )
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const expected& __x, const _T2& __v)
{
return __x.__has_val_ && static_cast<bool>(__x.__union_.__val_ == __v);
}
# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
_LIBCUDACXX_TEMPLATE(class _T2)
_LIBCUDACXX_REQUIRES((!__expected::__is_expected_nonvoid<_T2>) )
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const _T2& __v, const expected& __x)
{
return __x.__has_val_ && static_cast<bool>(__x.__union_.__val_ == __v);
}
+_CCCL_EXEC_CHECK_DISABLE
_LIBCUDACXX_TEMPLATE(class _T2)
_LIBCUDACXX_REQUIRES((!__expected::__is_expected_nonvoid<_T2>) )
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const expected& __x, const _T2& __v)
{
return !__x.__has_val_ || static_cast<bool>(__x.__union_.__val_ != __v);
}
+_CCCL_EXEC_CHECK_DISABLE
_LIBCUDACXX_TEMPLATE(class _T2)
_LIBCUDACXX_REQUIRES((!__expected::__is_expected_nonvoid<_T2>) )
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const _T2& __v, const expected& __x)
@@ -1160,22 +1168,26 @@ class expected : private __expected_move_assign<_Tp, _Err>
}
# endif // _CCCL_STD_VER < 2020

+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const expected& __x, const unexpected<_E2>& __e)
{
return !__x.__has_val_ && static_cast<bool>(__x.__union_.__unex_ == __e.error());
}
# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const unexpected<_E2>& __e, const expected& __x)
{
return !__x.__has_val_ && static_cast<bool>(__x.__union_.__unex_ == __e.error());
}
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const expected& __x, const unexpected<_E2>& __e)
{
return __x.__has_val_ || static_cast<bool>(__x.__union_.__unex_ != __e.error());
}
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const unexpected<_E2>& __e, const expected& __x)
{
@@ -1916,6 +1928,7 @@ class expected<void, _Err> : private __expected_move_assign<void, _Err>
}

// [expected.void.eq], equality operators
+_CCCL_EXEC_CHECK_DISABLE
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const expected& __x, const expected& __y) noexcept
{
if (__x.__has_val_ != __y.has_value())
@@ -1928,12 +1941,14 @@ class expected<void, _Err> : private __expected_move_assign<void, _Err>
}
}
# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator!=(const expected& __x, const expected& __y) noexcept
{
return !(__x == __y);
}
# endif

+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool
operator==(const expected& __x, const expected<void, _E2>& __y) noexcept
@@ -1948,6 +1963,7 @@ class expected<void, _Err> : private __expected_move_assign<void, _Err>
}
}
# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool
operator!=(const expected& __x, const expected<void, _E2>& __y) noexcept
@@ -1956,22 +1972,26 @@ class expected<void, _Err> : private __expected_move_assign<void, _Err>
}
# endif

+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const expected& __x, const unexpected<_E2>& __y) noexcept
{
return !__x.__has_val_ && static_cast<bool>(__x.__union_.__unex_ == __y.error());
}
# if _CCCL_STD_VER < 2020
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
friend _LIBCUDACXX_HIDE_FROM_ABI constexpr bool operator==(const unexpected<_E2>& __y, const expected& __x) noexcept
{
return !__x.__has_val_ && static_cast<bool>(__x.__union_.__unex_ == __y.error());
}
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
_LIBCUDACXX_HIDE_FROM_ABI friend constexpr bool operator!=(const expected& __x, const unexpected<_E2>& __y) noexcept
{
return __x.__has_val_ || static_cast<bool>(__x.__union_.__unex_ != __y.error());
}
+_CCCL_EXEC_CHECK_DISABLE
template <class _E2>
_LIBCUDACXX_HIDE_FROM_ABI friend constexpr bool operator!=(const unexpected<_E2>& __y, const expected& __x) noexcept
{
