forked from NVIDIA/cccl
Main #25
Closed
Conversation
* Improve binary function objects and replace the thrust implementation
* Simplify use of the ::cuda::std binary function objects
* Replace _CCCL_CONSTEXPR_CXX14 with constexpr in all libcudacxx binary function objects that are imported into thrust
* Determine the partial sum type without ::result_type
* Ignore _LIBCUDACXX_DEPRECATED_IN_CXX11 for doxygen

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
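Determining the partial sum type without `::result_type` can be sketched in a few lines. This is not the thrust implementation, just a minimal illustration of the technique: the accumulator type is deduced from the call expression of a transparent function object.

```cpp
#include <cuda/std/functional>
#include <cuda/std/type_traits>
#include <cuda/std/utility>

// Minimal sketch: deduce the accumulator type from the call expression
// instead of reading the deprecated ::result_type member.
template <class Op, class T, class U>
using op_result_t =
  decltype(cuda::std::declval<Op>()(cuda::std::declval<T>(), cuda::std::declval<U>()));

// With the transparent cuda::std::plus<>, a mixed int/float sum deduces float.
static_assert(cuda::std::is_same<op_result_t<cuda::std::plus<>, int, float>, float>::value, "");
```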
* Add script to update RAPIDS version. * Update to 24.10.
* fix broken links * revert repo.toml * linkchecker fixes * fix .cuh errors * lint
* Add a header to interact with driver APIs * Add a test for the driver API interaction * Format * Fix formatting
* Use `common_type` for complex `pow` Previously we would rely on our internal `__promote` function. However, that could have surprising results, e.g. `pow(complex<float>, int)` would return `complex<double>`. With C++23, this situation got clarified and we should use `common_type` to determine the return type.
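A quick sketch of the promotion rule described above; it only demonstrates what `common_type` yields for the argument value types, not the exact library implementation.

```cpp
#include <cuda/std/complex>
#include <cuda/std/type_traits>

// common_type of the value type (float) and the exponent type (int) is float,
// so a common_type-based pow(complex<float>, int) keeps complex<float> rather
// than promoting to complex<double>.
using promoted = cuda::std::common_type_t<cuda::std::complex<float>::value_type, int>;
static_assert(cuda::std::is_same<promoted, float>::value, "");
```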
… cache properties (#2110)
* Drop `cuda::get_property` CPO
  It serves no purpose as it only ever forwards via ADL, and it also breaks older nvcc
* Ensure that we test memory resources
* Implement `cuda::uninitialized_buffer`
  `cuda::uninitialized_buffer` provides an allocation of `N` elements of type `T` utilizing a `cuda::mr::resource` to allocate the storage. `cuda::uninitialized_buffer` takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
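A rough usage sketch of the buffer described above. The header, exact namespace, and constructor signature are assumptions based only on this commit message; the memory resource type is borrowed from the `cuda_memory_resource` mentioned below.

```cpp
#include <cuda/memory_resource> // assumed header

void example()
{
  // Device memory resource; name and namespace are assumptions.
  cuda::mr::cuda_memory_resource mr{};

  // Storage for 1024 ints, aligned for int; hypothetical constructor signature.
  cuda::uninitialized_buffer<int> buf{mr, 1024};

  // buf owns the allocation and deallocates it when it goes out of scope.
  // The caller must keep `mr` alive at least as long as `buf`.
}
```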
…ice (#2073)
* Ensure that `cuda_memory_resource` allocates memory on the proper device
* Move `__ensure_current_device` to its own header
* Clarify compatibility wrt. template specializations We do not want users to specialize arbitrary templates in CCCL unless otherwise stated. This PR makes this clear in the README.md. Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Make `cuda::std::tuple` trivially copyable This is similar to the situation with `cuda::std::pair`. We have a lot of users that rely on types being trivially copyable so that they can utilize memcpy and friends. Previously, `cuda::std::tuple` did not satisfy this because it needs to handle reference types. Given that we already specialize `__tuple_leaf` depending on whether the class is empty or not, we can simply add a third specialization that handles the trivially copyable types and one that synthesizes assignment. Co-authored-by: Bernhard Manfred Gruber <[email protected]>
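A one-line check of the property this change is about; purely illustrative, assuming all element types are themselves trivially copyable.

```cpp
#include <cuda/std/tuple>
#include <cuda/std/type_traits>

// A tuple of trivially copyable members is itself trivially copyable,
// so it can be moved around with memcpy and friends.
static_assert(cuda::std::is_trivially_copyable<cuda::std::tuple<int, float>>::value, "");
```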
Also fix typo in the link
```
In function bool cuda::std::__4::__dispatch_memmove(_Up*, _Tp*, size_t)
...
error: *(unsigned char*)(&privatized_decode_op[0]) may be used uninitialized [-Werror=maybe-uninitialized]
...
*(unsigned char*)(&privatized_decode_op[0]) was declared here
 1528 | PrivatizedDecodeOpT privatized_decode_op[NUM_ACTIVE_CHANNELS]{};
```
* Fix flaky heterogeneous tests by ensuring only *one* writer exists in parallel between H/D
* Fix up copy-paste mistake
* Make host atomics simpler by removing the ugly alignment type
* Fix deadlocks introduced into barrier/semaphore tests
* Revert removing hacky atomic wrapping stuff
* Fix unused-warning bug in GCC-6
```
Linking CXX executable bin/cub.cpp14.catch2_test.lid_0
FAILED: bin/cub.cpp14.catch2_test.lid_0
...
/usr/bin/ld: cub/test/CMakeFiles/cub.cpp14.test.warp_scan_api.dir/catch2_test_warp_scan_api.cu.o: in function `InclusiveScanKernel(int*)':
/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:473: multiple definition of `InclusiveScanKernel(int*)'; cub/test/CMakeFiles/cub.cpp14.test.block_scan_api.dir/catch2_test_block_scan_api.cu.o:/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:468: first defined here
collect2: error: ld returned 1 exit status
```
… in the system (#2100) * Add `cuda::devices` vector. The number of CUDA devices can be determined by calling `cuda::devices.size()`. `cuda::devices` is a range of `cuda::device` objects.
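A minimal sketch of how such a range might be used; the include path is an assumption and the iteration details may differ from the actual API.

```cpp
#include <cuda/devices> // assumed header
#include <cstdio>

void list_devices()
{
  // Number of CUDA devices visible to the process.
  std::printf("found %zu CUDA device(s)\n", cuda::devices.size());

  // Iterate the range of cuda::device objects described above.
  for (const auto& dev : cuda::devices)
  {
    // query per-device properties here
    (void) dev;
  }
}
```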
* Fix trivial_copy_device_to_device execution space * Typo * Format * Extra empty line
In the devcontainers, `clang-format` is now installed into `/usr/bin/clang-format`.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
… it in places where it was missing (#2192)
* Change __scoped_device to use the driver API
* Switch to a driver-API-based device setter
* Remove constexpr from operator device()
* Fix comments and includes
* Fall back to the non-versioned get-entry-point before 12.5
  We need to use the versioned entry point to get the correct cuStreamGetCtx. There is a v2 version of it in 12.5; fortunately, the versioned get-entry-point is available there too.
* Fix unused local variable
* Fix warnings in ensure_current_device test
* Move ensure_current_device out of detail
* Add LIBCUDACXX_ENABLE_EXCEPTIONS to tests cmake
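For illustration, a hypothetical RAII device setter built on the driver API; the names and exact calls below are assumptions, not the actual cudax implementation.

```cpp
#include <cuda.h>
#include <stdexcept>

// Hypothetical sketch of a scoped "ensure current device" guard using the
// driver API: retain the device's primary context, make it current, and
// restore the previous context on destruction.
struct scoped_device_guard
{
  CUdevice device_;
  CUcontext previous_{};

  explicit scoped_device_guard(CUdevice device) : device_(device)
  {
    if (cuCtxGetCurrent(&previous_) != CUDA_SUCCESS)
    {
      throw std::runtime_error("cuCtxGetCurrent failed");
    }
    CUcontext primary{};
    if (cuDevicePrimaryCtxRetain(&primary, device_) != CUDA_SUCCESS)
    {
      throw std::runtime_error("cuDevicePrimaryCtxRetain failed");
    }
    if (cuCtxSetCurrent(primary) != CUDA_SUCCESS)
    {
      cuDevicePrimaryCtxRelease(device_);
      throw std::runtime_error("cuCtxSetCurrent failed");
    }
  }

  ~scoped_device_guard()
  {
    cuCtxSetCurrent(previous_);          // restore whatever was current before
    cuDevicePrimaryCtxRelease(device_);  // balance the retain above
  }
};
```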
* Add `_LIBCUDACXX_REQUIRES_EXPR` to the concepts emulation macros
* Work around an nvcc pre-12.2 bug and mollify nvrtc
* Silence a warning about an always-true condition
* Simplify macro substitution with the help of an alias template
* Fix the short-circuiting behavior of the `async_resource` concept pre-C++20
* Replace C-style casts with C++-style `static_cast`s
* Add missing `_LIBCUDACXX_HIDE_FROM_ABI` function annotations
* Restore short-circuiting in the `resource` concept
Fixes #653. Adds three new workflows that can be manually triggered at various points in the release process:

- `release-create-new`: Begin the release process for a new version.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
    - `branch_point`: Optional; if the `branch/{major}.{minor}.x` branch does not exist, create it from this SHA. If the release branch already exists, this is ignored. If not provided, the release branch is created from the current `main` branch.
  - Actions:
    - Creates the release branch if needed.
    - Bumps version numbers with `update-version.sh` in a topic branch.
    - Creates a pull request to merge the topic branch into `main`.
    - Marks the pull request for backporting to the release branch.
- `release-update-rc`: Validate and tag a new release candidate.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
  - Actions:
    - Uses the HEAD SHA of the release branch for testing/tagging.
    - Determines the next rc for this version by inspecting existing tags.
    - Runs the `pull_request` workflow to validate the release candidate. This can be modified in the future to run a special rc acceptance workflow.
    - Tags the release candidate if the workflow passes.
- `release-finalize`: Tag a final release.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
  - Actions:
    - Determines the most recent release candidate.
    - Tags the latest release candidate as the final release.

[skip-matrix][skip-rapids][skip-vdc][skip-docs]
- Generate tarballs
- Create draft release
- Get version info from GITHUB_REF (must be an RC tag)

[skip-matrix][skip-rapids][skip-vdc][skip-docs]
* Ensure CuPy arrays can be used with cuda.parallel too
* Copy the same comment to all places wherever applicable
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Ensure all needed imports are shown in the example
* Add CuPy (+CUDA 12.x) as a test dependency

Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Revert "Tweaks to slack notifs." This reverts commit 0d475ef. * Revert "Add slack notifications." This reverts commit 9551a53. * Revert "Use annotated tags." This reverts commit 5cd738c. * Revert "Log archive sizes." This reverts commit 3ebd804. * Revert "Fix typo." This reverts commit 47fda9a. * Revert "Add more useful display name to install preset." This reverts commit cdd367e. * Revert "Add overview doc for release workflows." This reverts commit a6af3bf. * Revert "Document workflow ref requirements/usage." This reverts commit 448da79. * Revert "Cleanup version handling in create-new." This reverts commit 75c2500. * Revert "Upload source packages, too." This reverts commit dde8e2e. * Revert "Verify the repo version in release-finalize." This reverts commit c05951a. * Revert "Simply release branch creation." This reverts commit 4c7eacb. * Revert "Refactor for consistency." This reverts commit ed46ca8. * Revert "Infer version in RC workflow from repo/git state." This reverts commit ea07539. * Revert "Add json file with version info that we can parse in CI." This reverts commit 0beede5. * Revert "Release workflow updates:" This reverts commit 2873882. * Revert "Add `install` preset." This reverts commit 14d95a4. * Revert "Add reusable workflow for updating version in branch with a PR" This reverts commit 70a2872. * Revert "Add release automation workflows." This reverts commit 43eb66a.
* Drop `_LIBCUDACXX_THREAD_ABI_VISIBILITY`; it's always defined as `_LIBCUDACXX_HIDE_FROM_ABI`
* Drop `_LIBCUDACXX_NO_THREAD_SAFETY_ANALYSIS`; it's never defined outside of `__FreeBSD__`
* Drop `thread_if`
* Drop `__libcpp_thread_favorite_barrier_index`
* Drop `_LIBCUDACXX_HAS_NO_THREAD_CONTENTION_TABLE`; it is always defined
* Drop `_LIBCUDACXX_HAS_NO_PLATFORM_WAIT`; it is always defined and only used once
* Drop `_LIBCUDACXX_BUILDING_THREAD_LIBRARY_EXTERNAL`
* Move macro definition out of function declaration
* Move threading_support
* Split into the different threading mechanisms
* Disentangle `_LIBCUDACXX_HAS_THREAD_API_EXTERNAL` from the other backends
* Fix missing qualifiers and attributes
* Silence an ICC warning about `__libcpp_thread_id_equal`
* Drop more unused functions from pthread
* Move to `__thread` subfolder
`cccl_generate_install_rules` requires two arguments. When `CCCL_TOPLEVEL_PROJECT` doesn't exist, all of the existing calls end up with `HEADERS_INCLUDE` or `NO_HEADERS` as the second argument.
Strip prefix paths from cudax documentation
It's a pain to retrieve the address of a device symbol. This adds a little helper that makes it less painful.
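A hypothetical sketch of what such a helper could look like, built on the CUDA runtime's `cudaGetSymbolAddress`; the helper name and exact signature are assumptions, not the actual API added here.

```cpp
#include <cuda_runtime.h>
#include <stdexcept>

__device__ int counter; // some device symbol we want the device address of

// Hypothetical helper: wrap cudaGetSymbolAddress so callers get a typed
// pointer back instead of juggling void** and error codes by hand.
template <class T>
T* get_device_address(const T& symbol)
{
  void* ptr = nullptr;
  if (cudaGetSymbolAddress(&ptr, symbol) != cudaSuccess)
  {
    throw std::runtime_error("cudaGetSymbolAddress failed");
  }
  return static_cast<T*>(ptr);
}

// Usage: int* counter_ptr = get_device_address(counter);
```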