Main #25

Closed
wants to merge 384 commits into from
miscco (Owner) commented Oct 22, 2024

It's a pain to retrieve the address of a device symbol.

This adds a little helper that makes it less painful.

bernhardmgruber and others added 30 commits July 30, 2024 09:54

* Improve binary function objects and replace thrust implementation
* simplify use of ::cuda::std binary_function_objects
* Replace _CCCL_CONSTEXPR_CXX14 with constexpr in all libcudacxx
binary function objects that are imported in thrust.
* Determine partial sum type without ::result_type
* Ignore _LIBCUDACXX_DEPRECATED_IN_CXX11 for doxygen

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
* Add script to update RAPIDS version.

* Update to 24.10.
* fix broken links
* revert repo.toml
* linkchecker fixes
* fix .cuh errors
* lint
* Add a header to interact with driver APIs

* Add a test for the driver API interaction

* Format

* Fix formatting
* Use `common_type` for complex `pow`

Previously we would rely on our internal `__promote` function.

However, that could have surprising results, e.g. `pow(complex<float>, int)` would return `complex<double>`.

With C++23, this situation got clarified and we should use `common_type` to determine the return type.
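The effect of switching to `common_type` can be sketched with standard-library types only. The alias `pow_result_t` below is a hypothetical name for illustration, not the libcu++ implementation; it assumes the result is simply `complex` of the common type of both arguments.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical helper (not part of libcu++): the result type of
// pow(complex<T>, U) when it is derived from common_type<T, U>.
template <class T, class U>
using pow_result_t = std::complex<typename std::common_type<T, U>::type>;

// float and int share the common type float, so no promotion to double.
static_assert(std::is_same<pow_result_t<float, int>, std::complex<float>>::value,
              "complex<float> raised to an int stays complex<float>");

// Mixing float and double still widens to double.
static_assert(std::is_same<pow_result_t<float, double>, std::complex<double>>::value,
              "complex<float> raised to a double widens to complex<double>");
```

With this rule, an integer exponent no longer forces a `float` computation into `double`.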
* Drop `cuda::get_property` CPO

It serves no purpose as it only ever forwards via ADL and also breaks older nvcc

* Ensure that we test memory resources

* Implement `cuda::uninitialized_buffer`

`cuda::uninitialized_buffer` provides an allocation of `N` elements of type `T` utilizing a `cuda::mr::resource` to allocate the storage.

`cuda::uninitialized_buffer` takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
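The ownership rules described above can be illustrated with a host-only sketch. Everything here is an assumption made for illustration: `host_resource` and `uninitialized_buffer_sketch` are hypothetical names, the resource interface is reduced to `allocate`/`deallocate` with a byte count and an alignment, and C++17 aligned `operator new` stands in for device allocation. It is not the actual `cuda::uninitialized_buffer`.

```cpp
#include <cstddef>
#include <new>

// Hypothetical host-only stand-in for a memory resource (requires C++17).
struct host_resource {
  std::size_t live_allocations = 0;  // bookkeeping for the example only

  void* allocate(std::size_t bytes, std::size_t alignment) {
    void* ptr = ::operator new(bytes, std::align_val_t(alignment));
    ++live_allocations;
    return ptr;
  }

  void deallocate(void* ptr, std::size_t /*bytes*/, std::size_t alignment) noexcept {
    ::operator delete(ptr, std::align_val_t(alignment));
    --live_allocations;
  }
};

// Hypothetical sketch: storage for `count` objects of type T that are never
// constructed. The buffer handles alignment and deallocation; it only keeps a
// pointer to the resource, so the resource must outlive the buffer.
template <class T, class Resource>
class uninitialized_buffer_sketch {
  Resource* resource_;
  std::size_t count_;
  T* data_;

public:
  uninitialized_buffer_sketch(Resource& resource, std::size_t count)
      : resource_(&resource)
      , count_(count)
      , data_(static_cast<T*>(resource.allocate(count * sizeof(T), alignof(T)))) {}

  // Non-copyable: exactly one owner releases the storage.
  uninitialized_buffer_sketch(const uninitialized_buffer_sketch&) = delete;
  uninitialized_buffer_sketch& operator=(const uninitialized_buffer_sketch&) = delete;

  ~uninitialized_buffer_sketch() {
    resource_->deallocate(data_, count_ * sizeof(T), alignof(T));
  }

  T* data() const noexcept { return data_; }
  std::size_t size() const noexcept { return count_; }
};
```

Holding the resource by pointer rather than by value is what creates the lifetime requirement: destroying the resource before the buffer would leave the destructor calling into a dead object.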
…ice (#2073)

* Ensure that `cuda_memory_resource` allocates memory on the proper device

* Move `__ensure_current_device` to own header
* Clarify compatibility wrt. template specializations

We do not want users to specialize arbitrary templates in CCCL unless otherwise stated. This PR makes this clear in the README.md.

Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Make `cuda::std::tuple` trivially copyable

This is similar to the situation with `cuda::std::pair`.
We have a lot of users that rely on types being trivially copyable, so that they can utilize memcpy and friends.

Previously, `cuda::std::tuple` did not satisfy this because it needs to handle reference types.

Given that we already specialize `__tuple_leaf` depending on whether the class is empty or not, we can simply
add a third specialization that handles the trivially copyable types and one that synthesizes assignment.

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
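Why a separate specialization helps can be sketched with two hypothetical leaf types. These are illustrations only and not the `__tuple_leaf` specializations from the commit: a leaf whose special members stay implicitly defaulted remains trivially copyable, while a leaf with a user-provided assignment operator does not, even for `int`.

```cpp
#include <type_traits>

// Hypothetical leaf with implicitly defaulted special members: it is
// trivially copyable whenever T is.
template <class T>
struct defaulted_leaf {
  T value;
};

// Hypothetical leaf with a user-provided copy assignment, as one might write
// to support reference-like members. A user-provided assignment operator
// makes the type non-trivially copyable regardless of T.
template <class T>
struct assigning_leaf {
  T value;

  assigning_leaf() = default;
  assigning_leaf(const assigning_leaf&) = default;

  assigning_leaf& operator=(const assigning_leaf& other) {
    value = other.value;
    return *this;
  }
};

static_assert(std::is_trivially_copyable<defaulted_leaf<int>>::value,
              "defaulted special members keep the leaf trivially copyable");
static_assert(!std::is_trivially_copyable<assigning_leaf<int>>::value,
              "a user-provided assignment operator removes trivial copyability");
```

Selecting the defaulted form for trivially copyable element types and the hand-written form only where it is needed is one way to keep `memcpy`-style usage valid for the common case.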
```
In function bool cuda::std::__4::__dispatch_memmove(_Up*, _Tp*, size_t)
...
error: *(unsigned char*)(&privatized_decode_op[0]) may be used uninitialized [-Werror=maybe-uninitialized]
...
*(unsigned char*)(&privatized_decode_op[0]) was declared here
 1528 |       PrivatizedDecodeOpT privatized_decode_op[NUM_ACTIVE_CHANNELS]{};
```
* Fix flakey heterogeneous tests by ensuring only *one* writer exists in parallel between H/D

* Fixup copy paste mistake

* Make host atomics simpler by removing the ugly alignment type

* Fix deadlocks introduced into barrier/semaphore tests

* Revert removing hacky atomic wrapping stuff

* Fix unused warning bug in GCC-6
```
Linking CXX executable bin/cub.cpp14.catch2_test.lid_0
FAILED: bin/cub.cpp14.catch2_test.lid_0
...
/usr/bin/ld: cub/test/CMakeFiles/cub.cpp14.test.warp_scan_api.dir/catch2_test_warp_scan_api.cu.o: in function `InclusiveScanKernel(int*)':
/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:473: multiple definition of `InclusiveScanKernel(int*)'; cub/test/CMakeFiles/cub.cpp14.test.block_scan_api.dir/catch2_test_block_scan_api.cu.o:/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:468: first defined here
collect2: error: ld returned 1 exit status

```
… in the system (#2100)

* add `cuda::devices` vector

The number of CUDA devices can be determined by calling
`cuda::devices.size()`. `cuda::devices` is a range of
`cuda::device` objects.
* Fix trivial_copy_device_to_device execution space

* Typo

* Format

* Extra empty line
In the devcontainers `clang-format` is now installed into `/usr/bin/clang-format`
… it in places where it was missing (#2192)

* Change __scoped_device to use driver API

* Switch to use driver API based dev setter

* Remove constexpr from operator device()

* Fix comments and includes

* Fall back to non-versioned get entry point pre 12.5
We need to use the versioned entry point to get the correct cuStreamGetCtx.
There is a v2 version of it in 12.5; fortunately, the versioned
get entry point is available there too.

* Fix unused local variable

* Fix warnings in ensure_current_device test

* Move ensure current device out of detail

* Add LIBCUDACXX_ENABLE_EXCEPTIONS to tests cmake
ericniebler and others added 29 commits October 16, 2024 18:43
* add `_LIBCUDACXX_REQUIRES_EXPR` to the concepts emulation macros

* work around nvcc pre-12.2 bug and mollify nvrtc

* silence warning about an always true condition

* simplify macro substitution with the help of an alias template

* fix the short-circuiting behavior of the `async_resource` concept pre-c++20

* replace C-style casts with C++-style `static_cast`s

* add missing `_LIBCUDACXX_HIDE_FROM_ABI` function annotations

* restore short-circuiting in the `resource` concept
Fixes #653.

Adds three new workflows that can be manually triggered at various points
in the release process:

- `release-create-new`: Begin the release process for a new version.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
    - `branch_point`: Optional; If the `branch/{major}.{minor}.x` branch
      does not exist, create it from this SHA.
      If the release branch already exists, this is ignored.
      If not provided, the release branch is created from the current
      `main` branch.
  - Actions:
    - Creates release branch if needed.
    - Bumps version numbers with `update-version.sh` in topic branch.
    - Creates pull request to merge the topic branch into `main`
    - Marks the pull request for backporting to the release branch.
- `release-update-rc`: Validate and tag a new release candidate.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
  - Actions:
    - Uses the HEAD SHA of the release branch for testing/tagging.
    - Determines the next rc for this version by inspecting existing tags.
    - Runs the `pull_request` workflow to validate the release candidate.
      This can be modified in the future to run a special rc acceptance
      workflow.
    - Tags the release candidate if the workflow passes.
- `release-finalize`: Tag a final release.
  - Inputs:
    - `new_version`: The new version, e.g. "2.3.1"
  - Actions:
    - Determines the most recent release candidate.
    - Tags the latest release candidate as the final release.

[skip-matrix][skip-rapids][skip-vdc][skip-docs]
- Generate tarballs
- Create draft release
- Get ver info from GITHUB_REF (must be RC tag)

[skip-matrix][skip-rapids][skip-vdc][skip-docs]
* ensure cupy arrays can be used with cuda.parallel too

* copy the same comment to all places wherever applicable

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* ensure all needed imports are shown in the example

* add CuPy (+CUDA 12.x) as a test dependency

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Revert "Tweaks to slack notifs."

This reverts commit 0d475ef.

* Revert "Add slack notifications."

This reverts commit 9551a53.

* Revert "Use annotated tags."

This reverts commit 5cd738c.

* Revert "Log archive sizes."

This reverts commit 3ebd804.

* Revert "Fix typo."

This reverts commit 47fda9a.

* Revert "Add more useful display name to install preset."

This reverts commit cdd367e.

* Revert "Add overview doc for release workflows."

This reverts commit a6af3bf.

* Revert "Document workflow ref requirements/usage."

This reverts commit 448da79.

* Revert "Cleanup version handling in create-new."

This reverts commit 75c2500.

* Revert "Upload source packages, too."

This reverts commit dde8e2e.

* Revert "Verify the repo version in release-finalize."

This reverts commit c05951a.

* Revert "Simply release branch creation."

This reverts commit 4c7eacb.

* Revert "Refactor for consistency."

This reverts commit ed46ca8.

* Revert "Infer version in RC workflow from repo/git state."

This reverts commit ea07539.

* Revert "Add json file with version info that we can parse in CI."

This reverts commit 0beede5.

* Revert "Release workflow updates:"

This reverts commit 2873882.

* Revert "Add `install` preset."

This reverts commit 14d95a4.

* Revert "Add reusable workflow for updating version in branch with a PR"

This reverts commit 70a2872.

* Revert "Add release automation workflows."

This reverts commit 43eb66a.
* Drop `_LIBCUDACXX_THREAD_ABI_VISIBILITY`

It is always defined as `_LIBCUDACXX_HIDE_FROM_ABI`

* Drop `_LIBCUDACXX_NO_THREAD_SAFETY_ANALYSIS`

It is never defined outside of `__FreeBSD__`

* Drop `thread_if`

* Drop `__libcpp_thread_favorite_barrier_index`

* Drop `_LIBCUDACXX_HAS_NO_THREAD_CONTENTION_TABLE`

It is always defined

* Drop `_LIBCUDACXX_HAS_NO_PLATFORM_WAIT`

It is always defined and only used once

* Drop `_LIBCUDACXX_BUILDING_THREAD_LIBRARY_EXTERNAL`

* Move macro definition out of function declaration

* Move threading_support

* Split into the different threading mechanisms

* Disentangle `_LIBCUDACXX_HAS_THREAD_API_EXTERNAL` with other backends

* Fix missing qualifiers and attributes

* Silence an ICC warning about `__libcpp_thread_id_equal`

* Drop more unused functions from pthread

* Move to `__thread` subfolder
The `cccl_generate_install_rules` function requires two arguments. When
`CCCL_TOPLEVEL_PROJECT` doesn't exist, all the existing calls
will have `HEADERS_INCLUDE` or `NO_HEADERS` become the second
argument.
Strip prefix paths from cudax documentation
@miscco miscco closed this Oct 22, 2024