[RELEASE] cudf v25.02 #17893

AyodeAwe · 2025-01-31T21:39:13Z

❄️ Code freeze for `branch-25.02` and v25.02 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.02 until release (merging of this PR).

What is the purpose of this PR?

- Update documentation
- Allow testing for the new release
- Enable a means to merge branch-25.02 into main for the release

Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17506

Replaces usage of `cudf::string_view::find()` with a loop that calls `cudf::string_view::compare()` where possible. This showed a significant performance improvement and was also slightly faster than a KMP prototype implementation. Also updates the find/contains benchmarks to remove the 2GB limit and include column versions of the find APIs. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Basit Ayantunde (https://github.com/lamarrr) - Bradley Dice (https://github.com/bdice) URL: #17330
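A minimal sketch of the compare-in-a-loop search, assuming `std::string_view` as a stand-in for `cudf::string_view` (positions here are byte offsets, whereas cudf reports character positions). The helper name `find_by_compare` is hypothetical; it slides a window over the haystack and calls `compare()` at each candidate position instead of calling `find()`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the position of the first occurrence of
// `target` in `haystack`, or std::string_view::npos if there is none.
std::size_t find_by_compare(std::string_view haystack, std::string_view target)
{
  if (target.size() > haystack.size()) { return std::string_view::npos; }
  std::size_t const last = haystack.size() - target.size();
  for (std::size_t pos = 0; pos <= last; ++pos) {
    // compare(pos, count, other) compares haystack.substr(pos, count) with other
    if (haystack.compare(pos, target.size(), target) == 0) { return pos; }
  }
  return std::string_view::npos;
}
```

As with `std::string_view::find`, an empty target matches at position 0.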

Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17408

…olumn.from_libcudf` (#17517) Part of #15162. In a follow-up PR we'll deprecate the cudf python column APIs and others that are used outside cudf. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17517

Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17430

Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17466

Follow up of #17263, this PR adds the parquet reader options classes to pylibcudf and plumbs the changes through cudf python. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) URL: #17464

…7348) Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - Basit Ayantunde (https://github.com/lamarrr) - David Wendt (https://github.com/davidwendt) URL: #17348

We require a newer cuda-python lower bound for new features and to use the new layout. This will fix a number of errors observed when the runtime version of cuda-python is older than the version used to build packages using Cython features from cuda-python. See rapidsai/build-planning#117 (comment) for details. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #17547

#17540) Replaces the `cub::WarpReduce` usage in `cudf::strings::contains` with cooperative-groups `any()`. The change is only for the `contains_warp_parallel` kernel which is used for wider strings. Using cooperative-groups generates more efficient code for the same results and gives an additional 11-14% performance improvement. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) - Shruti Shivakumar (https://github.com/shrshi) URL: #17540

Removes unused IO utilities from cuDF Python. Depends on #17163 #16042 #17252 #17263 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17374

nvcc does not support `constexpr` functions that are not well-defined to call from the device; this is UB even when the function is never actually called from the device. Throwing an exception is one such operation. This PR cleans up error handling for functions that are called from the device, and removes `constexpr` from those that are not actually used from the device or in a constexpr context. Authors: - Vukasin Milovanovic (https://github.com/vuule) - MithunR (https://github.com/mythrocks) Approvers: - Karthikeyan (https://github.com/karthikeyann) - MithunR (https://github.com/mythrocks) - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) - Mike Wilson (https://github.com/hyperbolic2346) - Yunsong Wang (https://github.com/PointKernel) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17534
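To illustrate the split between device-safe and throwing code, here is a host-only C++ sketch (it has no CUDA annotations, so it shows only the shape of the pattern, not cudf's code): a `constexpr`, non-throwing helper of the kind that would be safe to call from device code, plus a non-`constexpr` host wrapper that reports errors by throwing. The function names and type ids are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>

// Device-friendly helper (sketch): constexpr and noexcept with no throw path.
// In a CUDA build, this is the kind of function that could be __host__ __device__.
constexpr std::size_t element_size_or_zero(int type_id) noexcept
{
  switch (type_id) {
    case 0: return 1;   // hypothetical 8-bit type id
    case 1: return 4;   // hypothetical 32-bit type id
    case 2: return 8;   // hypothetical 64-bit type id
    default: return 0;  // unknown id: signal with 0 instead of throwing
  }
}

// Host-only wrapper: deliberately not constexpr, so it is free to throw.
inline std::size_t element_size(int type_id)
{
  auto const size = element_size_or_zero(type_id);
  if (size == 0) { throw std::invalid_argument("unsupported type id"); }
  return size;
}

// The non-throwing helper remains usable in constant expressions.
static_assert(element_size_or_zero(1) == 4);
```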

Adds more description to the `div_rounding_up_safe` utility identifying undefined behavior. Closes #17539 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Paul Mattione (https://github.com/pmattione-nvidia) - Lawrence Mitchell (https://github.com/wence-) - Nghia Truong (https://github.com/ttnghia) URL: #17542
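For context, a sketch of overflow-safe ceiling division next to the naive formula, assuming unsigned operands in the usage below. The names `div_round_up_naive` and `div_round_up_safe_sketch` are hypothetical, and the preconditions in the comments are general C++ rules rather than text quoted from cudf's documentation.

```cpp
#include <cstdint>
#include <type_traits>

// Naive ceiling division: (dividend + divisor - 1) can overflow when dividend
// is near the type's maximum; for unsigned types the sum wraps silently.
template <typename T>
constexpr T div_round_up_naive(T dividend, T divisor)
{
  return (dividend + divisor - 1) / divisor;
}

// Overflow-safe variant (sketch): quotient, plus one if there is a remainder.
// Preconditions (general C++ rules):
//  - divisor == 0 is undefined behavior (division by zero);
//  - for signed T, dividend == min() with divisor == -1 overflows (UB);
//  - the result is a true ceiling only for non-negative operands.
template <typename T>
constexpr T div_round_up_safe_sketch(T dividend, T divisor)
{
  static_assert(std::is_integral_v<T>, "integral types only");
  return dividend / divisor + (dividend % divisor != 0 ? 1 : 0);
}
```

With `std::uint32_t`, `div_round_up_naive(4294967295u, 2u)` wraps to 0, while the safe variant returns 2147483648.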

This PR adds a `CUDA_ASYNC_FABRIC` allocation mode to `RmmAllocationMode` and passes two options to RMM's `cuda_async_memory_resource`: `fabric` as the handle type and `read_write` as the memory protection mode (the only mode supported by the pools, and the one required for IPC). If `CUDA_ASYNC` is used, fabric handles are not requested and the memory protection is `none`. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: #17526

…17496) Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17496

This is a follow up from #17526, where fabric handles can be enabled from RMM. That PR also sets the memory access protection flag (`cudaMemPoolSetAccess`), but I have learned that this second flag is not needed from the owner device. In fact, it causes confusion because the owning device fails to call this function with some of the flags (access none). `cudaMemPoolSetAccess` is meant to only be called from peer processes that have imported the pool's handle. In our case, UCX handles this from the peer's side and it does not need to be anywhere in RMM or cuDF. Sorry for the noise. I'd like to get this fix in, and then I am going to fix RMM by removing that API. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: #17553

Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17505

Follow up to #17506. This PR removes an unused buffer class. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17549

Authors: - Hirota Akio (https://github.com/a-hirota) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17332

Needed because two of my cudf._lib refactoring PRs went in and affected the formatting of `cudf/_lib/CMakeLists.txt`. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17559

…fset types. (#17527) Follow-up for #17523 to use `target_compile_definitions` and drop the Thrust patch. Authors: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17527

Because of the switch away from certificates/mTLS, we are having to rework a few things. In the meantime, telemetry jobs are failing. This PR adds a switch to turn all of the telemetry stuff off - to skip it instead. It is meant to be controlled by an org-wide environment variable, which can be applied to individual repos by ops. At the time of submitting this PR, the environment variable is 'false' and no telemetry is being reported. Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17551

…17520) Replaces the custom kernels for `cudf::detail::copy_if` with a call to `thrust::copy_if` to build indices to call `cudf::detail::gather`. This is easier to maintain and faster for some cases but slower in others. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #17520
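The index-then-gather approach can be sketched on the host with the standard library, assuming `std::copy_if` over an index sequence stands in for `thrust::copy_if` over a counting iterator, and a plain loop stands in for `cudf::detail::gather`. Both helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Step 1 (hypothetical helper): collect the row indices whose value passes
// the predicate, analogous to thrust::copy_if over a counting sequence.
template <typename T, typename Pred>
std::vector<std::size_t> selected_indices(std::vector<T> const& input, Pred pred)
{
  std::vector<std::size_t> all(input.size());
  std::iota(all.begin(), all.end(), std::size_t{0});  // 0, 1, ..., n-1
  std::vector<std::size_t> out;
  std::copy_if(all.begin(), all.end(), std::back_inserter(out),
               [&](std::size_t i) { return pred(input[i]); });
  return out;
}

// Step 2 (hypothetical helper): gather the selected rows, using the index
// vector as the gather map.
template <typename T>
std::vector<T> gather_rows(std::vector<T> const& input,
                           std::vector<std::size_t> const& gather_map)
{
  std::vector<T> out;
  out.reserve(gather_map.size());
  for (auto const i : gather_map) { out.push_back(input[i]); }
  return out;
}
```

For the input `{5, -1, 3, -7, 0}` and the predicate `x >= 0`, the indices are `{0, 2, 4}` and the gathered result is `{5, 3, 0}`.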

This PR replaces cudf's logger implementation with one generated using https://github.com/rapidsai/rapids-logger. This approach allows us to centralize the logger definition across different RAPIDS projects while allowing each project to vendor its own copy with a suitable set of macros and default logger objects. The common logger also takes care of handling the more complex packaging problems around ensuring that we fully isolate our spdlog dependency and do not leak any of its symbols, allowing our libraries to be safely installed in a much broader set of environments. Contributes to rapidsai/build-planning#104. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - James Lamb (https://github.com/jameslamb) - Bradley Dice (https://github.com/bdice) URL: #17307

…distinct (#17546) This PR addresses several minor issues discovered while working on #17467: - Corrected a typo where `RowHasher` should have been `RowEqual` - Renamed `hash_set_type` to `distinct_set_t` - Added a `null_probability` benchmark axis for the distinct benchmark, similar to other stream compaction benchmarks Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17546

Update version references in breaking-change trigger workflow

Closes #17502 **Background Info**: The cudf and pandas `axis` defaults are different, and the upstream dask-expr `clip` APIs are consistent with the behavior of Pandas (not cudf). Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) - Matthew Murray (https://github.com/Matt711) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17509

Renames `minhash_permuted()` to `minhash()` and deprecates `minhash_permuted`. Also removes the `word_minhash` APIs deprecated in 24.12. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17421

## Description

#17307 broke builds that use the rapids-cmake pinned dependencies feature because no version was specified for the rapids_logger dependency. This adds a version string equal to the git tag so the dependency has a stated version.

## Checklist

- [X] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [ ] New or existing tests cover these changes.
- [X] The documentation is up to date with these changes.

---------

Co-authored-by: Nghia Truong
Co-authored-by: Vyas Ramasubramani
Co-authored-by: Bradley Dice

copy-pr-bot · 2025-01-31T21:39:16Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

## Description

This PR fixes cudf CI nightly test failures: https://github.com/rapidsai/cudf/actions/runs/13097249137/job/36541039646

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

The `data` attribute of numpy arrays should be marked private, as it points to the underlying memory, which will be distinct for a cupy array. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17890

Fixes #17836 Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17837

Fixes incorrect pylibcudf/libcudf example created in #17803. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17912

Currently pylibcudf does not export a dependency on libcudf at all, which is incorrect. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #17915

…7945) ibis removed its pandas backend in version 10.0.0. From their release notes:

> pandas: The pandas backend is removed. Note that pandas DataFrames are STILL VALID INPUTS AND OUTPUTS and will remain so for the foreseeable future. Please use one of the other local backends like DuckDB, Polars, or DataFusion to perform operations directly on pandas DataFrames.

This PR removes the pandas backend from the integration tests and asserts that the inputs and outputs of ibis APIs are proxy objects.

Follows up #17972. This PR is intended to get 25.02 nightly CI passing, which has been failing for a few days.

Closes #17949. Closes #17960. Derived span classes use `size_type` for the index type in their `operator[]` implementations. The intent was to use `base::size_type`, but the name actually resolves to `cudf::size_type`, which is `int32_t` and does not allow access past `std::numeric_limits<int32_t>::max()`. This PR fixes the type by explicitly using `typename base::size_type`. It also adds `static_assert`s to make sure the type has the right size for element indexing.
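The lookup issue can be reproduced in standard C++: inside a class template, unqualified name lookup does not search a dependent base class, so `size_type` finds the namespace-scope alias instead of the base's member. All names below (`demo`, `base_span`, `buggy_span`, `fixed_span`, `index_param`) are hypothetical stand-ins, not cudf's span classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace demo {
// Hypothetical stand-in for cudf::size_type: a 32-bit signed index.
using size_type = std::int32_t;

template <typename T>
struct base_span {
  using size_type = std::size_t;  // the wide index type derived classes intend to use
  T* data_        = nullptr;
  size_type size_ = 0;
};

// Buggy: base_span<T> is a dependent base, so unqualified `size_type` is not
// looked up there and resolves to demo::size_type (int32_t).
template <typename T>
struct buggy_span : base_span<T> {
  T& operator[](size_type i) const { return this->data_[i]; }
};

// Fixed: name the base's member type explicitly and check its width.
template <typename T>
struct fixed_span : base_span<T> {
  using base = base_span<T>;
  static_assert(sizeof(typename base::size_type) == sizeof(std::size_t),
                "index type must be wide enough for element indexing");
  T& operator[](typename base::size_type i) const { return this->data_[i]; }
};

// Helper: extract the parameter type of a one-argument const member function.
template <typename F>
struct index_param;
template <typename C, typename R, typename A>
struct index_param<R (C::*)(A) const> {
  using type = A;
};
}  // namespace demo
```

Inspecting `index_param<decltype(&demo::buggy_span<int>::operator[])>::type` yields `std::int32_t`, while the same check on `fixed_span<int>` yields `std::size_t`.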

Replaces #17976. This fixes the race check failures in shared memory groupby and resolves NVIDIA/spark-rapids#11835.

Fix forward merge 24.12->25.02

Matt711 and others added 30 commits December 6, 2024 12:38

Plumb pylibcudf.io.parquet options classes through cudf python (#17506)

169a45a

Remove cudf._lib.text in favor of inlining pylibcudf (#17408)

c791f80

Remove cudf._lib.round in favor of inlining pylibcudf (#17430)

1a62b46

Remove cudf._lib.orc in favor of inlining pylibcudf (#17466)

b6f7e6e

Remove unused IO utilities from cudf python (#17374)

0f5d4b9

Remove cudf._lib.string.convert/split in favor of inlining pylibcudf (#…

f595592

…17496) Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17496

Remove cudf._lib.transform in favor of inlining pylibcudf (#17505)

9df95d1

Remove unused BufferArrayFromVector (#17549)

ebad043

Enable rounding for Decimal32 and Decimal64 in cuDF (#17332)

4764395

Merge branch-24.12 into branch-25.02

f904a7f

Update version references in workflow (#17568)

be62ea6

AyodeAwe requested a review from msarahan January 31, 2025 21:39

AyodeAwe requested review from mroeschke, galipremsagar, kingcrimsontianyu and davidwendt January 31, 2025 21:39

github-actions bot assigned AyodeAwe Jan 31, 2025

github-actions bot assigned raydouglass Feb 3, 2025

galipremsagar and others added 5 commits February 3, 2025 21:36

Require batches to be non-empty in multi-batch JSON reader (#17837)

8b89ea0

Fix incorrect example in pylibcudf docs (#17912)

2bada0d

ttnghia approved these changes Feb 10, 2025

Matt711 and others added 6 commits February 10, 2025 16:49

Pin ibis version in the cudf.pandas integration tests <10.0.0 (#17975)

d1a5558

Fix race check failures in shared memory groupby (#17985)

4b2ce98

Update Changelog [skip ci]

c86ff6e

Merge branch-24.12 into branch-25.02 [skip ci]

a4e1df8

Merge pull request #18002 from raydouglass/branch-25.02-merge-24.12

12176f3

AyodeAwe merged commit 1dff899 into main Feb 13, 2025
29 of 31 checks passed
