[CUDAX] Introduce pinned memory pool and move pinned memory resource to use it on new CUDA versions #3975

pciolkosz · 2025-03-01T01:25:52Z

Draft, todo description

copy-pr-bot · 2025-03-01T01:25:55Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

pciolkosz · 2025-03-01T01:26:47Z

/ok to test

github-actions · 2025-03-01T01:53:14Z

🟥 CI finished in 25m 26s: Pass: 0%/22 | Total: 55m 39s | Avg: 2m 31s | Max: 8m 15s

🟥 cudax: Pass: 0%/22 | Total: 55m 39s | Avg: 2m 31s | Max: 8m 15s

🟥 cudacxx_family
  🟥 nvcc               Pass:   0%/22  | Total: 55m 39s | Avg:  2m 31s | Max:  8m 15s
🟥 cpu
  🟥 amd64              Pass:   0%/18  | Total: 48m 43s | Avg:  2m 42s | Max:  8m 15s
  🟥 arm64              Pass:   0%/4   | Total:  6m 56s | Avg:  1m 44s | Max:  1m 52s
🟥 ctk
  🟥 12.0               Pass:   0%/1   | Total:  8m 15s | Avg:  8m 15s | Max:  8m 15s
  🟥 12.5               Pass:   0%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 12s
  🟥 12.8               Pass:   0%/19  | Total: 39m 17s | Avg:  2m 04s | Max:  7m 54s
🟥 cudacxx
  🟥 nvcc12.0           Pass:   0%/1   | Total:  8m 15s | Avg:  8m 15s | Max:  8m 15s
  🟥 nvcc12.5           Pass:   0%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 12s
  🟥 nvcc12.8           Pass:   0%/19  | Total: 39m 17s | Avg:  2m 04s | Max:  7m 54s
🟥 cxx
  🟥 Clang14            Pass:   0%/1   | Total:  2m 17s | Avg:  2m 17s | Max:  2m 17s
  🟥 Clang15            Pass:   0%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
  🟥 Clang16            Pass:   0%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s
  🟥 Clang17            Pass:   0%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s
  🟥 Clang18            Pass:   0%/4   | Total:  5m 56s | Avg:  1m 29s | Max:  2m 19s
  🟥 GCC10              Pass:   0%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
  🟥 GCC11              Pass:   0%/1   | Total:  1m 53s | Avg:  1m 53s | Max:  1m 53s
  🟥 GCC12              Pass:   0%/2   | Total:  2m 13s | Avg:  1m 06s | Max:  2m 13s
  🟥 GCC13              Pass:   0%/6   | Total: 10m 02s | Avg:  1m 40s | Max:  2m 18s
  🟥 MSVC14.39          Pass:   0%/1   | Total:  8m 15s | Avg:  8m 15s | Max:  8m 15s
  🟥 MSVC14.42          Pass:   0%/1   | Total:  7m 54s | Avg:  7m 54s | Max:  7m 54s
  🟥 NVHPC24.7          Pass:   0%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 12s
🟥 cxx_family
  🟥 Clang              Pass:   0%/8   | Total: 15m 05s | Avg:  1m 53s | Max:  2m 19s
  🟥 GCC                Pass:   0%/10  | Total: 16m 18s | Avg:  1m 37s | Max:  2m 18s
  🟥 MSVC               Pass:   0%/2   | Total: 16m 09s | Avg:  8m 04s | Max:  8m 15s
  🟥 NVHPC              Pass:   0%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 12s
🟥 gpu
  🟥 h100               Pass:   0%/2   | Total:  2m 15s | Avg:  1m 07s | Max:  2m 15s
  🟥 rtx2080            Pass:   0%/20  | Total: 53m 24s | Avg:  2m 40s | Max:  8m 15s
🟥 jobs
  🟥 Build              Pass:   0%/19  | Total: 55m 39s | Avg:  2m 55s | Max:  8m 15s
  🟥 Test               Pass:   0%/3  
🟥 sm
  🟥 90                 Pass:   0%/3   | Total:  4m 33s | Avg:  1m 31s | Max:  2m 18s
  🟥 90a                Pass:   0%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
🟥 std
  🟥 17                 Pass:   0%/4   | Total:  9m 55s | Avg:  2m 28s | Max:  4m 12s
  🟥 20                 Pass:   0%/18  | Total: 45m 44s | Avg:  2m 32s | Max:  8m 15s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 22)

#	Runner
13	`linux-amd64-cpu16`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`
2	`linux-amd64-gpu-rtx2080-latest-1`
1	`linux-amd64-gpu-h100-latest-1`

pciolkosz · 2025-03-01T21:43:16Z

/ok to test

github-actions · 2025-03-01T22:09:34Z

🟨 CI finished in 22m 53s: Pass: 77%/22 | Total: 2h 24m | Avg: 6m 35s | Max: 14m 26s | Hits: 86%/9499

🟨 cudax: Pass: 77%/22 | Total: 2h 24m | Avg: 6m 35s | Max: 14m 26s | Hits: 86%/9499

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  72%/18  | Total:  2h 07m | Avg:  7m 05s | Max: 14m 26s | Hits:  86%/7167  
  🟩 arm64              Pass: 100%/4   | Total: 17m 12s | Avg:  4m 18s | Max:  4m 34s | Hits:  87%/2332  
🔍 sm: 90 🔍
  🔍 90                 Pass:  66%/3   | Total: 23m 03s | Avg:  7m 41s | Max: 14m 26s | Hits:  86%/1166  
  🟩 90a                Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s | Hits:  87%/583   
🔍 std: 20 🔍
  🟩 17                 Pass: 100%/4   | Total: 19m 43s | Avg:  4m 55s | Max:  7m 01s | Hits:  84%/2124  
  🔍 20                 Pass:  72%/18  | Total:  2h 05m | Avg:  6m 57s | Max: 14m 26s | Hits:  86%/7375  
🟨 ctk
  🟥 12.0               Pass:   0%/1   | Total: 12m 12s | Avg: 12m 12s | Max: 12m 12s
  🟩 12.5               Pass: 100%/2   | Total: 13m 57s | Avg:  6m 58s | Max:  7m 01s | Hits:  76%/750   
  🟨 12.8               Pass:  78%/19  | Total:  1h 58m | Avg:  6m 15s | Max: 14m 26s | Hits:  87%/8749  
🟨 cudacxx
  🟥 nvcc12.0           Pass:   0%/1   | Total: 12m 12s | Avg: 12m 12s | Max: 12m 12s
  🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 57s | Avg:  6m 58s | Max:  7m 01s | Hits:  76%/750   
  🟨 nvcc12.8           Pass:  78%/19  | Total:  1h 58m | Avg:  6m 15s | Max: 14m 26s | Hits:  87%/8749  
🟨 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  4m 21s | Avg:  4m 21s | Max:  4m 21s | Hits:  87%/585   
  🟩 Clang15            Pass: 100%/1   | Total:  5m 01s | Avg:  5m 01s | Max:  5m 01s | Hits:  87%/583   
  🟩 Clang16            Pass: 100%/1   | Total:  4m 48s | Avg:  4m 48s | Max:  4m 48s | Hits:  87%/583   
  🟩 Clang17            Pass: 100%/1   | Total:  4m 40s | Avg:  4m 40s | Max:  4m 40s | Hits:  87%/583   
  🟨 Clang18            Pass:  75%/4   | Total: 25m 22s | Avg:  6m 20s | Max: 12m 11s | Hits:  87%/1749  
  🟩 GCC10              Pass: 100%/1   | Total:  4m 39s | Avg:  4m 39s | Max:  4m 39s | Hits:  87%/585   
  🟩 GCC11              Pass: 100%/1   | Total:  4m 52s | Avg:  4m 52s | Max:  4m 52s | Hits:  87%/583   
  🟨 GCC12              Pass:  50%/2   | Total: 17m 16s | Avg:  8m 38s | Max: 12m 29s | Hits:  87%/583   
  🟨 GCC13              Pass:  83%/6   | Total: 35m 53s | Avg:  5m 58s | Max: 14m 26s | Hits:  86%/2915  
  🟥 MSVC14.39          Pass:   0%/1   | Total: 12m 12s | Avg: 12m 12s | Max: 12m 12s
  🟥 MSVC14.42          Pass:   0%/1   | Total: 11m 55s | Avg: 11m 55s | Max: 11m 55s
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 57s | Avg:  6m 58s | Max:  7m 01s | Hits:  76%/750   
🟨 cxx_family
  🟨 Clang              Pass:  87%/8   | Total: 44m 12s | Avg:  5m 31s | Max: 12m 11s | Hits:  87%/4083  
  🟨 GCC                Pass:  80%/10  | Total:  1h 02m | Avg:  6m 16s | Max: 14m 26s | Hits:  87%/4666  
  🟥 MSVC               Pass:   0%/2   | Total: 24m 07s | Avg: 12m 03s | Max: 12m 12s
  🟩 NVHPC              Pass: 100%/2   | Total: 13m 57s | Avg:  6m 58s | Max:  7m 01s | Hits:  76%/750   
🟨 cudacxx_family
  🟨 nvcc               Pass:  77%/22  | Total:  2h 24m | Avg:  6m 35s | Max: 14m 26s | Hits:  86%/9499  
🟨 gpu
  🟨 h100               Pass:  50%/2   | Total: 18m 40s | Avg:  9m 20s | Max: 14m 26s | Hits:  86%/583   
  🟨 rtx2080            Pass:  80%/20  | Total:  2h 06m | Avg:  6m 18s | Max: 12m 29s | Hits:  86%/8916  
🟨 jobs
  🟨 Build              Pass:  89%/19  | Total:  1h 45m | Avg:  5m 34s | Max: 12m 12s | Hits:  86%/9499  
  🟥 Test               Pass:   0%/3   | Total: 39m 06s | Avg: 13m 02s | Max: 14m 26s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 22)

#	Runner
13	`linux-amd64-cpu16`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`
2	`linux-amd64-gpu-rtx2080-latest-1`
1	`linux-amd64-gpu-h100-latest-1`

miscco

Cursory glance

miscco · 2025-03-02T09:12:28Z

cudax/examples/simple_p2p.cu

@@ -219,9 +219,9 @@ try

  printf("Enabling peer access between GPU%d and GPU%d...\n", peers[0].get(), peers[1].get());
  cudax::device_memory_resource dev0_resource(peers[0]);
-  dev0_resource.enable_peer_access_from(peers[1]);
+  dev0_resource.enable_access_from(peers[1]);


Can we move that rename into a separate PR?

miscco · 2025-03-02T09:12:53Z

cudax/include/cuda/experimental/__device/all_devices.cuh

+inline all_devices::operator ::std::vector<device_ref>() const
+{
+  return ::std::vector<device_ref>(begin(), end());
+}
+


Is there any benefit to not defining these functions inline?
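
For context on the question above, here is a minimal sketch of the two placements being compared. The types `device_ref_sketch` and `all_devices_sketch` are hypothetical, simplified stand-ins and not the cudax definitions. A member defined in the class body is implicitly inline, while a member defined after the class in a header needs the explicit `inline` keyword. One common reason for the second form is that the body needs a type that only becomes complete later in the header; whether that applies here is not stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct device_ref_sketch
{
  int id;
};

class all_devices_sketch
{
  std::vector<device_ref_sketch> devices_{device_ref_sketch{0}, device_ref_sketch{1}};

public:
  auto begin() const { return devices_.begin(); }
  auto end() const { return devices_.end(); }

  // Variant A: defined in the class body, so it is implicitly inline.
  operator std::vector<device_ref_sketch>() const
  {
    return std::vector<device_ref_sketch>(begin(), end());
  }

  // Variant B: only declared here, defined after the class.
  std::size_t size() const;
};

// An out-of-class definition in a header needs an explicit `inline`
// so that including the header in several translation units is well-formed.
inline std::size_t all_devices_sketch::size() const
{
  return devices_.size();
}

// Small usage example exercising both variants.
inline void usage_example()
{
  all_devices_sketch devs;
  std::vector<device_ref_sketch> copy = devs; // uses the conversion operator
  assert(copy.size() == devs.size());
}
```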

miscco · 2025-03-02T09:14:17Z

cudax/include/cuda/experimental/__memory_resource/memory_pool_base.cuh

+//! @param __device_id The id of the device for which to query support.
+//! @throws cuda_error if \c cudaDeviceGetAttribute failed.
+//! @returns true if \c cudaDevAttrMemoryPoolsSupported is not zero.
+inline void __device_supports_stream_ordered_allocations(const int __device_id)


I believe I came up with that name, but it's bad, because this does not return anything.

This should rather be prefixed with something like `__check` or `__verify`.
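
A rough sketch of the naming distinction suggested here, under stated assumptions: `query_memory_pools_supported` is a hypothetical stub standing in for the `cudaDeviceGetAttribute` query, and the function names are illustrative rather than the ones the PR ends up using. A predicate-style name returns a `bool`, while a function that returns nothing and throws on failure carries a `verify` prefix.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stub replacing the real attribute query, for illustration only.
// It pretends that only device 0 supports memory pools.
inline int query_memory_pools_supported(int device_id)
{
  return device_id == 0 ? 1 : 0;
}

// Predicate: the name asks a question, so the function returns the answer.
inline bool device_supports_stream_ordered_allocations(int device_id)
{
  return query_memory_pools_supported(device_id) != 0;
}

// Check: returns nothing and throws on failure, so the name says "verify".
inline void verify_device_supports_stream_ordered_allocations(int device_id)
{
  if (!device_supports_stream_ordered_allocations(device_id))
  {
    throw std::runtime_error("device does not support stream-ordered allocations");
  }
}

// Small usage example: the check passes silently for device 0 and throws for device 1.
inline void usage_example()
{
  verify_device_supports_stream_ordered_allocations(0);
  bool threw = false;
  try
  {
    verify_device_supports_stream_ordered_allocations(1);
  }
  catch (const std::runtime_error&)
  {
    threw = true;
  }
  assert(threw);
}
```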

miscco · 2025-03-02T09:14:51Z

cudax/include/cuda/experimental/__memory_resource/memory_pool_base.cuh

+      // Construct on NUMA node 0 only for now
+      __pool_properties.location.type = ::cudaMemLocationTypeHostNuma;
+      __pool_properties.location.id   = __id;
+#else


Please add comments to the conditional compilation directives.
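
One possible way to annotate the branches, shown as a sketch only. `SKETCH_HAS_HOST_NUMA_POOLS` is a hypothetical feature macro invented for this example, not the condition used in the PR, and the `^^^ ... ^^^ / vvv ... vvv` comment style is just one convention for marking which condition each branch belongs to.

```cpp
#include <cassert>

// Hypothetical feature macro, defined here only so the sketch is self-contained.
#ifndef SKETCH_HAS_HOST_NUMA_POOLS
#  define SKETCH_HAS_HOST_NUMA_POOLS 1
#endif // !SKETCH_HAS_HOST_NUMA_POOLS

enum class location_kind
{
  host_numa,
  host
};

inline location_kind pick_location_kind()
{
#if SKETCH_HAS_HOST_NUMA_POOLS
  // Branch taken when NUMA-aware host pools are assumed to be available.
  return location_kind::host_numa;
#else // ^^^ SKETCH_HAS_HOST_NUMA_POOLS ^^^ / vvv !SKETCH_HAS_HOST_NUMA_POOLS vvv
  // Fallback branch for toolkits without that feature.
  return location_kind::host;
#endif // !SKETCH_HAS_HOST_NUMA_POOLS
}

// Small usage example: with the macro defined to 1 the NUMA branch is selected.
inline void usage_example()
{
  assert(pick_location_kind() == location_kind::host_numa);
}
```

With the trailing comments in place, a reader landing on `#else` or `#endif` can tell which condition is active without scrolling back to the opening `#if`.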

pciolkosz added 6 commits February 28, 2025 16:37

WIP

6b0f664

WIP

3fcc9bb

Pinned memory pool and resource using it

87bb634

Add pinned memory pool tests and simplify resource comparison

c917b60

Fix format and includes

4fb8385

Rename mempool tests

1d9495a

Fix tests cmake

08deafa

miscco reviewed Mar 2, 2025
