[STF] Ensure algorithms with nested contexts use allocator adapters #3548

caugonnet · 2025-01-27T22:33:32Z

Description

Creating memory nodes in a CUDA graph is very expensive, and caching executable graphs that contain memory nodes leaks memory. We therefore do our best to let the parent context, which is based on CUDA streams, handle the allocations made in the graph_ctx internal to an "algorithm".

This PR ensures that we do not create CUDA graph memory nodes, and instead use the allocator of the parent context for the "uncached allocations".
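
The delegation described above can be illustrated with a minimal, host-only C++ sketch. It is not the CUDASTF implementation: `parent_allocator`, `allocator_adapter`, and all of their members are hypothetical names, and `std::malloc` merely stands in for an allocation made by the parent, stream-based context. The sketch assumes that the adapter forwards each allocation of the nested context to the parent, records it, and releases everything in `clear()`, so that no memory has to be owned by the nested graph itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical stand-in for the allocator of the parent (stream-based) context.
struct parent_allocator {
    std::size_t live = 0; // number of allocations not yet released

    void* allocate(std::size_t bytes) {
        ++live;
        return std::malloc(bytes);
    }

    void deallocate(void* p) {
        --live;
        std::free(p);
    }
};

// Hypothetical adapter handed to the nested context. Allocations are
// forwarded to the parent; releases are deferred until clear().
class allocator_adapter {
public:
    explicit allocator_adapter(parent_allocator& parent) : parent_(&parent) {}

    allocator_adapter(const allocator_adapter&) = delete;
    allocator_adapter& operator=(const allocator_adapter&) = delete;

    // Movable: the moved-from adapter is left with nothing to release.
    allocator_adapter(allocator_adapter&& other) noexcept
        : parent_(other.parent_), pending_(std::move(other.pending_)) {
        other.pending_.clear();
    }

    // Debug-build check that clear() was called before destruction.
    ~allocator_adapter() { assert(pending_.empty() && "clear() was not called"); }

    void* allocate(std::size_t bytes) {
        void* p = parent_->allocate(bytes);
        pending_.push_back(p);
        return p;
    }

    // Intentionally a no-op: the buffer is released later, in clear().
    void deallocate(void*) {}

    // Return every recorded buffer to the parent allocator.
    void clear() {
        for (void* p : pending_) {
            parent_->deallocate(p);
        }
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    parent_allocator* parent_;
    std::vector<void*> pending_;
};
```

In this sketch the nested context would call `allocate` while its graph is being built, and the owner of the adapter would call `clear()` once the graph has finished executing. Deferring the release to one explicit call is a design choice of the sketch that keeps the lifetime of the buffers independent of the nested graph.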

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-01-27T22:33:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

caugonnet · 2025-01-28T12:49:26Z

/ok to test

github-actions · 2025-01-28T13:48:54Z

🟩 CI finished in 57m 24s: Pass: 100%/20 | Total: 3h 09m | Avg: 9m 27s | Max: 17m 16s | Hits: 388%/522

🟩 cudax: Pass: 100%/20 | Total: 3h 09m | Avg: 9m 27s | Max: 17m 16s | Hits: 388%/522

🟩 cpu
  🟩 amd64              Pass: 100%/16  | Total:  2h 37m | Avg:  9m 49s | Max: 17m 16s | Hits: 388%/522   
  🟩 arm64              Pass: 100%/4   | Total: 31m 45s | Avg:  7m 56s | Max:  8m 22s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total:  9m 41s | Avg:  9m 41s | Max:  9m 41s | Hits: 388%/261   
  🟩 12.5               Pass: 100%/2   | Total: 11m 30s | Avg:  5m 45s | Max:  5m 49s
  🟩 12.6               Pass: 100%/17  | Total:  2h 47m | Avg:  9m 52s | Max: 17m 16s | Hits: 388%/261   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 41s | Avg:  9m 41s | Max:  9m 41s | Hits: 388%/261   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 30s | Avg:  5m 45s | Max:  5m 49s
  🟩 nvcc12.6           Pass: 100%/17  | Total:  2h 47m | Avg:  9m 52s | Max: 17m 16s | Hits: 388%/261   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/20  | Total:  3h 09m | Avg:  9m 27s | Max: 17m 16s | Hits: 388%/522   
🟩 cxx
  🟩 Clang14            Pass: 100%/1   | Total:  8m 30s | Avg:  8m 30s | Max:  8m 30s
  🟩 Clang15            Pass: 100%/1   | Total:  9m 27s | Avg:  9m 27s | Max:  9m 27s
  🟩 Clang16            Pass: 100%/1   | Total:  9m 20s | Avg:  9m 20s | Max:  9m 20s
  🟩 Clang17            Pass: 100%/1   | Total:  9m 16s | Avg:  9m 16s | Max:  9m 16s
  🟩 Clang18            Pass: 100%/4   | Total: 41m 47s | Avg: 10m 26s | Max: 16m 37s
  🟩 GCC10              Pass: 100%/1   | Total:  8m 59s | Avg:  8m 59s | Max:  8m 59s
  🟩 GCC11              Pass: 100%/1   | Total:  9m 44s | Avg:  9m 44s | Max:  9m 44s
  🟩 GCC12              Pass: 100%/2   | Total: 28m 08s | Avg: 14m 04s | Max: 17m 16s
  🟩 GCC13              Pass: 100%/4   | Total: 30m 07s | Avg:  7m 31s | Max:  8m 22s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 41s | Avg:  9m 41s | Max:  9m 41s | Hits: 388%/261   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 32s | Avg: 12m 32s | Max: 12m 32s | Hits: 388%/261   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 30s | Avg:  5m 45s | Max:  5m 49s
🟩 cxx_family
  🟩 Clang              Pass: 100%/8   | Total:  1h 18m | Avg:  9m 47s | Max: 16m 37s
  🟩 GCC                Pass: 100%/8   | Total:  1h 16m | Avg:  9m 37s | Max: 17m 16s
  🟩 MSVC               Pass: 100%/2   | Total: 22m 13s | Avg: 11m 06s | Max: 12m 32s | Hits: 388%/522   
  🟩 NVHPC              Pass: 100%/2   | Total: 11m 30s | Avg:  5m 45s | Max:  5m 49s
🟩 gpu
  🟩 v100               Pass: 100%/20  | Total:  3h 09m | Avg:  9m 27s | Max: 17m 16s | Hits: 388%/522   
🟩 jobs
  🟩 Build              Pass: 100%/18  | Total:  2h 35m | Avg:  8m 37s | Max: 12m 32s | Hits: 388%/522   
  🟩 Test               Pass: 100%/2   | Total: 33m 53s | Avg: 16m 56s | Max: 17m 16s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  6m 31s | Avg:  6m 31s | Max:  6m 31s
  🟩 90a                Pass: 100%/1   | Total:  7m 19s | Avg:  7m 19s | Max:  7m 19s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 27m 35s | Avg:  6m 53s | Max:  7m 55s
  🟩 20                 Pass: 100%/16  | Total:  2h 41m | Avg: 10m 05s | Max: 17m 16s | Hits: 388%/522

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 20)

#	Runner
12	`linux-amd64-cpu16`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`
2	`linux-amd64-gpu-v100-latest-1`

caugonnet and others added 2 commits January 26, 2025 13:38

Use a stream_adapter for the allocator of the inner graph in an algorithm

8e5a436

Merge branch 'NVIDIA:main' into stf_algorithm_alloc_adapter

0656dde

caugonnet self-assigned this Jan 27, 2025

caugonnet added the stf Sequential Task Flow programming model label Jan 27, 2025

miscco approved these changes Jan 28, 2025

View reviewed changes

caugonnet and others added 2 commits January 28, 2025 10:08

Merge branch 'main' into stf_algorithm_alloc_adapter

6675af2

Entirely rework the stream_adapter implementation so that it is movable and to check clear() was called, factorize code to setup allocators in algorithms

b1b9c62

Add a new tests and some comments

ed1f244
