[STF] Implement kernel chains in the graph backend without child graphs #3707

Open
wants to merge 7 commits into
base: main

caugonnet
Contributor

@caugonnet caugonnet commented Feb 6, 2025

The implementation of kernel chains in the graph backend currently relies on child graphs, which is not the most efficient approach.

With child graphs, we need to create the child graph, fill it, add it to the actual graph (which copies its nodes), and finally destroy the child graph. Instead, we now keep a vector of nodes and simply add dependencies on the first and last nodes.
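
To illustrate the idea, here is a minimal sketch in plain C++. It is a toy model, not the CUDA graph API and not the code in this PR: `toy_graph`, `add_kernel_chain`, and `connect_chain` are hypothetical names chosen for the example. The chain's nodes are added straight into the target graph and collected in a vector, and the surrounding dependencies attach only to the first and last entries.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a task graph: nodes are indices, edges are (from, to) pairs.
// Hypothetical type for illustration only; it does not wrap cudaGraph_t.
struct toy_graph {
    std::size_t node_count = 0;
    std::vector<std::pair<std::size_t, std::size_t>> edges;

    std::size_t add_node() { return node_count++; }
    void add_edge(std::size_t from, std::size_t to) { edges.emplace_back(from, to); }
};

// Add `n` kernel nodes directly into `g`, each depending on the previous one.
// No temporary child graph is created, copied, or destroyed.
std::vector<std::size_t> add_kernel_chain(toy_graph& g, std::size_t n) {
    assert(n > 0);
    std::vector<std::size_t> chain;
    chain.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t node = g.add_node();
        if (!chain.empty()) {
            g.add_edge(chain.back(), node); // serialize the kernels in the chain
        }
        chain.push_back(node);
    }
    return chain;
}

// Hook the chain into the rest of the graph: predecessors feed the first
// node, and the last node feeds the successors.
void connect_chain(toy_graph& g,
                   const std::vector<std::size_t>& chain,
                   const std::vector<std::size_t>& preds,
                   const std::vector<std::size_t>& succs) {
    assert(!chain.empty());
    for (std::size_t p : preds) {
        g.add_edge(p, chain.front());
    }
    for (std::size_t s : succs) {
        g.add_edge(chain.back(), s);
    }
}
```

In this model, a chain of three kernels placed between one predecessor and one successor adds three nodes and four edges to the graph, with nothing left over to destroy afterwards.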

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Feb 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@caugonnet caugonnet self-assigned this Feb 6, 2025
@caugonnet caugonnet added the stf Sequential Task Flow programming model label Feb 6, 2025
@caugonnet
Contributor Author

/ok to test

@caugonnet caugonnet marked this pull request as ready for review February 6, 2025 11:48
@caugonnet caugonnet requested a review from a team as a code owner February 6, 2025 11:48
@caugonnet caugonnet requested a review from pciolkosz February 6, 2025 11:48
Contributor

github-actions bot commented Feb 6, 2025

🟩 CI finished in 38m 43s: Pass: 100%/20 | Total: 4h 01m | Avg: 12m 04s | Max: 15m 21s | Hits: 68%/10080
  • 🟩 cudax: Pass: 100%/20 | Total: 4h 01m | Avg: 12m 04s | Max: 15m 21s | Hits: 68%/10080

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  3h 13m | Avg: 12m 06s | Max: 15m 21s | Hits:  69%/7868  
      🟩 arm64              Pass: 100%/4   | Total: 47m 50s | Avg: 11m 57s | Max: 13m 08s | Hits:  62%/2212  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 17s | Avg:  9m 17s | Max:  9m 17s | Hits:  60%/261   
      🟩 12.5               Pass: 100%/2   | Total: 14m 31s | Avg:  7m 15s | Max:  7m 28s | Hits:  87%/706   
      🟩 12.8               Pass: 100%/17  | Total:  3h 37m | Avg: 12m 48s | Max: 15m 21s | Hits:  66%/9113  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 17s | Avg:  9m 17s | Max:  9m 17s | Hits:  60%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 14m 31s | Avg:  7m 15s | Max:  7m 28s | Hits:  87%/706   
      🟩 nvcc12.8           Pass: 100%/17  | Total:  3h 37m | Avg: 12m 48s | Max: 15m 21s | Hits:  66%/9113  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  4h 01m | Avg: 12m 04s | Max: 15m 21s | Hits:  68%/10080 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 13m 54s | Avg: 13m 54s | Max: 13m 54s | Hits:  62%/555   
      🟩 Clang15            Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s | Hits:  62%/553   
      🟩 Clang16            Pass: 100%/1   | Total: 15m 05s | Avg: 15m 05s | Max: 15m 05s | Hits:  62%/553   
      🟩 Clang17            Pass: 100%/1   | Total: 14m 07s | Avg: 14m 07s | Max: 14m 07s | Hits:  62%/553   
      🟩 Clang18            Pass: 100%/4   | Total: 48m 06s | Avg: 12m 01s | Max: 13m 19s | Hits:  71%/2212  
      🟩 GCC10              Pass: 100%/1   | Total: 13m 35s | Avg: 13m 35s | Max: 13m 35s | Hits:  62%/555   
      🟩 GCC11              Pass: 100%/1   | Total: 15m 10s | Avg: 15m 10s | Max: 15m 10s | Hits:  62%/553   
      🟩 GCC12              Pass: 100%/2   | Total: 27m 41s | Avg: 13m 50s | Max: 15m 21s | Hits:  80%/1106  
      🟩 GCC13              Pass: 100%/4   | Total: 45m 45s | Avg: 11m 26s | Max: 13m 08s | Hits:  62%/2212  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 17s | Avg:  9m 17s | Max:  9m 17s | Hits:  60%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  9m 34s | Avg:  9m 34s | Max:  9m 34s | Hits:  60%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 14m 31s | Avg:  7m 15s | Max:  7m 28s | Hits:  87%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 45m | Avg: 13m 14s | Max: 15m 05s | Hits:  67%/4426  
      🟩 GCC                Pass: 100%/8   | Total:  1h 42m | Avg: 12m 46s | Max: 15m 21s | Hits:  66%/4426  
      🟩 MSVC               Pass: 100%/2   | Total: 18m 51s | Avg:  9m 25s | Max:  9m 34s | Hits:  60%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 14m 31s | Avg:  7m 15s | Max:  7m 28s | Hits:  87%/706   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/20  | Total:  4h 01m | Avg: 12m 04s | Max: 15m 21s | Hits:  68%/10080 
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  3h 37m | Avg: 12m 04s | Max: 15m 21s | Hits:  64%/8974  
      🟩 Test               Pass: 100%/2   | Total: 24m 07s | Avg: 12m 03s | Max: 12m 20s | Hits:  99%/1106  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  9m 38s | Avg:  9m 38s | Max:  9m 38s | Hits:  62%/553   
      🟩 90a                Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s | Hits:  62%/553   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 39m 30s | Avg:  9m 52s | Max: 11m 42s | Hits:  66%/2012  
      🟩 20                 Pass: 100%/16  | Total:  3h 21m | Avg: 12m 37s | Max: 15m 21s | Hits:  68%/8068  
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-rtx2080-latest-1

@caugonnet
Contributor Author

/ok to test

Contributor

github-actions bot commented Feb 6, 2025

🟩 CI finished in 31m 35s: Pass: 100%/20 | Total: 3h 27m | Avg: 10m 23s | Max: 16m 19s | Hits: 78%/10080
  • 🟩 cudax: Pass: 100%/20 | Total: 3h 27m | Avg: 10m 23s | Max: 16m 19s | Hits: 78%/10080

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  2h 49m | Avg: 10m 36s | Max: 16m 19s | Hits:  79%/7868  
      🟩 arm64              Pass: 100%/4   | Total: 38m 13s | Avg:  9m 33s | Max: 10m 40s | Hits:  74%/2212  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s | Hits:  61%/261   
      🟩 12.5               Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 39s | Hits:  95%/706   
      🟩 12.8               Pass: 100%/17  | Total:  3h 06m | Avg: 10m 57s | Max: 16m 19s | Hits:  77%/9113  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s | Hits:  61%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 39s | Hits:  95%/706   
      🟩 nvcc12.8           Pass: 100%/17  | Total:  3h 06m | Avg: 10m 57s | Max: 16m 19s | Hits:  77%/9113  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  3h 27m | Avg: 10m 23s | Max: 16m 19s | Hits:  78%/10080 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 10m 13s | Avg: 10m 13s | Max: 10m 13s | Hits:  75%/555   
      🟩 Clang15            Pass: 100%/1   | Total: 12m 23s | Avg: 12m 23s | Max: 12m 23s | Hits:  75%/553   
      🟩 Clang16            Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s | Hits:  75%/553   
      🟩 Clang17            Pass: 100%/1   | Total: 11m 04s | Avg: 11m 04s | Max: 11m 04s | Hits:  75%/553   
      🟩 Clang18            Pass: 100%/4   | Total: 45m 30s | Avg: 11m 22s | Max: 16m 19s | Hits:  81%/2212  
      🟩 GCC10              Pass: 100%/1   | Total: 11m 55s | Avg: 11m 55s | Max: 11m 55s | Hits:  74%/555   
      🟩 GCC11              Pass: 100%/1   | Total: 11m 21s | Avg: 11m 21s | Max: 11m 21s | Hits:  74%/553   
      🟩 GCC12              Pass: 100%/2   | Total: 24m 37s | Avg: 12m 18s | Max: 12m 19s | Hits:  87%/1106  
      🟩 GCC13              Pass: 100%/4   | Total: 37m 07s | Avg:  9m 16s | Max: 10m 40s | Hits:  74%/2212  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s | Hits:  61%/261   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 31s | Avg: 10m 31s | Max: 10m 31s | Hits:  61%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 39s | Hits:  95%/706   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 30m | Avg: 11m 21s | Max: 16m 19s | Hits:  78%/4426  
      🟩 GCC                Pass: 100%/8   | Total:  1h 25m | Avg: 10m 37s | Max: 12m 19s | Hits:  77%/4426  
      🟩 MSVC               Pass: 100%/2   | Total: 20m 59s | Avg: 10m 29s | Max: 10m 31s | Hits:  61%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 39s | Hits:  95%/706   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/20  | Total:  3h 27m | Avg: 10m 23s | Max: 16m 19s | Hits:  78%/10080 
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  2h 59m | Avg:  9m 57s | Max: 12m 23s | Hits:  75%/8974  
      🟩 Test               Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 16m 19s | Hits:  99%/1106  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  8m 22s | Avg:  8m 22s | Max:  8m 22s | Hits:  74%/553   
      🟩 90a                Pass: 100%/1   | Total:  9m 02s | Avg:  9m 02s | Max:  9m 02s | Hits:  74%/553   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 31m 42s | Avg:  7m 55s | Max:  9m 03s | Hits:  78%/2012  
      🟩 20                 Pass: 100%/16  | Total:  2h 56m | Avg: 11m 00s | Max: 16m 19s | Hits:  78%/8068  
    

Labels
stf Sequential Task Flow programming model
Projects
Status: In Review
1 participant