
Make constant memory opt-in, spill large statics to global memory #217


Merged
LegNeato merged 4 commits into Rust-GPU:main on May 26, 2025

Conversation

LegNeato
Contributor

@LegNeato LegNeato commented May 26, 2025

I've decided to default to not using constant memory, with an opt-in flag. Later, when we are smarter about placement, we can flip the flag on by default and/or make it a no-op. One can still annotate code with #[cuda_std::address_space(constant)] to place a static manually; the flag only affects automatic placement by the codegen backend.
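
For illustration, a minimal sketch of the manual escape hatch on the GPU side (the static names and contents here are made up; only the attribute itself comes from cuda_std):

    // Explicitly pinned to constant memory; unaffected by the new flag,
    // which only controls automatic placement.
    #[cuda_std::address_space(constant)]
    static SMALL_LUT: [u32; 4] = [1, 2, 3, 4];

    // Unannotated: the codegen backend decides. With the flag off (the
    // new default) this lands in global memory.
    static BIG_TABLE: [u8; 128 * 1024] = [0; 128 * 1024];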

Using this flag to turn constant memory on can blow up, because the constant-memory placement logic isn't fully correct. Ideally we would keep track of what we have already put into constant memory and spill to global memory once it fills up, instead of only spilling when a single static is too large on its own. We'll also probably want a user-controlled packing strategy: for example, if you have one large static and many small ones, you might want all the small ones in constant memory, or just the big one, depending on your workload. This needs some design work, and the design shouldn't require code annotations, so that third-party non-GPU-aware libraries are still supported.
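
To make that ideal concrete, here is a sketch of the "track and spill" placement described above. This is not the actual backend code, just the shape of it; the names are illustrative, and the 64 KiB figure is the usual CUDA constant-memory size:

    /// Total constant memory available (64 KiB on current NVIDIA GPUs).
    const CONST_MEM_BYTES: usize = 64 * 1024;

    enum Placement {
        Constant, // NVVM address space 4
        Global,   // NVVM address space 1
    }

    struct ConstMemTracker {
        used: usize,
    }

    impl ConstMemTracker {
        /// Place a static, spilling to global memory once the running
        /// total would overflow, instead of only spilling statics that
        /// are too large on their own.
        fn place(&mut self, static_size: usize) -> Placement {
            if self.used + static_size <= CONST_MEM_BYTES {
                self.used += static_size;
                Placement::Constant
            } else {
                Placement::Global
            }
        }
    }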

But this is materially better than what is there now (a runtime error).

Fixes #208.

See also the debugging and discussion in
#216

LegNeato added 2 commits May 25, 2025 22:06
This looks like it was code that wasn't deleted after the refactor in
Rust-GPU@decda87
This isn't fully correct: ideally we'd keep track
of what we have put into constant memory and, when
it is filled up, spill instead of only spilling
when a static is big on its own. But this is
materially better than what is there (a runtime
error).

An argument can be made to just _always_ use global
memory, so we don't have to worry about getting the
packing right.

Fixes Rust-GPU#208.

See also the debugging and discussion in
Rust-GPU#216
@LegNeato
Contributor Author

I've decided to default to not using constant memory, with an opt-in flag. Later, when we are smarter, we can flip the flag on by default and/or make it a no-op. I tested this with your Vast script (thanks again!) and it should work. I was playing around with putting everything in globals previously, and that must have snuck in while testing the previous commits!

@LegNeato LegNeato changed the title Spill large statics to global memory Make constant memory opt-in, spill large statics to global memory May 26, 2025
@brandonros

Let me give it one final test and we can move on to the next one, because I think they will be quick (if you have time, that is).

sha2: #207. The entire thing won't compile (it traps). I bet it is something small and silly (potentially on my end), and this one may actually make all of this worthwhile by improving performance.

(Not sure if you saw, but 100,000,000 hashes/sec on an RTX 5090 is even better than I could come up with natively: https://github.com/brandonros/vanity_finder_cpp)

rand_xoshiro: #203

There's a chance this ed25519 problem was related and this will fix it, but I doubt it; I also think it'll be something small.

@brandonros

> I tested this with your Vast script (thanks again!)

We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.

I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.

    if !self.codegen_args.use_constant_memory_space {
        // We aren't using constant memory, so put the instance in global memory.
        AddressSpace(1)
    } else {
        AddressSpace(4)
    }

nit: could we add consts somewhere that represent this 0/1/2/3/4 address-space stuff, for easier readability?

Contributor Author

@LegNeato LegNeato May 26, 2025


Yeah, we have one in cuda_std and I didn't want to duplicate it. A followup should add a rustc_codegen_nvvm-types crate that cuda_std and rustc_codegen_nvvm could share (rust-gpu has a "-types" crate for just this reason). rust-gpu also has rspirv for SPIR-V-specific info encoded in Rust types, so perhaps it should be something like rcuda? We could move the CUDA error code mapping out of cust into it as well 🤔
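
For concreteness, a minimal sketch of what such a shared "-types" crate could export (all names here are illustrative; the newtype just mirrors rustc's AddressSpace, and the numbering follows NVVM IR):

    /// Mirrors rustc's `AddressSpace` newtype.
    #[derive(Copy, Clone, PartialEq, Eq, Debug)]
    pub struct AddressSpace(pub u32);

    /// Named NVVM IR address spaces, replacing bare numeric literals.
    pub mod nvvm {
        use super::AddressSpace;

        pub const GENERIC: AddressSpace = AddressSpace(0);
        pub const GLOBAL: AddressSpace = AddressSpace(1);
        // 2 is reserved for internal use in NVVM IR.
        pub const SHARED: AddressSpace = AddressSpace(3);
        pub const CONSTANT: AddressSpace = AddressSpace(4);
        pub const LOCAL: AddressSpace = AddressSpace(5);
    }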


@brandonros brandonros left a comment


it works

@LegNeato
Contributor Author

> > I tested this with your Vast script (thanks again!)
>
> We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.
>
> I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.

We have GPUs sponsored by modal.com. I just haven't had a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? See #202).

@brandonros

brandonros commented May 26, 2025

> > > I tested this with your Vast script (thanks again!)
> >
> > We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.
> >
> > I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.
>
> We have GPUs sponsored by modal.com. I just haven't had a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? See #202).

OK. Not sure if you saw: https://github.com/brandonros/ed25519-vanity-rs/blob/master/.github/workflows/ci.yaml

We can build on the runner (as you know, and already have) and then run. I could help split the Vast script so it doesn't do the build, if needed.

Let's merge if you're ready; this is awesome. Thanks again, hope you've enjoyed working together so far. I'll retest the smaller, simpler Xoshiro RNG issue after this; I have a feeling it might be fixed by this.

@LegNeato
Contributor Author

> > > > I tested this with your Vast script (thanks again!)
> > >
> > > We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.
> > >
> > > I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.
> >
> > We have GPUs sponsored by modal.com. I just haven't had a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? See #202).
>
> OK. Not sure if you saw: https://github.com/brandonros/ed25519-vanity-rs/blob/master/.github/workflows/ci.yaml
>
> We can build on the runner (as you know, and already have) and then run. I could help split the Vast script so it doesn't do the build, if needed.
>
> Let's merge if you're ready; this is awesome. Thanks again, hope you've enjoyed working together so far. I'll retest the smaller, simpler Xoshiro RNG issue after this; I have a feeling it might be fixed by this.

Yeah, we build containers as well:

https://github.com/Rust-GPU/Rust-CUDA/blob/main/.github/workflows/container_images.yml

That is the idea for the Modal stuff: build on Actions, push up to Modal, and run.

@LegNeato LegNeato merged commit afb147e into Rust-GPU:main May 26, 2025
7 checks passed
@LegNeato
Contributor Author

Filed #218 for improving usage of constant memory space.

Development

Successfully merging this pull request may close these issues.

curve25519_dalek crate = runtime error