
Make constant memory opt-in, spill large statics to global memory #217


Merged
LegNeato merged 4 commits into Rust-GPU:main on May 26, 2025

Conversation

LegNeato
Contributor

@LegNeato LegNeato commented May 26, 2025

I've decided to default to not using constant memory, with an opt-in flag. Later, when we are smarter about placement, we can flip the flag on by default and/or make it a no-op. One can still annotate code with #[cuda_std::address_space(constant)] to place a static manually; the flag only affects automatic placement by the codegen backend.
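
For illustration, a minimal sketch of the manual escape hatch on the GPU side (the static names and contents here are made up; only the attribute itself comes from cuda_std):

    // Explicitly pinned to constant memory; unaffected by the new flag,
    // which only controls automatic placement.
    #[cuda_std::address_space(constant)]
    static SMALL_LUT: [u32; 4] = [1, 2, 3, 4];

    // Unannotated: the codegen backend decides. With the flag off (the
    // new default) this lands in global memory.
    static BIG_TABLE: [u8; 128 * 1024] = [0; 128 * 1024];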

Using this flag to turn constant memory on can blow up, because the constant-memory placement logic isn't fully correct. Ideally we would keep track of what we have already put into constant memory and spill to global memory once it fills up, instead of only spilling when a single static is too large on its own. We'll also probably want a user-controlled packing strategy: for example, if you have one large static and many small ones, you might want all the small ones in constant memory, or just the big one, depending on your workload. This needs some design work, and the design shouldn't require code annotations, so that third-party non-GPU-aware libraries are still supported.
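
To make that ideal concrete, here is a sketch of the "track and spill" placement described above. This is not the actual backend code, just the shape of it; the names are illustrative, and the 64 KiB figure is the usual CUDA constant-memory size:

    /// Total constant memory available (64 KiB on current NVIDIA GPUs).
    const CONST_MEM_BYTES: usize = 64 * 1024;

    enum Placement {
        Constant, // NVVM address space 4
        Global,   // NVVM address space 1
    }

    struct ConstMemTracker {
        used: usize,
    }

    impl ConstMemTracker {
        /// Place a static, spilling to global memory once the running
        /// total would overflow, instead of only spilling statics that
        /// are too large on their own.
        fn place(&mut self, static_size: usize) -> Placement {
            if self.used + static_size <= CONST_MEM_BYTES {
                self.used += static_size;
                Placement::Constant
            } else {
                Placement::Global
            }
        }
    }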

But this is materially better than what is there now (a runtime error).

Fixes #208.

See also the debugging and discussion in
#216

LegNeato added 2 commits May 25, 2025 22:06
This looks like it was code that wasn't deleted after the refactor in
Rust-GPU@decda87
This isn't fully correct: ideally we'd keep track
of what we have put into constant memory and, when
it is filled up, spill instead of only spilling
when a static is big on its own. But this is
materially better than what is there (a runtime
error).

An argument can be made to just _always_ use global
memory, so we don't have to worry about getting the
packing right.

Fixes Rust-GPU#208.

See also the debugging and discussion in
Rust-GPU#216
@LegNeato
Contributor Author

I've decided to default to not using constant memory, with an opt-in flag. Later, when we are smarter, we can flip the flag on by default and/or make it a no-op. I tested this with your Vast script (thanks again!) and it should work. I was playing around with putting everything in globals previously, and that must have snuck in while testing the previous commits!

@LegNeato LegNeato changed the title Spill large statics to global memory Make constant memory opt-in, spill large statics to global memory May 26, 2025
@brandonros

Let me give it one final test and we can move on to the next one, because I think they will be quick (if you have time, that is).

sha2: #207. The entire thing won't compile (it traps). I bet it is something small and silly (potentially on my end), and this one may actually make all of this worthwhile by improving performance.

(Not sure if you saw, but 100,000,000 hashes/sec on an RTX 5090 is even better than I could come up with natively: https://github.com/brandonros/vanity_finder_cpp)

rand_xoshiro: #203

There's a chance this ed25519 problem was related and this will fix it, but I doubt it; I also think it'll be something small.

@brandonros

> I tested this with your Vast script (thanks again!)

We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.

I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.

    if !self.codegen_args.use_constant_memory_space {
        // We aren't using constant memory, so put the instance in global memory.
        AddressSpace(1)
    } else {
        AddressSpace(4)
    }

nit: could we add consts somewhere that represent this 0/1/2/3/4 address-space stuff, for easier readability?

Contributor Author

@LegNeato LegNeato May 26, 2025


Yeah, we have one in cuda_std and I didn't want to duplicate it. A followup should add a rustc_codegen_nvvm-types crate that cuda_std and rustc_codegen_nvvm could share (rust-gpu has a "-types" crate for just this reason). rust-gpu also has rspirv for SPIR-V-specific info encoded in Rust types, so perhaps it should be something like rcuda? We could move the CUDA error code mapping out of cust into it as well 🤔
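
For concreteness, a minimal sketch of what such a shared "-types" crate could export (all names here are illustrative; the newtype just mirrors rustc's AddressSpace, and the numbering follows NVVM IR):

    /// Mirrors rustc's `AddressSpace` newtype.
    #[derive(Copy, Clone, PartialEq, Eq, Debug)]
    pub struct AddressSpace(pub u32);

    /// Named NVVM IR address spaces, replacing bare numeric literals.
    pub mod nvvm {
        use super::AddressSpace;

        pub const GENERIC: AddressSpace = AddressSpace(0);
        pub const GLOBAL: AddressSpace = AddressSpace(1);
        // 2 is reserved for internal use in NVVM IR.
        pub const SHARED: AddressSpace = AddressSpace(3);
        pub const CONSTANT: AddressSpace = AddressSpace(4);
        pub const LOCAL: AddressSpace = AddressSpace(5);
    }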


@brandonros brandonros left a comment


it works

@LegNeato
Contributor Author

> > I tested this with your Vast script (thanks again!)
>
> We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.
>
> I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.

We have GPUs sponsored by modal.com. I just haven't had a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? See #202).

@brandonros

brandonros commented May 26, 2025

> > > I tested this with your Vast script (thanks again!)
> >
> > We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.
> >
> > I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.
>
> We have GPUs sponsored by modal.com. I just haven't had a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? See #202).

OK. Not sure if you saw: https://github.com/brandonros/ed25519-vanity-rs/blob/master/.github/workflows/ci.yaml

We can build on the runner (as you know, and already have) and then run. I could help split the Vast script so it doesn't do the build, if needed.

Let's merge if you're ready; this is awesome. Thanks again, hope you've enjoyed working together so far. I'll retest the smaller, simpler Xoshiro RNG issue after this; I have a feeling it might be fixed by this.

@LegNeato
Contributor Author

> > > > I tested this with your Vast script (thanks again!)
> > >
> > > We could hypothetically make a manual GitHub Actions trigger, with a Vast API token as a secret, that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down.
> > >
> > > I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.
> >
> > We have GPUs sponsored by modal.com. I just haven't had a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? See #202).
>
> OK. Not sure if you saw: https://github.com/brandonros/ed25519-vanity-rs/blob/master/.github/workflows/ci.yaml
>
> We can build on the runner (as you know, and already have) and then run. I could help split the Vast script so it doesn't do the build, if needed.
>
> Let's merge if you're ready; this is awesome. Thanks again, hope you've enjoyed working together so far. I'll retest the smaller, simpler Xoshiro RNG issue after this; I have a feeling it might be fixed by this.

Yeah, we build containers as well:

https://github.com/Rust-GPU/Rust-CUDA/blob/main/.github/workflows/container_images.yml

That is the idea for the Modal stuff: build on Actions, push up to Modal, and run.

@LegNeato LegNeato merged commit afb147e into Rust-GPU:main May 26, 2025
7 checks passed
@LegNeato
Contributor Author

Filed #218 for improving usage of constant memory space.

Development

Successfully merging this pull request may close these issues.

curve25519_dalek crate = runtime error