Make constant memory opt-in, spill large statics to global memory #217
Conversation
This looks like it was code that wasn't deleted after the refactor in Rust-GPU@decda87
This isn't fully correct: ideally we would keep track of what we have put into constant memory and spill when it is filled up, instead of only spilling when a static is big. But this is materially better than what is there (which is a runtime error). An argument can be made to just _always_ use global memory so we don't have to worry about getting the packing right. Fixes Rust-GPU#208. See also the debugging and discussion in Rust-GPU#216.
I've decided to default to not using constant memory, with an opt-in flag. Later, when we are smarter, we can flip the flag on by default and/or make it a no-op. I tested this with your Vast script (thanks again!) and it should work. I was playing around with putting everything in globals previously, and that must have snuck in when testing the previous commits!
Let me give it one final test and we can move on to the next ones, because I think they will be quick (if you have time, that is). sha2 (#207): the entire thing won't compile without a trap. I bet it is something small and silly (potentially on my end), and this one may actually make all of this worthwhile by improving performance (not sure if you saw, but 100,000,000 hashes/sec on an RTX 5090 is even better than I could come up with natively: https://github.com/brandonros/vanity_finder_cpp). rand_xoshiro (#203): there's a chance this ed25519 problem was related and this will fix it, but I doubt it; I also think it'll be something small.
We could hypothetically make a manual GitHub Actions trigger with a Vast API token as a secret that stands up a $0.15/hr instance, runs CI against it (tests/examples/whatever), and then tears it down. I'd be willing to put some cycles into that if you'd like, but I'm not sure if the benefit is there for you.
```rust
AddressSpace(4)
if !self.codegen_args.use_constant_memory_space {
    // We aren't using constant memory, so put the instance in global memory.
    AddressSpace(1)
```
nit: could we make consts somewhere that represent this 0 1 2 3 4 stuff better for easier readability?
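Something like the following could address the nit; this is only a sketch of what named associated constants might look like (the numeric values follow the NVVM IR address-space numbering; the `AddressSpace` struct name mirrors the snippet above, everything else here is illustrative):

```rust
/// NVVM IR address spaces, as named constants instead of magic numbers.
/// Sketch only; not the actual codegen backend type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct AddressSpace(pub u32);

impl AddressSpace {
    /// Generic pointers that can point anywhere.
    pub const GENERIC: AddressSpace = AddressSpace(0);
    /// Global device memory.
    pub const GLOBAL: AddressSpace = AddressSpace(1);
    /// Per-block shared memory.
    pub const SHARED: AddressSpace = AddressSpace(3);
    /// Read-only constant memory.
    pub const CONSTANT: AddressSpace = AddressSpace(4);
    /// Per-thread local memory.
    pub const LOCAL: AddressSpace = AddressSpace(5);
}
```

Call sites would then read `AddressSpace::GLOBAL` rather than `AddressSpace(1)`.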
Yeah, we have one in `cuda_std` and I didn't want to duplicate it. A followup should add a `rustc_codegen_nvvm-types` crate that `cuda_std` and `rustc_codegen_nvvm` could share (`rust-gpu` has a "-types" crate for just this reason). `rust-gpu` also has `rspirv` for spirv-specific info encoded in Rust types, so perhaps it should be something like `rcuda`? We could move the cuda error code mapping out of `cust` into it as well 🤔
it works
We have GPUs sponsored by modal.com. I just haven't gotten a chance to get it all working (they didn't have simple ssh access, but it looks like maybe they do now? #202).
OK. Not sure if you saw: https://github.com/brandonros/ed25519-vanity-rs/blob/master/.github/workflows/ci.yaml — we can build on the runner (as you know and also have) and then run. I could help split the Vast script to skip the build step if needed. Let's merge if you're ready, this is awesome. Thanks again, hope you've enjoyed working together so far. I'll retest the smaller, simpler Xoshiro RNG issue after this; I have a feeling it might be fixed by this.
Yeah, we build containers as well: https://github.com/Rust-GPU/Rust-CUDA/blob/main/.github/workflows/container_images.yml. That is the idea for the Modal stuff: build on Actions, push up to Modal, and run.
#218 for improving usage of constant memory space.
I've decided to default to not using constant memory, with an opt-in flag. Later, when we are smarter, we can flip the flag on by default and/or make it a no-op. One can still annotate code with
`#[cuda_std::address_space(constant)]`
to place it manually; this flag only affects automatic placement by the codegen backend. Turning on constant memory via the flag can blow up, because the constant memory placement logic isn't fully correct. Ideally we would keep track of what we have put into constant memory and spill once it fills up, instead of only spilling when a static is too large on its own. We'll also probably want some packing strategy controlled by the user. For example, if you have one large static and many small ones, you might want all the small ones in constant memory, or just the big one, depending on your workload. We need some design work around this, and the design shouldn't require code to be annotated, to support third-party non-GPU-aware libraries.
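As a rough illustration, the budget-tracking idea above could look something like this. Everything here is hypothetical (the `ConstMemoryPlanner` name, the `place` method, and the hard-coded 64 KiB limit, which is the typical constant-memory bank size but should really be queried per target):

```rust
/// Typical size of the __constant__ memory bank on NVIDIA GPUs.
/// Assumption for this sketch; real code should query the target.
const CONSTANT_MEMORY_BYTES: usize = 64 * 1024;

/// Where a static ends up.
enum Placement {
    Constant, // AddressSpace(4)
    Global,   // AddressSpace(1)
}

/// Hypothetical planner that tracks how much constant memory is already
/// used and spills to global memory once the bank would overflow.
struct ConstMemoryPlanner {
    used: usize,
}

impl ConstMemoryPlanner {
    fn new() -> Self {
        Self { used: 0 }
    }

    /// Place a static of `size` bytes, first-fit: constant memory while it
    /// fits, global memory otherwise. (No packing strategy yet.)
    fn place(&mut self, size: usize) -> Placement {
        if self.used + size <= CONSTANT_MEMORY_BYTES {
            self.used += size;
            Placement::Constant
        } else {
            Placement::Global
        }
    }
}
```

First-fit like this is the simplest possible policy; the packing question (many small statics vs. one big one) would need something smarter layered on top.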
But, this is materially better than what is there (which is a runtime error).
Fixes #208.
See also the debugging and discussion in #216.