Multi-Threading for Sample_z #463
base: dev
Conversation
```rust
(0..nr_threads)
    .into_par_iter()
    .map(|thread_i| {
```
Just to offer another idea to make things quicker (although this might not be rayon's way of parallelising things): currently, you split the workload into similar-sized buckets, implicitly assuming that each bucket takes roughly the same time on each thread. This assumption should hold in this case for larger bucket sizes, but for smaller ones it might be quicker to submit tasks to a pool of threads, where each thread picks up a new task as soon as it has finished its current one.
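The task-pool idea above can be sketched with plain `std` threads and a shared atomic counter (a minimal sketch with a placeholder instead of the actual sampling step; the per-write mutex is for simplicity and would itself be an overhead in practice):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

fn sample_all(n_samples: usize, nr_threads: usize) -> Vec<u64> {
    let next = Arc::new(AtomicUsize::new(0));
    let out = Arc::new(Mutex::new(vec![0u64; n_samples]));
    let mut handles = Vec::new();
    for _ in 0..nr_threads {
        let next = Arc::clone(&next);
        let out = Arc::clone(&out);
        handles.push(thread::spawn(move || loop {
            // each worker grabs the next free index, so a fast worker
            // simply takes more tasks instead of idling
            let i = next.fetch_add(1, Ordering::Relaxed);
            if i >= n_samples {
                break;
            }
            let value = (i as u64) * 2; // placeholder for the actual sampling step
            out.lock().unwrap()[i] = value;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(out).unwrap().into_inner().unwrap()
}
```

In practice one would hand out batches of indices rather than single ones, so the counter and lock are touched far less often.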
I looked again, but I was not able to find a better solution, and given that the function will probably not be called with small numbers of samples, I think the current implementation is reasonable.
The problem with the dynamic approach is that I could not find a good way to also distribute the integer sampler across tasks. Additionally, the dynamic distribution of tasks adds thread-management overhead, which might itself increase the runtime.
```rust
.reduce(Vec::new, |mut a, mut b| {
    a.append(&mut b);
    a
})
```
I haven't looked at the flamegraph, but from my experience with our library, setting the values of the matrix has a significant overhead. Furthermore, joining and iterating vectors shouldn't be the fastest thing in the world. Could it be quicker to set each entry in the matrix directly after sampling it, sharing the matrix in an `Arc`?
I will check later, but the join/reduce is probably efficient, as it moves the values rather than copying them.
I would assume that the additional management cost of `Arc` would exceed the runtime of the collect.
Co-authored-by: Jan Niklas Siemer <[email protected]>
Description
This PR implements multi-threading for `sample_z`.
Comment: Rayon also has the option `min_length`, which might be more intuitive to use for splitting the amount of work per rayon job. However, I was not able to come up with a good way to share the `dgis` over one rayon job with this change, hence I kept my implementation. If a reviewer finds a nicer way, please suggest it.

Improvements, measured on macOS:
On WSL:
So, for very small matrices there is a small runtime loss, but for larger matrices we gain efficiency.
Testing
Checklist: