-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added by_blocks method #831
Merged
Merged
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
1a49db3
added by_blocks method
wagnerf42 2527cc3
added specific blocks sizes methods
ff4056c
tests for blocks operations
28743b3
new public types for blocks
wagnerf42 25f03e2
blocks: renamed doubling -> exponential
a64aa0c
find benchmark with blocks
93c2ca6
blocks: simplify the iterator types
cuviper be3a797
blocks: create the callback directly
cuviper 2290ba0
blocks: minor doc tweaks
cuviper 2cccfba
blocks: assert block_size != 0
cuviper 4d2f0b6
blocks: extract the exponential size fn
cuviper File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
use super::plumbing::*; | ||
use super::*; | ||
|
||
struct BlocksCallback<S, C> { | ||
sizes: S, | ||
consumer: C, | ||
len: usize, | ||
} | ||
|
||
impl<T, S, C> ProducerCallback<T> for BlocksCallback<S, C> | ||
where | ||
C: UnindexedConsumer<T>, | ||
S: Iterator<Item = usize>, | ||
{ | ||
type Output = C::Result; | ||
|
||
fn callback<P: Producer<Item = T>>(mut self, mut producer: P) -> Self::Output { | ||
let mut remaining_len = self.len; | ||
let mut consumer = self.consumer; | ||
|
||
// we need a local variable for the accumulated results | ||
// we call the reducer's identity by splitting at 0 | ||
let (left_consumer, right_consumer, _) = consumer.split_at(0); | ||
let mut leftmost_res = left_consumer.into_folder().complete(); | ||
consumer = right_consumer; | ||
|
||
// now we loop on each block size | ||
while remaining_len > 0 && !consumer.full() { | ||
// we compute the next block's size | ||
let size = self.sizes.next().unwrap_or(std::usize::MAX); | ||
let capped_size = remaining_len.min(size); | ||
remaining_len -= capped_size; | ||
|
||
// split the producer | ||
let (left_producer, right_producer) = producer.split_at(capped_size); | ||
producer = right_producer; | ||
|
||
// split the consumer | ||
let (left_consumer, right_consumer, _) = consumer.split_at(capped_size); | ||
consumer = right_consumer; | ||
|
||
leftmost_res = consumer.to_reducer().reduce( | ||
leftmost_res, | ||
bridge_producer_consumer(capped_size, left_producer, left_consumer), | ||
); | ||
} | ||
leftmost_res | ||
} | ||
} | ||
|
||
/// `ExponentialBlocks` is a parallel iterator that consumes itself as a sequence | ||
/// of parallel blocks of increasing sizes (exponentially). | ||
/// | ||
/// This struct is created by the [`by_exponential_blocks()`] method on [`IndexedParallelIterator`] | ||
/// | ||
/// [`by_exponential_blocks()`]: trait.IndexedParallelIterator.html#method.by_exponential_blocks | ||
/// [`IndexedParallelIterator`]: trait.IndexedParallelIterator.html | ||
#[must_use = "iterator adaptors are lazy and do nothing unless consumed"] | ||
#[derive(Debug, Clone)] | ||
pub struct ExponentialBlocks<I> { | ||
base: I, | ||
} | ||
|
||
impl<I> ExponentialBlocks<I> { | ||
pub(super) fn new(base: I) -> Self { | ||
Self { base } | ||
} | ||
} | ||
|
||
impl<I> ParallelIterator for ExponentialBlocks<I> | ||
where | ||
I: IndexedParallelIterator, | ||
{ | ||
type Item = I::Item; | ||
|
||
fn drive_unindexed<C>(self, consumer: C) -> C::Result | ||
where | ||
C: UnindexedConsumer<Self::Item>, | ||
{ | ||
let first = crate::current_num_threads(); | ||
let callback = BlocksCallback { | ||
consumer, | ||
sizes: std::iter::successors(Some(first), exponential_size), | ||
len: self.base.len(), | ||
}; | ||
self.base.with_producer(callback) | ||
} | ||
} | ||
|
||
fn exponential_size(size: &usize) -> Option<usize> { | ||
Some(size.saturating_mul(2)) | ||
} | ||
|
||
/// `UniformBlocks` is a parallel iterator that consumes itself as a sequence | ||
/// of parallel blocks of constant sizes. | ||
/// | ||
/// This struct is created by the [`by_uniform_blocks()`] method on [`IndexedParallelIterator`] | ||
/// | ||
/// [`by_uniform_blocks()`]: trait.IndexedParallelIterator.html#method.by_uniform_blocks | ||
/// [`IndexedParallelIterator`]: trait.IndexedParallelIterator.html | ||
#[must_use = "iterator adaptors are lazy and do nothing unless consumed"] | ||
#[derive(Debug, Clone)] | ||
pub struct UniformBlocks<I> { | ||
base: I, | ||
block_size: usize, | ||
} | ||
|
||
impl<I> UniformBlocks<I> { | ||
pub(super) fn new(base: I, block_size: usize) -> Self { | ||
Self { base, block_size } | ||
} | ||
} | ||
|
||
impl<I> ParallelIterator for UniformBlocks<I> | ||
where | ||
I: IndexedParallelIterator, | ||
{ | ||
type Item = I::Item; | ||
|
||
fn drive_unindexed<C>(self, consumer: C) -> C::Result | ||
where | ||
C: UnindexedConsumer<Self::Item>, | ||
{ | ||
let callback = BlocksCallback { | ||
consumer, | ||
sizes: std::iter::repeat(self.block_size), | ||
len: self.base.len(), | ||
}; | ||
self.base.with_producer(callback) | ||
} | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a design choice here when
sizes.next()
returnsNone
, with at least three options:Skip
does this with its skipped prefix to match the semantics ofstd::iter::Skip
.Take
.Did you consider this? I think behavior like
Take
might actually be a nice choice.Another question is whether
size == 0
should be allowed. At the very least, it would be a wasted split here, which could just be considered the user's poor choice. Or we could make that impossible by havingS
produceNonZeroUsize
items, but that makes it more annoying to write the iterator.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It seems quite surprising to me to ignore some of the elements. I would definitely not expect that. Ultimately, I think we want to offer users a choice of how to manage this -- another behavior I could imagine would be "go on using the last block size that got returned until we are complete".
I am wondering if we can express these semantics with some composed operators? I'd sort of like to say something like
.by_blocks().skip_remainder()
or something. We could just add inherent methods to the return type to allow users to change back and forth between those modes, no?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that the user can express most of those semantics in the iterator itself:
my_iter.chain(iter::once(usize::MAX))
my_iter.chain(iter::repeat(last))
Take
behavior: justmy_iter
, and let itsNone
be the endYeah, that's possible too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another might be asserting that there is no remainder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at minimum we should document these techniques, and i might prefer a convenient way to express them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only allow constant sizes and doubling sizes in the public interface then this choice about what to do when the size iterator ends is of private concern.
i used to have both versions, one like an implicit take when it runs out and the other one forcing to consume everything. so there is indeed something to choose here.
i would favor consuming it all since I happened to get some bugs missing some elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, i have some questions like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be in favor of only allowing the two options until we know there is demand for something more general. As for unit tests, we typically just add
#[test]
in the modules, I think? Alternatively, you can add tests into thetests
directory but we don't usually do that. (Is this right, @cuviper?)For benchmarks, we generally modify the rayon-bench project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For tests, there are a few generic sanity checks to add:
tests/clones.rs
,tests/debug.rs
, andsrc/compile_fail/must_use.rs
. Beyond that, adding unit#[test]
directly in the module is fine, or break into atests
submodule if there are a lot. Doc-test examples are also nice to serve both doc and testing roles.