-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial working example of lending iterator for combinations. #682
base: master
Are you sure you want to change the base?
Initial working example of lending iterator for combinations. #682
Conversation
Added benchmarks comparing `combinations` and `combinations_lending`. They show the lending implementation is ~1x-7x speed. Feature works either on or off for conditional compilation.
f8b22e7
to
0eefe16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, thanks for the effort you put into this.
Here's my two cents (personal - so no final word):
- I saw that the PR contains quite some formatting/documentation changes. You can surely include them, but it will simplify the final reviewer's life when they're in a separate commit.
- I am always wary when it comes to including another crate - even if it is behind a feature flag, it is code to properly maintain and test.
- If you are looking for maximum performance, I am not sure whether the "manipulate-indices and collect indexed items"-approach is the best. It may be, but it may also be faster to manipulate items (or references to items) instead. (E.g. if you really just want combinations from the
Vec
containing0,1,2,3
you could simply work on the numbers directly.) Did you happen to research that? Does anybody reading this have relevant experience?
As per the above, merging this PR might be a good idea, but before that we should really find out what we want to offer. Right now the raison d'être of combinations
seems to be convenience (at the price of some performance penalty). If we want to have its efficient counterpart, I would strive to make it as efficient as possible (possibly even sacrificing some convenience). I would vote against having something in between.
} | ||
} | ||
|
||
#[gat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never seen this attribute. Can you shed light on what it does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It enables usage of generic associated types in stable rust.
It is defined here in nougat and used here in lending_iterator. I believe it has to do with higher kinded types
It is what makes it possible to depend on the 'next
lifetime in the following code. (I don't know how to link the location so here is a copy-paste)
I: Iterator,
I::Item: Clone,
{
type Item<'next>
where
Self: 'next,
= Combination<'next, I>;
src/combinations.rs
Outdated
fn next(&mut self) -> Option<Combination<I>> { | ||
if self.first { | ||
if self.k() > self.n() { | ||
return None; | ||
} | ||
self.first = false; | ||
} else if self.indices.is_empty() { | ||
return None; | ||
} else { | ||
// Scan from the end, looking for an index to increment | ||
let mut i: usize = self.indices.len() - 1; | ||
|
||
// Check if we need to consume more from the iterator | ||
if self.indices[i] == self.pool.len() - 1 { | ||
self.pool.get_next(); // may change pool size | ||
} | ||
|
||
while self.indices[i] == i + self.pool.len() - self.indices.len() { | ||
if i > 0 { | ||
i -= 1; | ||
} else { | ||
// Reached the last combination | ||
return None; | ||
} | ||
} | ||
|
||
// Increment index, and reset the ones to its right | ||
self.indices[i] += 1; | ||
for j in i + 1..self.indices.len() { | ||
self.indices[j] = self.indices[j - 1] + 1; | ||
} | ||
} | ||
|
||
// Create result vector based on the indices | ||
// let out: () = Some(self.indices.iter().map(|i| self.pool[*i].clone())); | ||
Some(Combination { | ||
combinations: &*self, | ||
index: 0, | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this logic is non-trivial, it should possibly be shared with the existing combinations iterator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 7839cd3
I'm not sure I understand this point. Are you saying I should have separated out the comments and documentation additions to the new code from the new code being commented on?
That is the why there are automated tests, no? If in a worst case situation all the code relying on the dependency broke the remaining features would still work. It seems like there is not cost beyond the bug risk inherent in any code written.
I also do not know if the implementation is the best. I did not change the logic of how combinations are computed. Though it seems worth noting that the indexed items are not being collected as a group in the
My thoughts on this are:
The below code was a check I used to confirm that .iter() will result in clones of references. Notice how the function that requires a cloneable item receives a reference to a non-clonable struct and compiles fine. fn need_cloneable<T: Clone>(y: T) -> T {
y.clone()
}
#[derive(Debug)]
struct NotCloneableStruct(usize, Vec<usize>);
fn main() {
let c = vec![
NotCloneableStruct(1, vec![1, 1]),
NotCloneableStruct(2, vec![2, 2]),
NotCloneableStruct(3, vec![3, 3]),
NotCloneableStruct(4, vec![4, 4]),
];
c.iter()
.map(|x: &NotCloneableStruct| {
dbg!(&x);
dbg!(x.clone());
dbg!(x);
dbg!(need_cloneable(x)); // All references implement Clone because of core's impl<T> Clone for &T.
})
.for_each(drop);
}
If I saw a way to further sacrifice convenience for performance I would have done so. I don't see how this would be done. |
Combinations lending and non-lending use same function now.
No, I'm talking about things as https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L18-R27 or https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L48-R62 or https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L53-R108.
I saw that you do not
With indices you'll have to look up the item from the buffer. That is, you essentially have to add the index to the begin-pointer of the buffer. Handling references directly could skip this addition, thus possibly save one indirection. (As always, compilers may be clever enough to optimize some things. And - on top of that - I did not benchmark; it just came to my mind.)
I could imagine something like |
tests/test_std.rs
Outdated
#[test] | ||
#[cfg(feature = "lending_iters")] | ||
fn combinations_lending_feature_parity_to_non_lending() { | ||
assert!((1..3).combinations_lending(5).next().is_none()); | ||
|
||
let it = (1..3).combinations_lending(2).map_into_iter(|x| x.collect_vec()).collect::<Vec<_>>(); | ||
it::assert_equal(it, vec![ | ||
vec![1, 2], | ||
]); | ||
|
||
let mut out = Vec::new(); | ||
for i in (1..3).combinations_lending(2) { | ||
out.push(i); | ||
} | ||
it::assert_equal(out, vec![vec![1, 2]]); | ||
|
||
let it = (1..5).combinations_lending(2).collect_nested_vec(); | ||
it::assert_equal(it, vec![ | ||
vec![1, 2], | ||
vec![1, 3], | ||
vec![1, 4], | ||
vec![2, 3], | ||
vec![2, 4], | ||
vec![3, 4], | ||
]); | ||
|
||
it::assert_equal((0..0).combinations_lending(2).map_into_iter(|x| x.collect_vec()).collect_vec(), <Vec<Vec<_>>>::new()); | ||
|
||
it::assert_equal((0..1).combinations_lending(1).collect_nested_vec(), vec![vec![0]]); | ||
it::assert_equal((0..2).combinations_lending(1).collect_nested_vec(), vec![vec![0], vec![1]]); | ||
it::assert_equal((0..2).combinations_lending(2).collect_nested_vec(), vec![vec![0, 1]]); | ||
|
||
for i in 1..10 { | ||
assert!((0..0).combinations_lending(i).next().is_none()); | ||
assert!((0..i - 1).combinations_lending(i).next().is_none()); | ||
|
||
} | ||
|
||
it::assert_equal((1..3).combinations_lending(0), vec![vec![]]); | ||
it::assert_equal((0..0).combinations_lending(0), vec![vec![]]); | ||
|
||
it::assert_equal((1..3).combinations(0), (1..3).combinations_lending(0)); | ||
it::assert_equal((0..0).combinations(0), (0..0).combinations_lending(0)); | ||
assert_eq!((1..3).combinations(0).collect_vec(), (1..3).combinations_lending(0).collect_nested_vec()); // Should match exactly including type. | ||
it::assert_equal((0..0).combinations(0), (0..0).combinations_lending(0)); | ||
} | ||
|
||
// Below shouldn't compile because of attempt to reference an already mut reference. | ||
// #[test] | ||
// fn combinations_lending_cant_double_mut() { | ||
// let mut out = (1..4).combinations_lending(2); | ||
// let mut combination = out.next().unwrap(); | ||
// let combination2 = out.next().unwrap(); | ||
// combination.next(); | ||
// } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use property based testing heavily in itertools. Here, the property that you want to check is that alike invocations of combinations
and combinations_lending
will produce alike outputs. Could you formulate this as a quickcheck
property based test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ce1e22a.
@@ -27,6 +27,7 @@ test = false | |||
|
|||
[dependencies] | |||
either = { version = "1.0", default-features = false } | |||
lending-iterator = { version = "0.1.6", optional = true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielhenrymantilla This feature relies heavily on your lending-iterator
crate. Do you think this would be appropriate functionality to lending-iterator
? I think the performance story presented by the PR is compelling enough that this functionality should be somewhere, but I'm not yet sure where.
I share @phimuemue's hesitation for adding a new public dependency to itertools. The bar for adding dependencies, especially public ones, is very high. Itertools only has one dependency (either
), and it's maintained by a maintainer of itertools
First, we need to be sure that new dependencies will not pose a risk of breakage for our users. At minimum, new dependencies must adhere to the itertools MSRV policy. As a foundational crate in the Rust ecosystem, we treat MSRV increases as breaking changes; our dependencies should do so too. We also maintain a very conservative MSRV of 1.36
; adding lending-iterator
as a dependency would require increasing that to 1.57
.
Second, we need to be confident that new dependencies (and features, really) will not increase our maintenance burden too much. If we accept this PR, I think it would be reasonable to expect that we will eventually provide lending alternatives of many of our other iterator adapters. I don't know about @phimuemue, but I don't have any experience with lending-iterator
. We'll need to consider whether this is a road we want itertools to go down.
I think this needs a strong reason that these adapters shouldn't just be in the lending-iterator crate directly. My instinct is that |
@scottmcm @jswrenn The reason I though to implement this functionality here is that the actual logic of the lending iterators is almost identical. The new feature uses the same logic and the state is held in the same struct. It would be a shame if this would have to be implemented in @danielhenrymantilla 's Additionally the It seems unlikely to me that the best solution to this issue would be to have two of every iterator in two different crates with the exact same struct and nearly identical methods, but I don't see how else to implement this if it is merged in a different crate. |
Oh I understand now. That looks like rustfmt. I forgot I had it set to auto format on save. I will endeavor to format in a separate commit for the future. Thank you.
How would one handle references directly? Wouldn't you need to store in an array/buffer/vec the references that were being manipulated anyway? I could very well be misunderstanding but it seems like the only difference between what you describe and what is implemented is that the current implementation will handle references if the iterator is over references and actual values if the iterator is over values. Rather than always references of what is being iterated over. Is the current implementation not preferable?
The c++ implementation for permutations can use the fact that they define the ordering to be the "set of all permutations is ordered lexicographically with respect to operator<" to determine the next permutation from the current. What would the definition of the combinations ordering be? Also, extrapolating what the next combination should be by looking through the current combination seems like it would be slower than keeping track of which combination is next in the iterator's state as is currently the case. |
I was working on another project and realized that 30% of the runtime was the itertools
.combinations()
adapter allocating vectors on the heap.Added an implementation for a lending iterator that is able to avoid heap allocations with a lending iterator and completely eliminated the heap allocation speed penalties. It called the adapter
combinations_lending
. I also added an optional feature that can be enabled to utilize this and maybe in the future other lending iterators.Added benchmarks comparing
.combinations()
and.combinations_lending()
. They show the lending implementation is ~1x-7x speed depending on use case. I did not measure a noticeable speed penalty in the benchmarks so a future change could theoretically implement the normal.combinations()
iterator in terms of the lending iterator's.into_iter()
method.Feature allows for conditional compilation so those who don't want a lending iterator don't need to compile the related dependencies.
More info in the source code documentation.
Let me know what you think!