Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DoubleEndedIterator and FusedIterator for ZipEq. #531

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

KarelPeeters
Copy link

@KarelPeeters KarelPeeters commented Feb 27, 2021

I wanted to use zip_eq(..).rev() and that required a DoubleEndedIterator implementation, so I added one. I checked the other "marker" traits and it was pretty easy to add a FusedIterator impl as well. The generic trait bounds are a bit too strong now, technically we don't need both of the inner iterators to be a FusedIterator, but I don't think we can do better in the current rust version.

I didn't create an issue for this since it's such a small change, I hope that's okay.

@KarelPeeters
Copy link
Author

I also noticed we're missing custom implementations of fold, try_fold, nth, advance_by etc, but I'm not sure if they are worth the added bloat. Let me know if you want me to send a PR for those as well.

@scottmcm
Copy link
Contributor

scottmcm commented Feb 27, 2021

I think there's an important semantic question about what zip_eq is supposed to mean.

Notably, with this change, going forward or backwards give different elements if the inputs have different lengths. And if iteration stops early for some reason -- any, take, etc -- then one might not notice the problem. For a concrete example, zip_eq(0..4, 0..8).all(|(a, b)| a == b) and zip_eq(0..4, 0..8).rev().all(|(a, b)| a == b) do different things with this PR.

Is that ok? I don't know. What should happen here? I don't know that either.

@scottmcm
Copy link
Contributor

As a thought: zip_eq says it panics for different lengths, but it doesn't.

Maybe it could make sense to do a check like this?

let (hint1, hint2) = (a.size_hint(), b.size_hint());
assert!(hint1.0 <= hint2.1.unwrap_or(usize::MAX) && hint2.0 <= hint1.1.unwrap_or(usize::MAX);

That would at least catch the easy cases early, without changing the trait requirements.

@jswrenn
Copy link
Member

jswrenn commented Feb 28, 2021

Notably, with this change, going forward or backwards give different elements if the inputs have different lengths.

I think it'd be fine if this PR resolved that issue by bounding the DoubleEndedIterator impl with an ExactSizeIterator bound. (The standard library's Zip does this too.)


As a thought: zip_eq says it panics for different lengths, but it doesn't.

Concretely:

#[cfg(test)]
use itertools::zip_eq;

#[test]
#[should_panic]
fn test() {
    let short = 0..1;
    let long  = 0..9;
    let iter = zip_eq(short, long); // no panic :(
}

That's neat. I was aware of the behavior, but it hadn't noticed how odd it was that the documentation implied that zip_eq, itself, panicked. This is weird, and totally needs to be sign-posted better at minimum.

Here's a thought:

  • define zip_eq with an ExactSizeIterator bound
  • define zip_lazy_eq without an ExactSizeIterator bound

@scottmcm
Copy link
Contributor

(The standard library's Zip does this too.)

That's not perfectly analogous, since different lengths are acceptable with std's zip, so it uses ESI to skip the extra elements in the longer one so that iterating backwards gets the correct pairs of elements.

So it would feel kinda weird to me to have the forward direction here panic lazily but the backward direction panic eagerly.

  • define zip_eq with an ExactSizeIterator bound
  • define zip_lazy_eq without an ExactSizeIterator bound

In a way, this is what my size_hint check is doing, just in the same method -- if you pass two ESIs, it'll panic immediately. But it'll also panic immediately for zip_eq(0..10, (0..8).filter(f)) even though the latter isn't ESI, because it can never be long enough to match the length of the former.

So it might be ok to just carefully document "⚠ This may do weird (but safe) things if you pass non-equal-length iterators ⚠" and treat it as "garbage in garbage out" in those cases.

@jswrenn
Copy link
Member

jswrenn commented Feb 28, 2021

So it would feel kinda weird to me to have the forward direction here panic lazily but the backward direction panic eagerly.

Ah, yeah.

In a way, this is what my size_hint check is doing, just in the same method -- if you pass two ESIs, it'll panic immediately. But it'll also panic immediately for zip_eq(0..10, (0..8).filter(f)) even though the latter isn't ESI, because it can never be long enough to match the length of the former.

I think I'm sold on this approach, but I wish I knew more about how people were using zip_eq in practice. Why reach for zip_eq instead of zip? Is the lazy panicking important? If not, it might make more sense to require ESI on zip_eq and panic eagerly. (Rayon's zip_eq does this.)

@scottmcm
Copy link
Contributor

I wish I knew more about how people were using zip_eq in practice

Yeah, would be nice to know whether things are ESI in practice.

👍 to following rayon. The more code can flip back and forth between serial and parallel the better.

I guess it could just change to ESI and early panic in a major version bump and see who complains 😆

@phimuemue
Copy link
Member

[...] with this change, going forward or backwards give different elements if the inputs have different lengths. [...]
Is that ok? I don't know. What should happen here? I don't know that either.

I once stumbled upon a similar issue, and PR'd DoubleEndedIterator (see rust-lang/rust#72626), which was accepted. (The documentation for it now explicitly states that the elements of DoubleEndedIterator may differ from the Iterator ones.)

Not sure, though, if that applies to panics, too.

@jswrenn
Copy link
Member

jswrenn commented Mar 1, 2021

Fascinating! Given that, I don't have any objections to merging this PR.

The issue of a panicking next()/next_back() seems orthogonal to the directionality concerns. I'm going to merge this PR, but I think we should revisit bounding zip_eq with ExactSizeIterator.

@jswrenn jswrenn added this to the next milestone Mar 1, 2021
@KarelPeeters
Copy link
Author

KarelPeeters commented Mar 1, 2021

I'm not so sure about this PR any more, some important issues were raised. I didn't even think of the different-size-different-elements problem, because it doesn't really matter for my use cases. That's mostly because of the fact that Iterator is a bit of an underspecified state machine, the only thing I can find in the docs about this issue is this sentence in the DoubleEndedIterator docs:

It is important to note that both back and forth work on the same range, and do not cross: iteration is over when they meet in the middle.

They work on the same range, does that also mean they have to work on the same values? Hard to tell... I'm pretty sure almost all of the code out there that accepts a DoubleEndedIterator as a parameter expects the iterator to return the same values in both directions, but I don't know how many such code there actually is.

Edit: I missed @phimuemue's comment, so technically it's fine.

To give some perspective, I'm kind of confused why the default version of zip is the one that silently ignores differently sized iterators. I grepped a few of my projects in Rust, Python and Kotlin and >90% of the zip use cases are actually zip_eq use cases, either with an assert that checks the sizes right above it or where it would probably be better to add that assert. In my experience most of the time you're iterating over two collections with the same size and you just want to add a cautionary assert. The only use case for normal zip I found is when you have an infinite iterator and a limited one.

In that light it doesn't really matter for me whether the assert happens at the start or the end, but of course the start is preferable. I really like the zip_eq vs zip_eq_lazy solution, although it's not backwards compatible. It may be worth it though, I didn't find many cases where the participating iterators are not ExactSizeIterator. Something like "keep the current zip_eq but add zip_eq_eager where only the latter requires ExactSizeIterator" is a backwards compatible alternative but unfortunately that has the wrong default.

It could be interesting to grep crates that use itertools::zip or itertools::zip_eq and see what they're using it for, I'll give that a shot tomorrow.

@KarelPeeters
Copy link
Author

Okay I looked around a bit on GitHub, this search query is the best I could come up with: zip_eq language:Rust -simultaneously (the -simultaneously filters out the file that declares zip_eq, because filtering by filename is currently broken in GitHub search).

It looks like most uses are with ExactSizeIterators, mostly from Vecs or slices. I only came across a few non-ESI use cases and one of them was with a custom iterator that could easily implement it. I think adding an ESI bound to zip_eq and introducing a new zip_eq_lazy would not break a lot of dependants.

Another interesting datapoint is that rayon also has a zip_eq function, it's only implemented for IndexedParallelIterator which includes the requirements for ESI. It panics immediatly when zip_eq is called.

@aobatact aobatact mentioned this pull request May 26, 2021
bors bot added a commit that referenced this pull request Jun 8, 2021
550: Add More FusedIterator r=jswrenn a=aobatact

These Iterator is fused if the underlying Iterator is fused.
-  `FilterOk`
-  `FilterMapOk`
- `InterleaveShortest`
- `KMergeBy`
- `MergeBy`
- `PadUsing`
- `Positions`
- `Product` 
- `RcIter`
- `TupleWindows`
- `Unique`
- `UniqueBy`
-  `Update`
- `WhileSome`

These is fused even though the underlying Iterator is not fused.
- `Combinations` 
- `CombinationsWithReplacement`
- `Powerset`
- `RepeatN`
- `WithPosition` 

`FusedIterator` can be added to these structs.

Related  #55, #152, #531, #533

I separate the pull request with #548 because these Iterator are sure to be fused because it was documented, but I'm not 100% sure that the structs in this PR is actually fused. (Though I believe it is.)

Co-authored-by: aobatact <[email protected]>
@jswrenn jswrenn removed this from the next milestone Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants