Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zip_clones, zips an iterator with clones of a value #989

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

barakugav
Copy link

Similar to `iter.zip(repeat_n(val, n))' but does not require knowing n

@phimuemue
Copy link
Member

Hi there, foremost: Thanks for this.

Honestly, I don't believe that is sufficiently useful to justify an own implementation. Is the intention to avoid the clone call for the last element? My answer then would be: If cloneing is really that expensive, why not always work with the borrowed value? Is the one clone call decisive for run-time to justify the existence of this iterator? And: Why is avoiding the clones is only relevant in zip-contexts?

If my assumptions are correct, I personally would advise against inclusion of this.

Aside: Couldn't the unsafe code be avoided if the struct held a Option<(Iter::Item, Zipped)> instead of two Options?

@barakugav
Copy link
Author

barakugav commented Sep 2, 2024

I think it is useful in many use cases, take for example mine:
I have a multi threaded system with threads communicating using crossbeam channels. When a thread wants to send some non-Copy data to the others it does so by value, therefore moving, so it must clone the value to be able to send it to multiple threads (unknown number). The data packets are few hundreds bytes, but the frequency of sending is high. In addition, many times a thread is connected to a single other thread only, in which case no clone is necessary in the first place. All in all, its ugly to avoid cloning without such iterator, Arc is not ideal because it introduce another pointer indirection and allocation on the heap.
The code becomes much more elegant when using such iterator.

Avoiding the clone is not relevant only in zip contexts, but its the smoother way to clone something n-1 times when you already have a iterator that iterate n times, even if you dont know n.

Thanks for the comment regarding unsafe, ill happily fix it if you are willing to go forward with the change :)
@phimuemue

@phimuemue
Copy link
Member

Hm... I hope I don't miss the forest for the trees, but I am not convinced.

Maybe @jswrenn or @Philippe-Cholet groks your examples. Could you share a boiled down version of your use case to show why zip_clones is so much better than the alternatives?

@barakugav
Copy link
Author

barakugav commented Sep 3, 2024

Here is a boiled down version of my use case:

#[derive(Clone)]
struct Data {
    values: Vec<f64>,
    metadata_field1: usize,
    metadata_field2: usize,
    // ...
}

struct Component {
    inputs: Vec<Receiver<Data>>,
    outputs: Vec<Sender<Data>>,
}

impl Component {
    fn component_main(&self) {
        let mut sel = Select::new();
        for input in self.inputs.iter() {
            sel.recv(input);
        }

        loop {
            let oper = sel.select();
            let idx = oper.index();
            let data = oper.recv(&self.inputs[idx]).unwrap();

            let result = self.process_data(data);

            for output in self.outputs.iter() {
                output.send(result.clone()).unwrap();
            }
        }
    }

    fn process_data(&self, data: Data) -> Data {
        unimplemented!()
    }
}

Each Component::component_main is run by a different thread.
The components are connected using crossbeam channels, with arbitrary number of inputs/outputs.
Each component waits for input data, compute something, and pass a result data to its outputs.
The Data structs is non-Copy, few hundreds bytes, but is usually sent in high frequencies between the components.

The above implementation use a trivial approach cloning the result for each output.

for output in self.outputs.iter() {
    output.send(result.clone()).unwrap();
}

Given a zip_clones method, the implementation stays clean, but avoid the last clone.

for (output, result) in self.outputs.iter().zip_clones(result) {
    output.send(result).unwrap();
}

Without such a method, you could implement it as follows:

if !self.outputs.is_empty() {
    for output in self.outputs[..self.outputs.len() - 1].iter() {
        output.send(result.clone()).unwrap();
    }
    self.outputs[self.outputs.len() - 1].send(result).unwrap();
}

The above relay on the fact the outputs are stored in a vector, as we must know the number of outputs.
If the only information we have about the outputs is that we can iterate over them, the implementation becomes more messy.

let mut outputs = self.outputs.iter();
let mut cur = outputs.next();
if cur.is_some() {
    let mut next = outputs.next();
    while next.is_some() {
        cur.unwrap().send(result.clone()).unwrap();
        cur = next;
        next = outputs.next();
    }
    cur.unwrap().send(result).unwrap();
}

To emphasise, I want to avoid the last clone because many times a component is connected to only one other component, namely it has a single output, and no allocation should be done at all. Even for cases in which I have few outputs, the data packets are flowing in high frequencies and the last clone is redundant.
Using an Arc can solve the problem, but many components actually do want to consume the packet, either to modify it or steal its internal, so they would clone the inner of Arc anyway. Also it introduce another heap allocation and indirection, which I would love to avoid.

@barakugav
Copy link
Author

@phimuemue @jswrenn @Philippe-Cholet what do you think?

@phimuemue
Copy link
Member

Hi there, my opinion has not changed.

I retract from this PR to make way for @jswrenn or @Philippe-Cholet if they want to take over.

@jswrenn
Copy link
Member

jswrenn commented Sep 22, 2024

I've run into the "want to avoid an unnecessary clone" problem with iterators in the past, and I'm happy to consider solutions to the problem. Things are a little chaotic at work right now, but I'll give this a thorough review as soon as I'm able.

src/zip_clones.rs Outdated Show resolved Hide resolved
src/zip_clones.rs Outdated Show resolved Hide resolved
Comment on lines 32 to 33
let mut iter = i.into_iter();
let next = iter.next();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterator adapters are, ideally, lazy — i.e., they don't advance the underlying iterator when they're merely created.

Is there any way to avoid this peeking behavior altogether?

Copy link
Author

@barakugav barakugav Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I dont think so, because you need to know if the current call to next() should clone the zipped value or consume it, so you must peek.
We can maybe avoid the peek during the creation of the iterator, but not the peeks during next() calls. tell me if you think it has any value.

src/zip_clones.rs Outdated Show resolved Hide resolved
Similar to `iter.zip(repeat_n(val, n))' but does not require knowing n
@barakugav
Copy link
Author

@jswrenn patched the PR

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.48%. Comparing base (6814180) to head (5a8a472).
Report is 128 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #989      +/-   ##
==========================================
+ Coverage   94.38%   94.48%   +0.09%     
==========================================
  Files          48       50       +2     
  Lines        6665     6818     +153     
==========================================
+ Hits         6291     6442     +151     
- Misses        374      376       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants