Add `zip_clones`, zips an iterator with clones of a value #989

barakugav · 2024-09-01T09:44:39Z

Similar to `iter.zip(repeat_n(val, n))` but does not require knowing `n`.

phimuemue · 2024-09-01T18:28:58Z

Hi there. First of all, thanks for this.

Honestly, I don't believe this is sufficiently useful to justify its own implementation. Is the intention to avoid the clone call for the last element? If so, my answer would be: if cloning is really that expensive, why not always work with the borrowed value? Is that one clone call decisive enough for run time to justify the existence of this iterator? And why is avoiding the clone relevant only in zip contexts?

If my assumptions are correct, I personally would advise against inclusion of this.

Aside: Couldn't the unsafe code be avoided if the struct held an `Option<(Iter::Item, Zipped)>` instead of two `Option`s?
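The aside above can be illustrated with a minimal sketch. This is not the PR's actual code: the names `ZipClones` and `zip_clones` and the field layout are assumptions made for illustration. It keeps the upcoming item and the value together in a single `Option`, so no `unsafe` is needed.

```rust
/// Hypothetical sketch (not the PR's implementation): pairs every item with a
/// clone of `value`, except the last item, which receives `value` itself.
pub struct ZipClones<I: Iterator, T> {
    // `Some((upcoming_item, value))` while items remain, `None` once exhausted.
    state: Option<(I::Item, T)>,
    iter: I,
}

pub fn zip_clones<I, T>(iter: I, value: T) -> ZipClones<I::IntoIter, T>
where
    I: IntoIterator,
    T: Clone,
{
    let mut iter = iter.into_iter();
    // Eager peek: the first item is pulled when the adapter is created.
    let state = iter.next().map(|first| (first, value));
    ZipClones { state, iter }
}

impl<I: Iterator, T: Clone> Iterator for ZipClones<I, T> {
    type Item = (I::Item, T);

    fn next(&mut self) -> Option<Self::Item> {
        let (item, value) = self.state.take()?;
        match self.iter.next() {
            // More items follow: hand out a clone and keep the original.
            Some(upcoming) => {
                let clone = value.clone();
                self.state = Some((upcoming, value));
                Some((item, clone))
            }
            // `item` is the last one: move the original value out.
            None => Some((item, value)),
        }
    }
}
```

With this layout, an iterator of n items triggers n-1 calls to `clone`, and an empty iterator triggers none.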

barakugav · 2024-09-02T06:09:12Z

I think it is useful in many cases. Take mine, for example:
I have a multithreaded system whose threads communicate over crossbeam channels. When a thread wants to send non-Copy data to the others, it sends it by value (that is, it moves it), so it must clone the value to send it to multiple threads, and the number of threads is not known in advance. The data packets are a few hundred bytes each, but they are sent at a high frequency. In addition, a thread is often connected to only a single other thread, in which case no clone is necessary in the first place. All in all, it's ugly to avoid the clone without such an iterator, and Arc is not ideal because it introduces another pointer indirection and a heap allocation.
The code becomes much more elegant when using such an iterator.

Avoiding the clone is not relevant only in zip contexts, but this is the smoothest way to clone something n-1 times when you already have an iterator that yields n items, even if you don't know n.

Thanks for the comment regarding unsafe. I'll happily fix it if you are willing to go forward with the change :)
@phimuemue

phimuemue · 2024-09-02T18:22:27Z

Hm... I hope I don't miss the forest for the trees, but I am not convinced.

Maybe @jswrenn or @Philippe-Cholet grok your examples. Could you share a boiled-down version of your use case to show why zip_clones is so much better than the alternatives?

barakugav · 2024-09-03T07:15:51Z

Here is a boiled down version of my use case:

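// Assumption: the channel types come from the crossbeam-channel crate.
use crossbeam_channel::{Receiver, Select, Sender};

#[derive(Clone)]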
struct Data {
    values: Vec<f64>,
    metadata_field1: usize,
    metadata_field2: usize,
    // ...
}

struct Component {
    inputs: Vec<Receiver<Data>>,
    outputs: Vec<Sender<Data>>,
}

impl Component {
    fn component_main(&self) {
        let mut sel = Select::new();
        for input in self.inputs.iter() {
            sel.recv(input);
        }

        loop {
            let oper = sel.select();
            let idx = oper.index();
            let data = oper.recv(&self.inputs[idx]).unwrap();

            let result = self.process_data(data);

            for output in self.outputs.iter() {
                output.send(result.clone()).unwrap();
            }
        }
    }

    fn process_data(&self, data: Data) -> Data {
        unimplemented!()
    }
}

Each Component::component_main is run by a different thread.
The components are connected using crossbeam channels, with an arbitrary number of inputs and outputs.
Each component waits for input data, computes something, and passes the result to its outputs.
The Data struct is non-Copy and a few hundred bytes in size, and it is usually sent between the components at a high frequency.

The above implementation uses the trivial approach of cloning the result for each output.

for output in self.outputs.iter() {
    output.send(result.clone()).unwrap();
}

Given a zip_clones method, the implementation stays clean but avoids the last clone.

for (output, result) in self.outputs.iter().zip_clones(result) {
    output.send(result).unwrap();
}

Without such a method, you could implement it as follows:

if !self.outputs.is_empty() {
    for output in self.outputs[..self.outputs.len() - 1].iter() {
        output.send(result.clone()).unwrap();
    }
    self.outputs[self.outputs.len() - 1].send(result).unwrap();
}

The above relies on the fact that the outputs are stored in a vector, because we must know the number of outputs.
If the only thing we know about the outputs is that we can iterate over them, the implementation becomes messier.

let mut outputs = self.outputs.iter();
let mut cur = outputs.next();
if cur.is_some() {
    let mut next = outputs.next();
    while next.is_some() {
        cur.unwrap().send(result.clone()).unwrap();
        cur = next;
        next = outputs.next();
    }
    cur.unwrap().send(result).unwrap();
}
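For comparison, the same clone-all-but-the-last loop can also be written against the standard library's `Peekable`. The helper below is a hypothetical sketch (the name `for_each_with_clones` does not come from itertools or from the PR) and assumes nothing about the outputs except that they can be iterated over.

```rust
/// Hypothetical helper: calls `f` once per item, passing a clone of `value`
/// for every item except the last, which receives `value` itself.
pub fn for_each_with_clones<I, T, F>(iter: I, value: T, mut f: F)
where
    I: IntoIterator,
    T: Clone,
    F: FnMut(I::Item, T),
{
    let mut iter = iter.into_iter().peekable();
    let mut value = Some(value);
    while let Some(item) = iter.next() {
        let current = if iter.peek().is_some() {
            // More items follow: clone and keep the original for later.
            value.clone()
        } else {
            // Last item: move the original out.
            value.take()
        };
        // `value` is only taken on the last item, so `current` is `Some` here.
        if let Some(v) = current {
            f(item, v);
        }
    }
}
```

In the use case above, the call would look roughly like `for_each_with_clones(self.outputs.iter(), result, |output, r| output.send(r).unwrap())`. It avoids the last clone, but as a callback-style function it does not compose with other iterator adapters the way an adapter such as zip_clones would.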

To emphasise, I want to avoid the last clone because a component is often connected to only one other component; that is, it has a single output, and no allocation should be done at all. Even when I have a few outputs, the data packets flow at a high frequency and the last clone is redundant.
Using an Arc can solve the problem, but many components actually do want to consume the packet, either to modify it or to steal its internals, so they would clone the contents of the Arc anyway. It also introduces another heap allocation and indirection, which I would love to avoid.
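The Arc trade-off described above can be sketched as follows. The `Packet` type and the `into_owned` function are hypothetical stand-ins, not code from the PR. A consumer that needs ownership has to either be the last holder of the Arc or clone the inner value.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the data packet discussed in the thread.
#[derive(Clone, Debug, PartialEq)]
pub struct Packet {
    pub values: Vec<f64>,
}

/// Takes ownership of the shared packet. If this is the last handle, the
/// inner value is moved out without copying; otherwise it is cloned.
pub fn into_owned(shared: Arc<Packet>) -> Packet {
    Arc::try_unwrap(shared).unwrap_or_else(|still_shared| (*still_shared).clone())
}
```

Even in the best case, where the inner value is moved out, the packet had to be placed behind a separate heap allocation first, which is the extra cost mentioned above.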

barakugav · 2024-09-22T12:43:17Z

@phimuemue @jswrenn @Philippe-Cholet what do you think?

phimuemue · 2024-09-22T17:35:52Z

Hi there, my opinion has not changed.

I'm withdrawing from this PR to make way for @jswrenn or @Philippe-Cholet if they want to take over.

jswrenn · 2024-09-22T17:38:29Z

I've run into the "want to avoid an unnecessary clone" problem with iterators in the past, and I'm happy to consider solutions to the problem. Things are a little chaotic at work right now, but I'll give this a thorough review as soon as I'm able.

src/zip_clones.rs

jswrenn · 2024-10-22T13:10:31Z

+    let mut iter = i.into_iter();
+    let next = iter.next();


Iterator adapters are, ideally, lazy — i.e., they don't advance the underlying iterator when they're merely created.

Is there any way to avoid this peeking behavior altogether?

barakugav · 2024-11-01

In general, I don't think so: you need to know whether the current call to next() should clone the zipped value or consume it, so you must peek.
We can maybe avoid the peek during the creation of the iterator, but not the peeks during next() calls. Tell me if you think that has any value.
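A minimal sketch of that idea, assuming a design built on the standard library's `Peekable` (the names `LazyZipClones` and `lazy_zip_clones` are hypothetical and not taken from the patch): creating the adapter pulls nothing from the underlying iterator, while each call to next() still looks one item ahead.

```rust
use std::iter::{Fuse, Peekable};

/// Hypothetical lazy variant: no item is pulled until `next()` is called.
pub struct LazyZipClones<I: Iterator, T> {
    // Fused so the inner iterator is never polled again after it ends.
    iter: Peekable<Fuse<I>>,
    // `None` once the original value has been handed out with the last item.
    value: Option<T>,
}

pub fn lazy_zip_clones<I, T>(iter: I, value: T) -> LazyZipClones<I::IntoIter, T>
where
    I: IntoIterator,
    T: Clone,
{
    LazyZipClones {
        // `fuse()` and `peekable()` do not advance the iterator by themselves.
        iter: iter.into_iter().fuse().peekable(),
        value: Some(value),
    }
}

impl<I: Iterator, T: Clone> Iterator for LazyZipClones<I, T> {
    type Item = (I::Item, T);

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.iter.next()?;
        let value = if self.iter.peek().is_some() {
            // Another item follows: clone and keep the original.
            self.value.clone()?
        } else {
            // `item` is the last one: move the original out.
            self.value.take()?
        };
        Some((item, value))
    }
}
```

The trade-off is that the first call to next() pulls two items from the underlying iterator (the one returned and the one peeked), so the look-ahead is deferred rather than removed.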

barakugav · 2024-11-01T12:19:37Z

@jswrenn patched the PR

codecov · 2024-11-14T14:31:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.48%. Comparing base (6814180) to head (5a8a472).
Report is 128 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #989      +/-   ##
==========================================
+ Coverage   94.38%   94.48%   +0.09%     
==========================================
  Files          48       50       +2     
  Lines        6665     6818     +153     
==========================================
+ Hits         6291     6442     +151     
- Misses        374      376       +2

barakugav · 2025-01-01T12:49:03Z

@jswrenn any more comments?

jswrenn requested changes Oct 22, 2024

barakugav force-pushed the zip-clones branch from f04a047 to 5a8a472 Compare November 1, 2024 12:18

barakugav requested a review from jswrenn November 1, 2024 12:19
