Hey, I would like to propose adding a method batch_by (I don't know if this is the best name, however I'll use it here).
batch_by would allow you to group elements by their size.
You provide a maximum size, and a batch of elements aims to be less than that size.
If an element is too big, it is placed into a batch of one on its own ... I'm not 100% sure if this should be the correct behaviour but it is what I have used in the past.
The batches preserve the order of the Iterator; it builds a batch as it goes and does not look ahead to find items to fit past batches.
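The behaviour described above can be sketched roughly as follows. This is a minimal, eager illustration and not the proposed Itertools API: the free function `batch_by`, its parameter names, and the `Vec<Vec<_>>` return type are assumptions made for the example (the proposal itself is for a lazy iterator adaptor). It also assumes the limit is inclusive, so a batch may total exactly `max_size`.

```rust
/// Hypothetical free-function sketch of the proposed batching behaviour.
/// Greedy and order-preserving: it never looks ahead to fill earlier batches.
fn batch_by<I, F>(iter: I, max_size: usize, mut size_of: F) -> Vec<Vec<I::Item>>
where
    I: IntoIterator,
    F: FnMut(&I::Item) -> usize,
{
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut current_size = 0usize;

    for item in iter {
        let size = size_of(&item);
        // Close the current batch if adding this item would exceed the limit.
        // An oversized item therefore ends up alone in its own batch.
        if !current.is_empty() && current_size.saturating_add(size) > max_size {
            batches.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size = current_size.saturating_add(size);
        current.push(item);
    }

    // Flush whatever is left over.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

fn main() {
    let data = vec!["aaaa", "bb", "cccccccccccc", "dd", "eee"];
    let batches = batch_by(data, 6, |s| s.len());
    println!("{:?}", batches);
}
```

With a limit of 6, the 12-byte string exceeds the limit and is emitted as a batch of one, while its neighbours are grouped in order around it.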
I would also propose try_batch_by for dealing with Iterators of Result.
(I'm not 100% sure on the need for the Num bound in the function signature. I used it in an implementation I wrote, however I believe it could probably be dropped.)
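As a rough illustration of the `Result` variant, here is a sketch that stops at the first error. Everything about its shape is an assumption for the example: the free function `try_batch_by`, the eager `Result<Vec<Vec<T>>, E>` return type, and the generic size type `S`. It uses only standard-library bounds (`Add`, `PartialOrd`, `Default`, `Copy`) to suggest that a separate `Num` bound may not be required. It assumes `S::default()` is zero, which holds for the primitive numeric types, and it does not guard against overflow.

```rust
use std::ops::Add;

/// Hypothetical sketch: batch the `Ok` values of an iterator of `Result`s,
/// returning the first `Err` encountered.
fn try_batch_by<I, T, E, S, F>(iter: I, max_size: S, mut size_of: F) -> Result<Vec<Vec<T>>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
    S: Copy + Default + PartialOrd + Add<Output = S>,
    F: FnMut(&T) -> S,
{
    let mut batches = Vec::new();
    let mut current = Vec::new();
    // Assumption: `S::default()` is the zero value of the size type.
    let mut current_size = S::default();

    for item in iter {
        // Propagate the first error and discard any partial batches.
        let item = item?;
        let size = size_of(&item);
        if !current.is_empty() && current_size + size > max_size {
            batches.push(std::mem::take(&mut current));
            current_size = S::default();
        }
        current_size = current_size + size;
        current.push(item);
    }

    if !current.is_empty() {
        batches.push(current);
    }
    Ok(batches)
}

fn main() {
    let ok: Vec<Result<&str, String>> = vec![Ok("aaaa"), Ok("bb"), Ok("ccc")];
    println!("{:?}", try_batch_by(ok, 6usize, |s| s.len()));

    let bad: Vec<Result<&str, String>> = vec![Ok("aaaa"), Err("boom".to_string())];
    println!("{:?}", try_batch_by(bad, 6usize, |s| s.len()));
}
```

A lazy adaptor could instead yield each completed batch before the error is reached; which of the two behaviours is preferable is an open design question for the proposal.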
Pseudo Example Code
```rust
const MAX_BATCH_SEND_SIZE: usize = 100;

// Data we are sending
let data_to_send = vec![
    "short-data",
    "very-very-very-...-very-long-data",
    "medium-length-data",
    "short-again",
    "medium-length-data-again",
    // ... imagine lots more data ...
];

// This is the function in use
let batches = data_to_send
    .into_iter()
    .batch_by(MAX_BATCH_SEND_SIZE, |data| data.len());

// Send batches of data
for batch in batches {
    let batch_to_send = batch.collect::<Vec<&'static str>>();
    send(batch_to_send).await?;
}
```
Motivations
I have personally needed this on several projects when grouping things to be sent on to an external service. For example, on a real-world project I needed to batch data into 10 MB groups to send to ElasticSearch.
Comments
AFAIK there is nothing like this in Itertools. There are ways to group by key, or to find the largest or smallest, but no means to say 'put these into groups of 10 MB or less'.
I have written this before, so if there is interest I would be more than happy to write a PR for Itertools.
I actually ended up writing this twice due to getting the logic wrong the first time. In practice I would end up putting this into a helper function, so I can wrap that with tests.