post_batch ... differentiate reception batching from transmission. #1213

Open
petersilva opened this issue Sep 6, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@petersilva
Contributor

Batch settings should be tuned, and the best value may differ between consuming and publishing, so the two likely need independent tuning.

@andreleblanc11
Member

I'm assuming we'd also like to have post_batch define how many messages we post at one given time?

I've added the new configuration option and made the transfer protocols use post_batch. I haven't made any changes yet for message posting, though.

@andreleblanc11
Member

I've run the static flow tests with post_batch 5 on the sender. It seems to be working as intended: the SFTP connection closes after roughly every 5 file transfers.

@petersilva
Contributor Author

I think post_batch says how many files we send, but I'm confused. If you have:

  • batch 20
  • post_batch 10

what happens? I'm guessing it should:

  • get 20 messages... (via gather?)
  • run through two batches in the "work and post" steps of the algorithm?
  • then go back to gather?

Is that what you do?
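
One possible reading of the three steps above, as a minimal sketch rather than the actual sr3 flow code: gather returns up to `batch` messages, and the work+post stage walks through them in groups of `post_batch`. The names `chunked` and `run_once`, and the stand-in `gather` / `work_and_post` callables, are hypothetical and only serve the illustration.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(messages: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` messages."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(messages), size):
        yield list(messages[start:start + size])


def run_once(gather: Callable[[int], List[dict]],
             work_and_post: Callable[[List[dict]], None],
             batch: int, post_batch: int) -> int:
    """One pass: gather up to `batch` messages, then work+post them
    in groups of `post_batch`. Returns the number of groups posted."""
    gathered = gather(batch)
    groups = 0
    for group in chunked(gathered, post_batch):
        work_and_post(group)  # hypothetical stand-in for the work and post steps
        groups += 1
    return groups
```

Under this reading, batch 20 with post_batch 10 gives two work+post groups per full gather before control returns to gather.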

@petersilva
Contributor Author

if batch < post_batch... then do we gather more than once? or do we just make post_batch == batch?

@andreleblanc11
Member

andreleblanc11 commented Oct 17, 2024

> Is that what you do?

No. I only added the option and made the option tunable in the transfer class (ftp / sftp).

If it's not too hard it would be nice to have it configurable both ways (post_batch > batch) && (batch > post_batch).

If post_batch > batch, I think what we'd need to do is:

  • Copy the message list to a variable (or file?) and skip the work/post
  • Check whether that saved list exists in the following passes of the flow algorithm
  • Check if the total list of messages is greater than or equal to batch, post_batch
  • If it's not, go back to gather
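
The bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions: the held messages live in an in-memory list (not a file), the threshold is taken to be `post_batch`, and `PendingMessages` with its methods is a hypothetical name, not something that exists in sr3.

```python
from typing import List


class PendingMessages:
    """Holds gathered messages across passes until post_batch is reached."""

    def __init__(self, post_batch: int) -> None:
        if post_batch < 1:
            raise ValueError("post_batch must be >= 1")
        self.post_batch = post_batch
        self._held: List[dict] = []

    def add(self, gathered: List[dict]) -> None:
        """Keep the newly gathered messages, skipping work/post for now."""
        self._held.extend(gathered)

    def ready(self) -> bool:
        """True once enough messages are held to fill one post_batch."""
        return len(self._held) >= self.post_batch

    def take(self) -> List[dict]:
        """Remove and return up to one post_batch worth of messages."""
        taken = self._held[:self.post_batch]
        self._held = self._held[self.post_batch:]
        return taken
```

A flow pass would call `add()` after gather, and only continue to work/post with `take()` when `ready()` is true; otherwise it would return to gather.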

@petersilva
Contributor Author

Hey @reidsunderland, I'm starting to wonder if this is worthwhile. I thought it would just be a new setting and a few lines, but it looks like we are putting a new looping layer everywhere to deal with batch != post_batch: loop multiple times in gather to accumulate post_batch worth of messages (if post_batch > batch), and loop multiple times in work+post to get through batch worth of messages (if post_batch < batch). I started this, but I don't remember any use cases. Is it something worth doing?

@reidsunderland
Member

I think the reason we were looking at it is because we observed that the tx_commit can be slow. We currently do 1 tx_commit for every 1 message we publish, and we thought that it would be faster to do 1 tx_commit for multiple messages.

When we tested that, it was a bit faster, but we weren't sure how it would impact error handling or other failures.

I think we decided that it wasn't worth changing tx_commit right now, and it sounds like it's not worth the added complexity to implement a separate post_batch option.
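
For reference, the trade-off between one tx_commit per message and one per group can be illustrated with a small sketch. `CountingChannel` is a hypothetical stand-in that only counts calls; it is not a real broker client and not the sr3 publishing code, and `commit_every` is an assumed parameter name.

```python
from typing import Iterable, List


class CountingChannel:
    """Stand-in for a transactional broker channel; it only counts calls."""

    def __init__(self) -> None:
        self.published: List[dict] = []
        self.commits = 0

    def publish(self, message: dict) -> None:
        self.published.append(message)

    def tx_commit(self) -> None:
        self.commits += 1


def publish_all(channel: CountingChannel, messages: Iterable[dict],
                commit_every: int = 1) -> None:
    """Publish messages, committing after every `commit_every` of them."""
    if commit_every < 1:
        raise ValueError("commit_every must be >= 1")
    uncommitted = 0
    for message in messages:
        channel.publish(message)
        uncommitted += 1
        if uncommitted == commit_every:
            channel.tx_commit()
            uncommitted = 0
    if uncommitted:
        # Commit the final partial group so nothing is left pending.
        channel.tx_commit()
```

With 25 messages, `commit_every=1` issues 25 commits and `commit_every=10` issues 3. The error-handling question raised above is visible here too: a failed commit would leave up to `commit_every` messages in doubt rather than one.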

@petersilva
Contributor Author

Another wrinkle: if there simply isn't a full batch's worth of incoming messages (e.g. gather results in zero new messages), then we want to proceed to posting regardless of batch.
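
That rule could be expressed as a small decision function. This is a sketch under assumptions: the threshold is taken to be `post_batch`, "nothing new" means the last gather returned zero messages, and `should_post` is a hypothetical name.

```python
def should_post(pending: int, post_batch: int, newly_gathered: int) -> bool:
    """Decide whether to move on to posting.

    Post when a full post_batch is pending, or when the last gather
    returned nothing new and there is still something left to post.
    """
    if pending >= post_batch:
        return True
    return newly_gathered == 0 and pending > 0
```

For example, `should_post(3, 10, 0)` is true because gather came back empty with 3 messages waiting, while `should_post(3, 10, 2)` is false because messages are still arriving.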


3 participants