Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for chunked pulls #1866

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Proposal for chunked pulls #1866

wants to merge 2 commits into from

Conversation

tlonny
Copy link

@tlonny tlonny commented Dec 3, 2024

Hi @radex,

I was wondering if doing something like as proposed in my draft PR would work for facilitating "chunked" pulls. I want to avoid a scenario where we have to hold the entire changeset in memory (as in our specific use-case, the amount of changes could be very large) (see: #650)

All I did was add an optional (useGenerator : boolean) parameter. When true, the pullChanges method instead returns an AsyncGenerator, yielding "chunks" of pulled changes vs. the entire payload at once. I've then just naively placed all code that persists/commits these changes into a loop - that runs for each chunk...

My peanut brain can't come up with any reasons why this shouldn't work - but was hoping you could provide some wisdom as to why/how this approach is flawed.

Many Thanks!

Tim

@radex
Copy link
Collaborator

radex commented Dec 5, 2024

@tlonny Hi, some thoughts:

  • general approach to use generators and commit in chunks makes sense
  • it's not something I would recommend for the vast majority of users - we run into data consistency issues, as at any given point we may have a partially committed sync. It's eventually consistent in that even if it fails mid-way, running it second time should fill in any missing data. This is OK for initial sync. And it may be OK if your data model/application is sufficiently resilient to partially missing data
  • if this is for initial sync, why not use turbo sync? Are you running this on web?

@tlonny
Copy link
Author

tlonny commented Dec 5, 2024

Hi @radex

Thanks for your reply!

We were foolish enough to try to build a multi-inbox, cross platform, offline-first IMAP-based e-mail client. Our intention is to hold e-mail metadata relevant for searching + summary presentation (not message text/attachments) client-side.

Even when stored in a relatively concise/compacted format (SQLite) the storage footprint for a single, large inbox can run to >250MB. Presumably, held as an array of JS objects as per the return value of pullChanges, the memory cost would be significantly higher.

Unfortunately, I don't think turbo would help, as a user is able to add a new email inbox at any point in time and thus we are at risk of memory blow-out at any time - not just during the initial sync.

Appreciate your comment about inconsistency - For our use-case, I don't think we would be significantly affected by this - but can certainly understand how other apps could be...

If I tidied up this PR, gating this behavior behind a default-false flag (as per my draft), would it be something you'd consider merging?

Many Thanks,

Tim

@radex
Copy link
Collaborator

radex commented Dec 5, 2024

If I tidied up this PR, gating this behavior behind a default-false flag (as per my draft), would it be something you'd consider merging?

Yes, if it's not the default, then I'm happy with having that in the codebase. I suggest naming potentially unsafe features "unsafeXxxx" just to discourage users unwilling to read into it and make sure if their use case is safe.

For merging, a quick update to tests, a mention in changelog-unreleased and a mention in the sync docs would be highly appreciated.

PS. Good luck with the app!

if this flag is set to true, pullChanges is expected to be an
AsyncGenerator that yields multiple SyncPullResults pullChanges is
expected to be an AsyncGenerator that yields multiple SyncPullResults.
 - fix a few implementation issues that were causing tests to fail
 - fix type definitions I broke
 - add support for async generators to babel
@tlonny
Copy link
Author

tlonny commented Dec 6, 2024

Hi @radex

Progress update from me:

  1. Added optional (default false) flag: useUnsafeChunkedAsyncGenerator
  2. Added code to synchronize.js to handle pullChanges returning an AsyncGenerator<SyncPullResults> conditional on the flag.
  3. Updated flow types and type definitions - typescript will now complain about inconsistencies between pullChanges and useUnsafeChunkedAsyncGenerator.
  4. Added babel support for transpiling async generators

Things left to do:

  1. Write tests
  2. Add to docs
  3. Add to changelogs
  4. Write runtime checks for javascript users to ensure types of: useUnsafeChunkedAsyncGenerator and pullChanges are correct.

If you have time, please can you provide feedback on the following:

  1. Async generator syntax isn't fully compatible with better-async (a babel plugin). Specifically, the for await syntax isn't supported (see: for await is not supported to be transformed MatAtBread/fast-async#70). I opted to use: while(!result.done) { .. } instead as I figured this was the least intrusive option. Alternatively we could choose to remove fast-async, or use a non-generator solution (maybe pullChanges returns a hasMore boolean).
  2. Flow types made me want to pull my hair out. I wanted to get the types such that the expected type of pullChanges was conditional on the value of the useUnsafeChunkedAsyncGenerator flag. The only way I was able to achieve that was by doing a union type of two variants of SyncArgs. Any attempt I made to "share" type properties across the variants (using an intersection with a common "Base" type for example) caused the Flow typechecker to start complaining. This is why the type definition for SyncArgs is so large/ugly.
  3. In the scenario where there is an empty generator, I opt to throw an error. This fact is derived by newLastPulledAt being null after we've done the pullChanges portion of synchronize.

@radex
Copy link
Collaborator

radex commented Dec 6, 2024

Alternatively we could choose to remove fast-async,

I would consider this. While I love the fast-async project, unfortunately it's not super popular, and the complexity of Watermelon's babel config makes maintenance much more difficult. But if you found another solution, that's fine too.

pullChanges types

I'm fine with pullChanges: this | that even if it's not fully precise (requires another parameter), if that makes things easier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants