Implement State Capability #13

Open
pnadolny13 opened this issue Feb 3, 2022 · 3 comments

Comments

@pnadolny13
Collaborator

Related to discussions in #11, we should implement state here.

@pnadolny13
Collaborator Author

An example implementation is in tap-spreadsheets-anywhere

@aaronsteers
Contributor

@pnadolny13 - The above-mentioned example is a file-based bookmark, where the file's last-modified timestamp is the bookmarked attribute.

The CSV use case, however, seems better suited for a hash-key-based bookmark, where each record is hashed and the bookmark is essentially an array of "seen" records.

Wdyt of file-based vs record-based implementation here?

In the file-based model, all records in the file will come through if the file is modified. In the record-based model, any records already seen would be suppressed from an incremental stream.
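To make the difference concrete, here is a minimal sketch of both models in plain Python. It is not taken from tap-spreadsheets-anywhere or any SDK; the function names and the state keys `last_modified` and `seen_hashes` are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable hash of a record; keys are sorted so column order does not matter."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def file_based_sync(records, file_mtime, state):
    """File-based model: emit every record if the file changed, otherwise none.

    `last_modified` is a hypothetical state key holding the bookmarked mtime.
    """
    last = state.get("last_modified")
    if last is not None and file_mtime <= last:
        return [], state
    return list(records), {**state, "last_modified": file_mtime}


def record_based_sync(records, state):
    """Record-based model: emit only records whose hash has not been seen.

    `seen_hashes` is a hypothetical state key holding the list of seen hashes.
    """
    seen = set(state.get("seen_hashes", []))
    emitted = []
    for record in records:
        digest = record_hash(record)
        if digest not in seen:
            seen.add(digest)
            emitted.append(record)
    return emitted, {**state, "seen_hashes": sorted(seen)}
```

One trade-off visible in this sketch: the record-based state grows with the number of distinct records, and exact duplicate rows within a file are emitted only once, while the file-based state stays a single timestamp per file.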

I ask because we'd benefit from a reference implementation of the spec here:

@TyShkan
Contributor

TyShkan commented Mar 22, 2023

@aaronsteers @pnadolny13 I'd appreciate both approaches being supported, working together or independently:

  1. Skip files that were already processed and weren't modified since the last sync (e.g., a bookmark for a file name and its modification time)
  2. Skip lines that were already processed (e.g., using the last synced line number for a file)
  3. Update objects that were changed, using known keys (e.g., when receiving a new state for the same object with a known id or columns that represent a unique key)

Use cases:

  • new files could be added to a sync folder/bucket for the same stream
  • new lines could be appended to existing files
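As a rough illustration of items 2 and 3 above, the sketch below resumes a CSV file from the last synced line and upserts rows by key columns. Everything here is an assumption for discussion: the `lines` state key, the function names, and the idea of passing file contents as text are hypothetical and not part of any existing tap.

```python
import csv
import io


def read_new_lines(file_name, text, state):
    """Return the data rows after the last synced line, plus the updated state.

    `lines` is a hypothetical state key mapping file name -> last synced
    data-row number (the header row is not counted).
    """
    bookmarks = dict(state.get("lines", {}))
    start = bookmarks.get(file_name, 0)
    reader = csv.DictReader(io.StringIO(text))
    new_rows = []
    count = 0
    for count, row in enumerate(reader, start=1):
        if count > start:
            new_rows.append(row)
    bookmarks[file_name] = max(start, count)
    return new_rows, {**state, "lines": bookmarks}


def upsert(target, rows, key_columns):
    """Insert or overwrite rows in `target`, keyed by the given columns."""
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        target[key] = row
    return target
```

The line-number bookmark only works when files are strictly appended to; if earlier lines can be edited or removed, a file-level or key-based approach would be needed instead.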
