Workflow streams incorrectly claim to support incremental loading #216
Comments
Hey @JohannesRudolph! How are you? Were you able to sort this out? My team has just faced this issue this week and we were wondering whether we would need to try fixing it ourselves or try another solution. I would really appreciate any update on this. Thank you!
Not really. My workaround was to put it into a target that will always replace all data, since the stream is not incremental. Best regards, Johannes
edgarrmondragon added a commit that referenced this issue Nov 6, 2024
…_runs` stream Related: - Closes #216
PR to address this: #325
So most `GitHubRestStream` descendants in the tap support incremental loading using a combination of the `updated_at` replication key and the GitHub API's `since` parameter, e.g. repository issues: https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues

None of the GitHub APIs used for the `workflow`, `workflow_runs` and `workflow_run_jobs` streams, however, support those parameters, see e.g. https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#list-repository-workflows

Nonetheless, the tap sets replication keys accordingly and creates huge state files (esp. for `workflow_run_jobs`), where every `run_id` seems to get its own partition. In my pipelines this results in append-only behavior, where I should probably do full loads instead.

A possible solution here might be to use the `use_fake_since_parameter`, but I haven't checked this yet and would appreciate it if one of the experts of this tap could offer an insight.
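To illustrate the distinction described above, here is a minimal standalone sketch, not the tap's actual code. The lookup table and the helper names `build_params` and `effective_replication_method` are hypothetical. The issues entry follows the GitHub docs linked above, and the workflows entry follows the claim made in this issue. The idea is that a bookmark can only narrow the request when the endpoint accepts `since`; otherwise every sync fetches everything, which is effectively a full-table load.

```python
from datetime import datetime, timezone
from typing import Optional

ISSUES = "/repos/{owner}/{repo}/issues"
WORKFLOWS = "/repos/{owner}/{repo}/actions/workflows"

# Hypothetical lookup table (an assumption for this sketch, not part of the tap):
# whether a list endpoint accepts the `since` query parameter.
ENDPOINT_SUPPORTS_SINCE = {
    ISSUES: True,      # documented for "List repository issues"
    WORKFLOWS: False,  # per this issue: no `since` parameter
}


def build_params(endpoint: str, bookmark: Optional[datetime]) -> dict:
    """Build query parameters for one list request.

    `bookmark` is the last seen `updated_at` value and must be timezone-aware.
    """
    params = {"per_page": 100}
    if bookmark is not None and ENDPOINT_SUPPORTS_SINCE.get(endpoint, False):
        # GitHub expects an ISO 8601 timestamp in UTC for `since`.
        utc = bookmark.astimezone(timezone.utc)
        params["since"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return params


def effective_replication_method(endpoint: str) -> str:
    """Singer-style replication method that matches what the API can do."""
    if ENDPOINT_SUPPORTS_SINCE.get(endpoint, False):
        return "INCREMENTAL"
    return "FULL_TABLE"


if __name__ == "__main__":
    bookmark = datetime(2024, 10, 9, 10, 54, tzinfo=timezone.utc)
    print(build_params(ISSUES, bookmark))
    print(build_params(WORKFLOWS, bookmark))
    print(effective_replication_method(WORKFLOWS))
```

Under these assumptions, the workflow request is identical with or without a bookmark, so storing a replication key for it only grows the state file without reducing the data fetched.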