Bring BigQuery FastSync implementation into alignment with Snowflake #901
base: master
@Samira-El is this something that you'd be interested in looking at and merging given that Wise doesn't maintain or test this area of the codebase?
Hey Judah, I would ping @jmriego to get his opinion on this and whether it would create any conflict with pipelinewise-target-bigquery. Also, I reckon usage of GCS should be implemented in that target too.
It doesn't conflict; we're running it this way currently.
Yup, this might be a good addition. Though there are some peculiarities between the FastSync and Singer targets. The FastSync implementation uses CSVs whereas the Singer version uses Avro. What are your thoughts @jmriego?
Sorry, just seeing this now. I think the GCS implementation makes sense, but I'm a bit worried about the effects of making it mandatory. I know in my company we would have issues doing that, and it's also a separate service you have to enable. It's not as integrated as it is on Snowflake. Sorry about this, I think GCS support totally makes sense. It would mean that if the file can't be loaded for some reason, you could at least download it locally and check for any issues with the data.
Problem
The BigQuery FastSync mechanism currently works differently from the Snowflake implementation, which uploads the CSVs to S3 before importing them into Snowflake. BigQuery can do the same thing with GCS. This should result in faster operation, assuming PipelineWise is running in GCP.
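For illustration only, here is a minimal sketch of the flow described above, not the code in this PR. It assumes the `google-cloud-storage` and `google-cloud-bigquery` client libraries; the names `gcs_uri` and `load_csv`, and the choice to treat the bucket as optional (falling back to a direct file load when none is given), are hypothetical.

```python
import os
from typing import Optional


def gcs_uri(bucket: str, key: str) -> str:
    """Build the gs:// URI that BigQuery reads from (hypothetical helper)."""
    return f"gs://{bucket}/{key.lstrip('/')}"


def load_csv(csv_path: str, table_id: str,
             bucket: Optional[str] = None, key: Optional[str] = None):
    """Load a CSV into BigQuery, staging it in GCS when a bucket is given.

    Assumption: the bucket is optional, so environments without GCS
    can still load the local file directly.
    """
    # Imported lazily so the helper above works without the GCP libraries.
    from google.cloud import bigquery

    bq = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    if bucket:
        from google.cloud import storage

        key = key or os.path.basename(csv_path)
        # Stage the file in GCS, mirroring the S3 step on the Snowflake side.
        storage.Client().bucket(bucket).blob(key).upload_from_filename(csv_path)
        job = bq.load_table_from_uri(
            gcs_uri(bucket, key), table_id, job_config=job_config
        )
    else:
        # No bucket configured: send the local file straight to BigQuery.
        with open(csv_path, "rb") as fh:
            job = bq.load_table_from_file(fh, table_id, job_config=job_config)

    job.result()  # Block until the load job finishes; raises on failure.
    return job
```

With the staged variant, the CSV stays in the bucket after the load, so a failed import can be downloaded and inspected.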
Proposed changes
Types of changes
What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply
Checklist
setup.py is an individual PR and not mixed with feature or bugfix PRs
[AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
AP-NNN (if applicable. AP-NNN = JIRA ID)