Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring BigQuery FastSync implementation into alignment with Snowflake #901

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

judahrand
Copy link
Contributor

@judahrand judahrand commented Feb 17, 2022

Problem

The BigQuery FastSync mechanism currently works differently to the Snowflake implementation which uploads the CSVs to S3 before importing into Snowflake. BigQuery is able to do the same thing but with GCS. This should result in faster operation assuming the PipelineWise is running in GCP.

Proposed changes

  • Bring the BigQuery FastSync mechanism into alignment with Snowflake.
  • Update BigQuery tests.
  • This change does mean than BigQuery installation will now need a GCS bucket configured.

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc
  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@judahrand judahrand changed the title Upstream/bigquery fastsync gcs Bring BigQuery FastSync implementation into alignment with Snowflake Feb 17, 2022
@judahrand
Copy link
Contributor Author

@Samira-El is this something that you'd be interested in looking at and merging given that Wise doesn't maintain or test this area of the codebase?

@Samira-El
Copy link
Contributor

Hey Judah, i would ping @jmriego to get his opinion on this and whether this would create any conflict with pipelinewise-target-bigquery. Also I reckon usage of GCS should also be implemented in that target.

@judahrand
Copy link
Contributor Author

Hey Judah, i would ping @jmriego to get his opinion on this and whether this would create any conflict with pipelinewise-target-bigquery.

It doesn't conflict we're running it this way currently.

Also I reckon usage of GCS should also be implemented in that target.

Yup, this might be a good addition. Though there are some peculiarities between the FastSync and Singer targets. The FastSync implementation uses CSVs whereas the Singer version uses Avro.

What are your thoughts @jmriego?

@jmriego
Copy link
Contributor

jmriego commented Feb 23, 2022

sorry, just seeing this now. I think the GCS implementation makes sense but I'm a bit worried about the effects of making that mandatory. I know in my company we would have issues doing that and it's also a different service you have to enable. It's not as integrated as it is on Snowflake.
I'd prefer if GCS support was optional so that anyone updating to the latest PipelineWise version doesn't find themselves with the FastSync suddenly not working until they enable, bill and add the GCS configuration.

Sorry about this, I think GCS support totally makes sense. It would enable that if it's not possible to load the file for some reason, you could at least download it locally and check any issues with the data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants