I have run into a problem when using the GCS->BQ batch mode of the `BigQuerySinkConnector`: each connector schedules its own instance of the `GCSToBQLoadRunnable`, which does not use the configured GCS folder when listing objects to load into BigQuery.
Because of this, if multiple connectors use the same bucket but different folders, each connector loads every object in the bucket, irrespective of which folder it is in, so you end up with many duplicates in BQ. Furthermore, only one instance will successfully delete each object; when the other instances try and fail, they simply retry again and again.
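To illustrate, the fix would amount to restricting the listing to the connector's own folder prefix rather than the whole bucket. Below is a minimal, self-contained sketch of that filtering behaviour; the method and class names are hypothetical, not the connector's actual code, and with the real GCS client the prefix would presumably be applied server-side (e.g. via `Storage.BlobListOption.prefix(...)`) rather than filtered after listing:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class GcsPrefixFilter {
    // Keep only the blobs that live under this connector's folder.
    // Hypothetical helper: in the real connector this would correspond to
    // passing the folder as a prefix when listing the bucket, so each
    // GCSToBQLoadRunnable only sees (and deletes) its own objects.
    static List<String> blobsForFolder(List<String> allBlobNames, String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        return allBlobNames.stream()
                .filter(name -> name.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> blobs = Arrays.asList(
                "connector-a/topic1/0.json",
                "connector-b/topic2/0.json");
        // Only connector-a's objects are returned for connector-a's runnable.
        System.out.println(blobsForFolder(blobs, "connector-a"));
    }
}
```

With this kind of prefix filtering, two connectors sharing a bucket would each load and delete only their own folder's objects, avoiding both the duplicate loads and the repeated failing deletes described above.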
The `GCSToBQLoadRunnable` can be seen here.