-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load CDK: Insert Loader Interface #56428
base: jschmidt/dest-mssql/mssql-uses-bulk-load
Are you sure you want to change the base?
Load CDK: Insert Loader Interface #56428
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
258d44c
to
ef6d9ca
Compare
417391c
to
8405b2b
Compare
8405b2b
to
9529881
Compare
ef6d9ca
to
c7c25f3
Compare
9529881
to
8220b2a
Compare
c7c25f3
to
0d82af4
Compare
8220b2a
to
2ac2a07
Compare
e650f75
to
4bac608
Compare
import kotlinx.coroutines.runBlocking | ||
import org.jetbrains.annotations.VisibleForTesting | ||
|
||
class ResourceReservingPartitionedQueue<T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking: this code already existed elsewhere, and this PR just moves it all into here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the logic existed, spread out over a few beans, i packed it into one thing and migrated the tests as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could say you opened the can
import io.airbyte.cdk.load.message.DestinationRecordRaw | ||
import io.airbyte.cdk.load.write.LoadStrategy | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still reviewing the PR, but I'm curious:
- why is MSSQL a bad fit for this? my read from this comment is that it should fit exactly into this paradigm (... not that I've really read how MSSQL standard inserts work)
- I think bigquery actually falls into the
cases where the insert query is built in a streaming fashion via an open shared connection
paradigm? unless you're thinking of the kafka case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because with MSSQL you can't start a new query for table X until you fully commit the previous one, so you might as well do the work in one step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible bigquery works the same way and we don't need this at all
private val reservation = runBlocking { | ||
reservationManager.reserveOrThrow(requestedResourceAmount, this) | ||
} | ||
private val minNumUnits: Int = numProducers + numConsumers * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure, you don't mean (numProducers + numConsumers) * 2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. What I'm calculating here really is "number of units each producer will hold" (numProducers) + "number of units each consumer will hold" (numConsumers) + "number of units if there's exactly 1 thing enqueued in each consumer's partition" (numConsumers)
Creates an
InsertLoader
interface, which drives the case where you're collecting records into bulk queries/api calls but without needing a specific connection open (ie, not like a jdbc prepared statement, but more like a bigquery bulk insert.)I tested this with MSSQL StandardInsert and it works, but it's not a good fit. (In a separate PR I'm moving MSSQL to DirectLoader.) But I'm leaving this here for people to have something to build on it for BigQuery when the time comes.