Snowflake destination does not respect compound `primary_key` in `MERGE` statements (`merge` `upsert` strategy); also implements `merge_key` in `delete-insert` mode in an unexpected way #2320
Comments
@acaruso7: the primary key defined in the incremental is not used to control the materialization in the destination, but to deduplicate incoming data in a load. If you need a compound primary key for your `sql_table`, you can define one with `sql_table_instance.apply_hints(primary_key=("blah", "bluh"))`. The reason you are seeing the ID being used for the merge is that you appear to have a `primary_key` of "id" defined in the SQL schema of the source database.
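A minimal sketch of the suggestion above, with hypothetical table and column names (`orders`, `customer_id`, `order_id`) and source credentials assumed to be configured in `secrets.toml`:

```python
import dlt
from dlt.sources.sql_database import sql_table

# "orders", "customer_id" and "order_id" are hypothetical names; the source
# database credentials are assumed to be set up in secrets.toml.
orders = sql_table(table="orders")

# Define the compound primary key on the resource itself, as suggested above,
# rather than on the incremental.
orders.apply_hints(primary_key=("customer_id", "order_id"))

pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="snowflake",
    dataset_name="raw",
)
pipeline.run(orders, write_disposition="merge")
```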
@sh-rp thanks for the response. I see now that the `primary_key` on the incremental is only used to deduplicate incoming data. This is a little confusing naming-convention wise; personally I think a different name for that setting would be clearer. If I can control the join condition for SQL `MERGE` expressions in the destination using `apply_hints(primary_key=...)`, that covers my use case.
One other followup question for you: is there a way I can easily log or print the raw merge SQL generated by dlt and submitted to the destination? That would make configuration a lot more straightforward and would help me understand how various config changes show up in my destination.
Note for future readers: I think the reason I was encountering this is a combination of the following:
@acaruso7 about your followup question: you can print those statements out with log level DEBUG, I believe. Alternatively, check out the loaded jobs in your pipeline folder at ~/.dlt/pipelines/<pipeline_name>/... All the merge jobs will be in there as SQL jobs, which might be a bit less confusing than parsing all the debug output.
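A minimal sketch of the second approach, assuming the default pipelines directory and a hypothetical pipeline name `my_pipeline`:

```python
from pathlib import Path

# "my_pipeline" is a hypothetical pipeline name; replace with your own.
pipeline_dir = Path.home() / ".dlt" / "pipelines" / "my_pipeline"

# Merge jobs are stored as .sql files inside the load packages, so printing
# every .sql file under the pipeline folder surfaces the generated
# MERGE / DELETE statements.
for sql_job in sorted(pipeline_dir.rglob("*.sql")):
    print(f"--- {sql_job.relative_to(pipeline_dir)} ---")
    print(sql_job.read_text())
```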
dlt version
1.5.0
Describe the problem
https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1739564309666949
Snowflake destination does not respect a compound `primary_key` in `MERGE` statements generated by the `incremental` `merge` `upsert` strategy. Instead, it defaults to using only the primary key from the `sql_table()` source in the `MERGE` expression.
Example: a `MERGE` statement that joins only on the source table's own primary key is generated in the Snowflake destination when the pipeline is configured with a compound `primary_key`.
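A representative pipeline configuration that would exercise this path (a sketch, with hypothetical table and column names and credentials assumed in `secrets.toml`):

```python
import dlt
from dlt.sources.sql_database import sql_table

# Hypothetical table and column names.
orders = sql_table(table="orders")

# Compound primary key plus the upsert merge strategy: the reported bug is
# that the generated MERGE joins only on the source table's own primary key
# ("id"), ignoring the compound key below.
orders.apply_hints(
    primary_key=("customer_id", "order_id"),
    write_disposition={"disposition": "merge", "strategy": "upsert"},
)

pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="snowflake",
    dataset_name="raw",
)
pipeline.run(orders)
```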
Snowflake destination also produces unexpected behavior when using the `incremental` `merge` `delete-insert` strategy with a compound `merge_key`. `DELETE` statements in the destination `OR` together the join condition for the `primary_key` (which, as mentioned above, it seems I'm not able to set) with an additional join condition produced by the compound `merge_key` config.
Example: such a `DELETE` statement is generated in the Snowflake destination when the pipeline is configured with a compound `merge_key`.
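A representative configuration for this second case (again a sketch with hypothetical names):

```python
import dlt
from dlt.sources.sql_database import sql_table

# Hypothetical table and column names.
orders = sql_table(table="orders")

# Compound merge_key with the delete-insert merge strategy: the reported bug
# is that the generated DELETE ORs the primary-key condition together with
# the merge-key condition instead of using one consistent join condition.
orders.apply_hints(
    merge_key=("customer_id", "order_id"),
    write_disposition={"disposition": "merge", "strategy": "delete-insert"},
)

pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="snowflake",
    dataset_name="raw",
)
pipeline.run(orders)
```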
Expected behavior
A `sql_table()` source with a Snowflake destination configured with a compound `primary_key` and the `upsert` strategy should produce a join condition in the generated Snowflake `MERGE` statement that `AND`s together all of the compound `primary_key` columns.
A `sql_table()` source with a Snowflake destination configured with a compound `merge_key` and the `delete-insert` strategy should produce the equivalent join condition in the generated Snowflake `DELETE` statement.
Columns which are part of both the `primary_key` and `merge_key` should only be considered once in the join condition, and never `OR`'d together.
Steps to reproduce
Unsure how to reproduce as I do not have a public Snowflake instance where I can demonstrate the behavior, but the snippets above should get most of the way there
Operating system
Linux
Runtime environment
Docker, Docker Compose
Python version
3.10
dlt data source
sql_table()
dlt destination
Snowflake
Other deployment details
No response
Additional information
No response