This repository has been archived by the owner on Sep 23, 2024. It is now read-only.
Use VARIANT type for properties without a type #420
Closed
Problem
As described in this issue, the current implementation of the target does not handle columns without a type (a property defined as `"property_name": {}`), because the target assumes the existence of a `type` in the property definition (causing expressions like `schema["type"]` to fail).

As also detailed in the issue, these "loose schemas" are common in Salesforce `<some_salesforce_entity>History` entities, which can track the history of any field in the main entity (`<some_salesforce_entity>`) and do so via two fields, `OldValue` and `NewValue`. Depending on which fields are being tracked, the actual values can vary in type.

Proposed changes
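As a minimal sketch of the failure described under Problem and of the `VARIANT` fallback proposed in this PR: the function names `naive_column_type` and `column_type_with_fallback`, the simplified type mapping, and the sample schema are all hypothetical and chosen for illustration. They are not the target's actual code.

```python
from typing import Any, Dict


def naive_column_type(schema_property: Dict[str, Any]) -> str:
    """Hypothetical mapper that assumes "type" is always present."""
    # Raises KeyError for a property defined as {} (no "type" key).
    property_type = schema_property["type"]
    if "object" in property_type or "array" in property_type:
        return "variant"
    return "text"  # simplified: everything else is treated as text


def column_type_with_fallback(schema_property: Dict[str, Any]) -> str:
    """Hypothetical mapper that falls back to VARIANT for typeless properties."""
    if schema_property.get("type") is None:
        # Assumption: VARIANT can hold whatever value type arrives at load time.
        return "variant"
    return naive_column_type(schema_property)


# Made-up schema shaped like a Salesforce History entity.
schema = {
    "properties": {
        "Id": {"type": ["null", "string"]},
        "OldValue": {},
        "NewValue": {},
    }
}

columns = {
    name: column_type_with_fallback(prop)
    for name, prop in schema["properties"].items()
}
print(columns)
# {'Id': 'text', 'OldValue': 'variant', 'NewValue': 'variant'}
```

Calling `naive_column_type({})` raises `KeyError: 'type'`, which is the failure mode the issue describes. The fallback keeps a single column per property instead of guessing a concrete type up front.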
As suggested in this comment, I decided to use Snowflake's `VARIANT` type for these cases.

We could go about it in multiple ways. I noticed that Stitch actually creates multiple fields (e.g. `oldvalue_bl` if a `boolean` type is detected, `oldvalue_st` if a `string` is detected, etc.), but I am not sure how we could decide which fields to create without "parsing" the records or receiving an `anyOf` definition in the schema. I therefore went with a more pragmatic approach, as `VARIANT` should be able to handle pretty much any data type.

Types of changes
What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

Checklist
setup.py is an individual PR and not mixed with feature or bugfix PRs
[AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
AP-NNN (if applicable. AP-NNN = JIRA ID)