Skip to content
This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Use VARIANT type for properties without a type #420

Closed
wants to merge 10 commits into from

Conversation

dlouseiro
Copy link

@dlouseiro dlouseiro commented Jan 31, 2024

Problem

As described in this issue, the current implementation of the target does not handle columns without a type (property definition as "property_name": {}) given that:

  1. The target assumes the existence of a type in the property definition (causing expressions like schema["type"] to fail)
  2. It is unclear on which database Snowflake type should be chosen.

As also detailed in the issue, this type of "loose schemas" is common in Salesforce <some_salesforce_entity>History entities, which essentially can track the history of any field in the main entity (<some_salesforce_entity>) and does so via two fields OldValue and NewValue. Depending on which fields the history is being tracked, the actual values can vary in type.

Proposed changes

As suggested in this comment, I decided to use Snowflake's VARIANT field for these cases.

We could go about it in multiple ways. I noticed that Stitch actually creates multiple fields (e.g. oldvalue_bl if boolean type is detected, oldvalue_st if string is detected, etc), although not really sure how we could decide on which fields to create without "parsing" the records or receiving a anyOf definition in the schema, so I decided to go with more of a pragmatic approach as VARIANT should be able to handle pretty much any data type.

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • New feature (non-breaking change which adds functionality)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@dlouseiro
Copy link
Author

I still have to work on the tests to reflect the changes. I've been syncing entities from salesforce using my fork and did not experience any breaking change, but as it's my first contribution to this project still need to look into the tests. Went with more of a trial and error approach using an actual sync for now.

@dlouseiro
Copy link
Author

It is also a bit unclear to me which formatter is being used in the project, so I kind of "manually formatted" the code. If you could provide some clarity on the formatter used I can also adjust code formatting.

@dlouseiro
Copy link
Author

Closed it as the behaviour is not the expected in some cases. Still need to go over it again and do some more tests.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant