-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support integer types other than BIGINT
#485
base: main
Are you sure you want to change the base?
feat: Support integer types other than BIGINT
#485
Conversation
target_postgres/connector.py
Outdated
def _handle_integer_type(self, jsonschema: dict) -> SMALLINT | INTEGER | BIGINT: | ||
"""Handle integer type.""" | ||
if maximum := jsonschema.get("maximum"): | ||
if maximum < 2**15: | ||
return SMALLINT() | ||
if maximum < 2**31: | ||
return INTEGER() | ||
|
||
return BIGINT() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like inferring the desired integer type based on a "maximum" value.
It's IMO unnecessarily indirect and also a bit imprecise, as the numeric Postgres integer types are signed, and the maximum would also indicate a "minimum", which however differs by one . For example, smallint is in the range[-2**15, 2**15)
.
why not letting the schema specify it explicitly?
Something like:
{"type": integer, "dbtype": "smallint"}
(name of "dbtype" field very much up for debate of course)
If dbtype
is omitted, then the default could be BIGINT
if dbtype
is invalid/unknown/unsupported, we raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed 5966233 to handle the ranges correctly.
Now, I do want to support custom keywords like dbtype
(see meltano/sdk#2102, meltano/sdk#2783).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, meltano/sdk#2829 will make this nicer
@cached_property | ||
def jsonschema_to_sql(self) -> JSONSchemaToSQL: | ||
"""Return a JSONSchemaToSQL instance with custom type handling.""" | ||
to_sql = JSONSchemaToPostgres(content_encoding=self.interpret_content_encoding) | ||
to_sql.fallback_type = TEXT | ||
to_sql.register_type_handler("integer", BIGINT) | ||
to_sql.register_type_handler("integer", self._handle_integer_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty sure the answer is yes here, but would this also be able to parse the schema that is generated by postgres-tap
:
{"type":["integer","null"]}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's right. It's handled upstream if you're curious: https://github.com/meltano/sdk/blob/e7783fd4ea6823daf9068124985118268da3a354/singer_sdk/connectors/sql.py#L354-L356.
I've added a test just to be safe: 79212a8.
d914824
to
79212a8
Compare
TODO:
Data Types
section of the readme, and how to override the schema to force the target into a certain integer type