Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support integer types other than BIGINT #485

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

edgarrmondragon
Copy link
Member

@edgarrmondragon edgarrmondragon commented Nov 27, 2024

TODO:

  • Docs, specifically updating the Data Types section of the readme, and how to override the schema to force the target into a certain integer type
  • Tests

@edgarrmondragon edgarrmondragon linked an issue Nov 27, 2024 that may be closed by this pull request
@edgarrmondragon edgarrmondragon self-assigned this Nov 27, 2024
@edgarrmondragon edgarrmondragon added this to the v0.2.0 milestone Nov 27, 2024
@edgarrmondragon edgarrmondragon added the enhancement New feature or request label Nov 27, 2024
@edgarrmondragon edgarrmondragon marked this pull request as ready for review November 27, 2024 20:49
Comment on lines 265 to 274
def _handle_integer_type(self, jsonschema: dict) -> SMALLINT | INTEGER | BIGINT:
"""Handle integer type."""
if maximum := jsonschema.get("maximum"):
if maximum < 2**15:
return SMALLINT()
if maximum < 2**31:
return INTEGER()

return BIGINT()

Copy link

@maxmarcon maxmarcon Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like inferring the desired integer type based on a "maximum" value.
It's IMO unnecessarily indirect and also a bit imprecise, as the numeric Postgres integer types are signed, and the maximum would also indicate a "minimum", which however differs by one . For example, smallint is in the range[-2**15, 2**15).

why not letting the schema specify it explicitly?

Something like:

{"type": integer, "dbtype": "smallint"}

(name of "dbtype" field very much up for debate of course)
If dbtype is omitted, then the default could be BIGINT
if dbtype is invalid/unknown/unsupported, we raise

Copy link
Member Author

@edgarrmondragon edgarrmondragon Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed 5966233 to handle the ranges correctly.

Now, I do want to support custom keywords like dbtype (see meltano/sdk#2102, meltano/sdk#2783).

@cached_property
def jsonschema_to_sql(self) -> JSONSchemaToSQL:
"""Return a JSONSchemaToSQL instance with custom type handling."""
to_sql = JSONSchemaToPostgres(content_encoding=self.interpret_content_encoding)
to_sql.fallback_type = TEXT
to_sql.register_type_handler("integer", BIGINT)
to_sql.register_type_handler("integer", self._handle_integer_type)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure the answer is yes here, but would this also be able to parse the schema that is generated by postgres-tap:

{"type":["integer","null"]}

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's right. It's handled upstream if you're curious: https://github.com/meltano/sdk/blob/e7783fd4ea6823daf9068124985118268da3a354/singer_sdk/connectors/sql.py#L354-L356.

I've added a test just to be safe: 79212a8.

@edgarrmondragon edgarrmondragon force-pushed the 484-feat-support-integer-types-other-than-bigint branch from d914824 to 79212a8 Compare November 28, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

feat: Support integer types other than BIGINT
2 participants