bug: Using stream maps to remove a property doesn't drop it from the JSON schema "required": [...] array #2662

Closed
1 task
edgarrmondragon opened this issue Sep 11, 2024 · 0 comments · Fixed by #2663

Comments

@edgarrmondragon
Collaborator

Singer SDK Version

0.40.0

Is this a regression?

  • Yes

Python Version

3.12

Bug scope

Mapping (stream maps, flattening, etc.)

Operating System

NA

Description

Hello, I have a question related to the "stream_maps" functionality.

Given:
A stream in a custom tap with the following schema (note that "title" is required):

from singer_sdk import typing as th  # JSON Schema typing helpers

# Stream schema: all three properties are listed as required.
schema = th.PropertiesList(
    th.Property(
        "id",
        th.IntegerType,
        required=True,
        nullable=False,
        description="The post's ID",
    ),
    th.Property(
        "title",
        th.StringType,
        required=True,
        nullable=True,
        description="The post's title",
    ),
    th.Property(
        "created_at",
        th.DateTimeType,
        required=True,
        nullable=False,
        description="The post's creation date",
    ),
).to_dict()

The same stream is transformed in the tap's configuration (the required "title" property is removed):

config:
  stream_maps:
    posts:
      __alias__: posts_v2
      __filter__: record['id'] != 3
      id_hashed: md5(str(record['id']))
      author: f'{fake.first_name()} {fake.last_name()}'
      title: __NULL__
      title_new: "' '.join([c.upper() for c in record['title'].replace(' ', '')])"
      year: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').year)
      month: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').month)
      day: int(datetime.datetime.strptime(record['created_at'], '%Y-%m-%dT%H:%M:%SZ').day)

As a result, I see the following SCHEMA message generated:

{"type":"SCHEMA","stream":"posts_v2","schema":{"properties":{"id":{"description":"The post's ID","type":["integer"]},"created_at":{"description":"The post's creation date","format":"date-time","type":["string"]},"id_hashed":{"type":["string","null"]},"author":{"type":["string","null"]},"title_new":{"type":["string","null"]},"year":{"type":["integer","null"]},"month":{"type":["integer","null"]},"day":{"type":["integer","null"]}},"type":"object","required":["id","title","created_at"]},"key_properties":["id"],"bookmark_properties":["id"]}

The problem is that the new schema no longer defines "title" under "properties" but still lists it in the "required" array. This causes record validation to fail in the target, so I have to disable validation in the target.
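To illustrate why a validating target rejects these records, here is a minimal standard-library sketch of the check that the JSON Schema "required" keyword implies. The `missing_required` helper and the sample record values are hypothetical and used only for illustration; a real target would typically rely on a full JSON Schema validator rather than this simplified check.

```python
import hashlib

# Abbreviated copy of the schema emitted in the SCHEMA message:
# "title" is absent from "properties" but still listed in "required".
schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "created_at": {"type": ["string"], "format": "date-time"},
        "id_hashed": {"type": ["string", "null"]},
    },
    "required": ["id", "title", "created_at"],
}

# Hypothetical mapped record (sample values): the stream map removed "title".
record = {
    "id": 1,
    "created_at": "2024-09-11T00:00:00Z",
    "id_hashed": hashlib.md5(b"1").hexdigest(),
}


def missing_required(record: dict, schema: dict) -> list[str]:
    """Return the required property names that are absent from the record."""
    return [name for name in schema.get("required", []) if name not in record]


print(missing_required(record, schema))  # ['title']
```

Any record produced by this mapping is missing "title", so every record would fail a strict "required" check against the emitted schema.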

Question: Is there any way to update the list of required properties in the schema after the stream_maps transformation has been applied?
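One way to picture the desired behavior is a post-processing step that keeps only those "required" entries that still exist under "properties". The sketch below is an assumption about what such a step could look like; `prune_required` is a hypothetical helper, not part of the Singer SDK, and it is not necessarily how the linked fix is implemented.

```python
import copy


def prune_required(schema: dict) -> dict:
    """Return a copy of the schema whose "required" list only names existing properties."""
    pruned = copy.deepcopy(schema)  # leave the caller's schema untouched
    properties = pruned.get("properties", {})
    required = [name for name in pruned.get("required", []) if name in properties]
    if required:
        pruned["required"] = required
    else:
        # Drop the keyword entirely rather than emit an empty list.
        pruned.pop("required", None)
    return pruned


# Abbreviated version of the mapped schema from this report.
mapped_schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "created_at": {"type": ["string"], "format": "date-time"},
        "title_new": {"type": ["string", "null"]},
    },
    "required": ["id", "title", "created_at"],
}

print(prune_required(mapped_schema)["required"])  # ['id', 'created_at']
```

The helper returns a new dictionary instead of mutating its input, so the original stream schema stays available for comparison.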

Code

No response

Link to Slack/Linen

https://meltano.slack.com/archives/C069CQNHDNF/p1726062079634019
