Bug/fix replicate table primary keys #88

Merged

Conversation

s7clarke10
Collaborator

This resolves an issue that occurs when a table has both a primary key and a unique key. Columns from both constraints were being identified as the primary key for the target table. The fix prioritises the primary key and falls back to the unique key only if there is no primary key.

Prior to the fix, the code overstated the primary key by including the unique key columns as well.

This change resolves #87
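
The precedence rule described above can be sketched as follows. This is a minimal illustration, not the code changed in this PR: the function names and the (column, constraint type) row shape are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Each row is (column_name, constraint_type). This shape is an assumption for
# the example; it is not taken from the tap's discovery query.
ConstraintRow = Tuple[str, str]


def select_key_properties(rows: Iterable[ConstraintRow]) -> List[str]:
    """Return primary key columns, or unique columns when no primary key exists."""
    rows = list(rows)
    primary = [col for col, kind in rows if kind == "PRIMARY KEY"]
    if primary:
        # A primary key is present, so unique constraint columns are ignored.
        return primary
    return [col for col, kind in rows if kind == "UNIQUE"]


def select_key_properties_old(rows: Iterable[ConstraintRow]) -> List[str]:
    """Behaviour described as the bug: every constrained column is treated as a key."""
    return [col for col, _ in rows]


both = [("pk", "PRIMARY KEY"), ("uc_1", "UNIQUE"), ("uc_2", "UNIQUE")]
print(select_key_properties_old(both))  # ['pk', 'uc_1', 'uc_2']
print(select_key_properties(both))      # ['pk']
```

With the old behaviour, nullable unique columns end up in the target's primary key, which matches the NULL-in-primary-key failure reported in #87.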

@s7clarke10 s7clarke10 requested a review from mjsqu October 9, 2024 05:53
@s7clarke10 s7clarke10 self-assigned this Oct 9, 2024
@mjsqu
Collaborator

mjsqu commented Oct 10, 2024

I recommend adding tests for the new logic.

@mjsqu
Collaborator

mjsqu commented Oct 16, 2024

I can't add suggestions for files outside the scope of the PR; however, I recommend adding the following test here:
https://github.com/s7clarke10/pipelinewise-tap-mssql/blob/bug/fix_replicate_table_primary_keys/tests/test_tap_mssql.py#L652

class TestPrimaryKeyUniqueKey(unittest.TestCase):
    def setUp(self):
        self.conn = test_utils.get_test_connection()

        with connect_with_backoff(self.conn) as open_conn:
            with open_conn.cursor() as cursor:
                # Drop tables left over from earlier runs. A failed drop
                # (e.g. the table does not exist yet) is deliberately ignored.
                for table in ("uc_only_table", "pk_only_table", "pk_uc_table"):
                    try:
                        cursor.execute(f"drop table {table}")
                    except Exception:
                        pass
                cursor.execute(
                    """
                    CREATE TABLE uc_only_table (
                      pk int,
                      uc_1 int,
                      uc_2 int,
                    CONSTRAINT constraint_uc_only_table UNIQUE(uc_1,uc_2)  )
                    """
                )
                cursor.execute(
                    """
                    CREATE TABLE pk_only_table (
                      pk int PRIMARY KEY,
                      uc_1 int,
                      uc_2 int,
                    )
                    """
                )
                cursor.execute(
                    """
                    CREATE TABLE pk_uc_table (
                      pk int PRIMARY KEY,
                      uc_1 int,
                      uc_2 int,
                    CONSTRAINT constraint_pk_uc_table UNIQUE(uc_1,uc_2)  )
                    """
                )

    def test_only_primary_key(self):
        """The primary key wins when present; unique columns are the fallback."""
        catalog = test_utils.discover_catalog(self.conn, {})
        # Map each table name to the key properties recorded in its
        # table-level metadata (the empty breadcrumb).
        primary_keys = {}
        for c in catalog.streams:
            primary_keys[c.table] = (
                singer.metadata.to_map(c.metadata).get((), {}).get("table-key-properties")
            )

        self.assertEqual(primary_keys["uc_only_table"], ["uc_1", "uc_2"])
        self.assertEqual(primary_keys["pk_only_table"], ["pk"])
        self.assertEqual(primary_keys["pk_uc_table"], ["pk"])

The test creates three tables for the scenarios you are addressing:

  • uc_only_table - Unique Constraint Only
  • pk_only_table - Primary Key Only
  • pk_uc_table - Both Primary Key and Unique Constraints

The three assertions at the end of the test check the primary key expected in each scenario:

  • uc_only_table - the uc columns
  • pk_only_table - the pk column
  • pk_uc_table - after your update this is just the primary key column; the prior behaviour was to combine the pk and uc columns
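
For readers unfamiliar with the metadata lookup used in the test, here is a rough sketch of what it does. The to_map function below is a local stand-in written for this example, assumed to mirror how singer.metadata.to_map keys each entry by its breadcrumb tuple. The raw_metadata values are hypothetical, not output from the tap.

```python
from typing import Any, Dict, List, Tuple


def to_map(raw_metadata: List[Dict[str, Any]]) -> Dict[Tuple, Dict[str, Any]]:
    """Local stand-in: key each metadata entry by its breadcrumb as a tuple."""
    return {tuple(entry["breadcrumb"]): entry["metadata"] for entry in raw_metadata}


# Hypothetical catalog metadata for pk_uc_table after the fix.
raw_metadata = [
    # The empty breadcrumb holds table-level metadata.
    {"breadcrumb": [], "metadata": {"table-key-properties": ["pk"]}},
    # Column-level entries use a ("properties", column_name) breadcrumb.
    {"breadcrumb": ["properties", "pk"], "metadata": {"selected-by-default": True}},
]

key_properties = to_map(raw_metadata).get((), {}).get("table-key-properties")
print(key_properties)  # ['pk']
```

The chained .get((), {}) means a stream with no table-level entry yields None rather than raising, so a missing key surfaces as a failed assertion.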

To run this on your branch you can do:

docker run -p 1433:1433 -e "MSSQL_SA_PASSWORD=testDatabase1" -e "ACCEPT_EULA=Y" mcr.microsoft.com/mssql/server:2019-latest
poetry install
poetry run pytest -v -k TestPrimaryKey

(the pytest -k switch selects tests whose names contain the provided expression as a substring, so TestPrimaryKey matches just TestPrimaryKeyUniqueKey - the name of the added test)
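
As a rough illustration of the -k selection, the sketch below approximates the matching for a single bare keyword. It is a simplification and not pytest's implementation: real -k expressions also support and/or/not and match more than names. The function name and the TestTypeMapping class are hypothetical.

```python
from typing import List


def matches_keyword(keyword: str, name_parts: List[str]) -> bool:
    """Simplified: True if keyword is a case-insensitive substring of any name part."""
    needle = keyword.lower()
    return any(needle in part.lower() for part in name_parts)


added_test = ["test_tap_mssql.py", "TestPrimaryKeyUniqueKey", "test_only_primary_key"]
other_test = ["test_tap_mssql.py", "TestTypeMapping", "test_decimal"]  # hypothetical

print(matches_keyword("TestPrimaryKey", added_test))  # True
print(matches_keyword("UniqueKey", added_test))       # True: substring, not prefix
print(matches_keyword("TestPrimaryKey", other_test))  # False
```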

Collaborator

@mjsqu mjsqu left a comment

The latest pipeline run shows 26 tests, up from 25 previously (unfortunately the test names aren't visible), so it is safe to assume the new test is passing and this change is ready. Thank you @s7clarke10!

@s7clarke10
Collaborator Author

Thank you for adding those tests to the testing framework.

The pipeline has run successfully, and the individual test also passed when run in Codespaces.

I am happy for this to be approved and merged in.

@mjsqu mjsqu merged commit ae0e4f2 into wintersrd:master Oct 17, 2024
1 check passed
@s7clarke10 s7clarke10 deleted the bug/fix_replicate_table_primary_keys branch October 17, 2024 03:55
Development

Successfully merging this pull request may close these issues.

Load into Target Snowflake is failing due to a NULL value in a primary key
2 participants