Skip to content
This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Perform logical replication after initial sync #144

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

judahrand
Copy link
Contributor

@judahrand judahrand commented Jan 11, 2022

Problem

If you select new tables to add to an existing log based extract, it'll do a "logical_initial" full table sync on those new tables each and every time the tap runs. It'll never switch over to log based replication for them.

Another example of this issue is that if intend to run with break_at_end_lsn: False then I have to run the tap twice - once to perform the initial sync and again to start the logical replication.

Proposed changes

This Pull Request contains changes which are simpler and, I believe, more correct than those in #130. Both Pull Requests aim to solve #107.

My changes ensure that whenever a LOG_BASED replication runs the bookmark is initialised correctly and any logical replication which is required is performed.

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@judahrand
Copy link
Contributor Author

@Samira-El Would you be happy with these changes instead of #130? I believe they should address the same issue.

@judahrand
Copy link
Contributor Author

@Samira-El Could you please review this?

@judahrand
Copy link
Contributor Author

@koszti Would you be willing to review this?

@judahrand
Copy link
Contributor Author

@Samira-El I believe this should also be ready for a review. This would also allow the closure of #131 and #130, I believe.

@deanmorin
Copy link
Contributor

As the author of #130 I agree that this is a better fix. I've closed that PR in favor of this one.

Is there anything preventing this from being merged @Samira-El ?

@deanmorin
Copy link
Contributor

I'm actually getting an error running this branch, I'll need to do some more investigation:

TypeError: '<' not supported between instances of 'NoneType' and 'int'
lsn_comitted = min(
File "/project/.meltano/extractors/tap-provider/venv/lib/python3.9/site-packages/tap_postgres/sync_strategies/logical_replication.py", line 684, in sync_tables

@judahrand
Copy link
Contributor Author

judahrand commented Jan 28, 2022

I'm actually getting an error running this branch, I'll need to do some more investigation:

TypeError: '<' not supported between instances of 'NoneType' and 'int'
lsn_comitted = min(
File "/project/.meltano/extractors/tap-provider/venv/lib/python3.9/site-packages/tap_postgres/sync_strategies/logical_replication.py", line 684, in sync_tables

Hmmm... Interesting. I'm confused as to how this could be caused by the changes here as the line you're seeing a failure on is also run right at the beginning of that function:
https://github.com/thread/pipelinewise-tap-postgres/blob/64e1f919411993e12148e6b9bd181817f3cf37bf/tap_postgres/sync_strategies/logical_replication.py#L557

So theNone couldn't be present at the start?

@Samira-El
Copy link
Contributor

Is there anything preventing this from being merged @Samira-El ?

Time, I don't have time to review yet it 😓. This is crucial tap for us, so PRs need to be properly reviewed.

@judahrand
Copy link
Contributor Author

@Samira-El Any ideas when you might have the time to review this? We're currently running a fork and would of course love to be able to minimize the divergence!

@Samira-El
Copy link
Contributor

Samira-El commented Feb 23, 2022

Hey Judah, I'm running into the same error encountered by Dean:

tap_postgres/sync_strategies/logical_replication.py", line 684, in sync_tables
    lsn_comitted = min(
TypeError: '<' not supported between instances of 'NoneType' and 'int'

I'm following the bug reproduction step in #107

@judahrand
Copy link
Contributor Author

@Samira-El I think my latest commit fixes this issue.

Comment on lines +232 to +238
assert CAUGHT_MESSAGES[8]['type'] == 'SCHEMA'

assert isinstance(CAUGHT_MESSAGES[9], singer.messages.StateMessage)
assert CAUGHT_MESSAGES[9].value['bookmarks']['public-COW'].get('xmin') is None
assert CAUGHT_MESSAGES[9].value['bookmarks']['public-COW'].get('lsn') == end_lsn
assert CAUGHT_MESSAGES[9].value['bookmarks']['public-COW'].get('version') == new_version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use TestCase assert* methods instead?

Copy link

@josescuderoh josescuderoh Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Samira-El I'm facing this issue and planning on contributing the fix. Is your feedback related to line 268 only? That is change it to assertIsInstance(CAUGHT_MESSAGES[9], singer.messages.StateMessage). Asking this since I see all other assert statements don't use TestCase either.

@Samira-El
Copy link
Contributor

Hey folks, just had the time to review this PR, the fix LGTM, I left a comment on the tests tho.

The CI is broken because of a recent merged PR, we will roll it back and rerun the tests on this one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants