Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Filter collections #36

Merged
merged 13 commits into from
Dec 13, 2024

Conversation

ReubenFrankel
Copy link
Contributor

@ReubenFrankel ReubenFrankel commented Dec 6, 2024

Implements collection filtering for improved discover performance. New filter_collections setting accepts a string or array of strings to filter collection names by.

collections = self.database.list_collection_names(
authorizedCollections=True,
nameOnly=True,
filter={"$or": [{"name": c} for c in self._collections]} if self._collections else None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filter={"$or": [{"name": c} for c in self._collections]} if self._collections else None,
filter={"name": {"$in": self._collections}} if self._collections else None,

This would be ideal, but I get the following error:

pymongo.errors.OperationFailure: can't get regex from filter doc not a regex, full error: {'ok': 0, 'errmsg': "can't get regex from filter doc not a regex", 'code': 8000, 'codeName': 'AtlasError'}

Can't see any indication from docs that this isn't supported:

@ReubenFrankel ReubenFrankel marked this pull request as ready for review December 7, 2024 00:59
@ReubenFrankel ReubenFrankel force-pushed the filter-collections branch 2 times, most recently from ac5d4de to d847006 Compare December 11, 2024 01:30
@edgarrmondragon edgarrmondragon self-assigned this Dec 11, 2024
@edgarrmondragon
Copy link
Member

It seems there's significant bitrot in the CI 😞. If we can fix it I'm happy to merge this PR. We may also need/want to drop support for a few EOL Python versions.

@ReubenFrankel
Copy link
Contributor Author

@edgarrmondragon I've got a CI fix branch in the works (trying not to touch too much here)

Comment on lines 97 to 100
th.OneOf(
th.StringType,
th.ArrayType(th.StringType),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Meltano doesn't yet support setting union types and that is arguably how most of the time this tap will be used, wdyt of only accepting an array of strings for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel that strongly about the union type - this just started out as a POC for @melgazar9 who wanted to filter one collection at a time, so I thought it made sense to support that pattern as a string also. I think the Meltano setting definition needs to be kind: array as you pointed out, but that shouldn't have any bearing here. If you think it's confusing, I'm all for removing it in favour of a better UX/DX. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove it. It adds little benefit and complicates things for automatic settings generation from the tap's --about output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…only type that can support both single and multiple values
auto-merge was automatically disabled December 12, 2024 01:59

Head branch was pushed to by a user without write access

@edgarrmondragon
Copy link
Member

Thanks @ReubenFrankel!

@edgarrmondragon edgarrmondragon merged commit de2cddd into MeltanoLabs:main Dec 13, 2024
25 checks passed
@ReubenFrankel ReubenFrankel deleted the filter-collections branch December 13, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants