[Feature]: default max_records config for testing #1333
Comments
@pnadolny13 - My only question here is if it should be 100 max records total or 100 max records per stream. I think there's a good case that per-stream is a better limiter, just to ensure your tests have representative data. Wdyt? |
@aaronsteers yep, you're right, per stream is really what I'm looking for. I want a limited amount of data in each stream. |
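To make the difference between the two limits concrete, here is a minimal sketch in plain Python. It does not use the SDK; `limit_per_stream`, `limit_total`, and the sample stream names are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = dict


def limit_per_stream(
    streams: Dict[str, Iterable[Record]], max_records: Optional[int]
) -> Iterator[Tuple[str, Record]]:
    """Yield (stream_name, record) pairs, capping each stream separately."""
    for name, records in streams.items():
        # islice(..., None) applies no cap, so None means "unlimited".
        for record in islice(records, max_records):
            yield name, record


def limit_total(
    streams: Dict[str, Iterable[Record]], max_records: Optional[int]
) -> Iterator[Tuple[str, Record]]:
    """Yield (stream_name, record) pairs, capping the combined output."""
    all_pairs = (
        (name, record) for name, records in streams.items() for record in records
    )
    yield from islice(all_pairs, max_records)


# Hypothetical sample data: two streams of 500 records each.
streams = {
    "users": [{"id": i} for i in range(500)],
    "orders": [{"id": i} for i in range(500)],
}

per_stream = list(limit_per_stream(streams, 100))
total = list(limit_total(streams, 100))

print(len(per_stream), sorted({name for name, _ in per_stream}))
print(len(total), sorted({name for name, _ in total}))
```

With a total limit, the first stream can consume the whole budget and later streams emit nothing, which is why a per-stream limit tends to give more representative test data.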
@pnadolny13 - Thanks for confirming. Here's the internal mechanism I mentioned in office hours, which could be used to deliver this: sdk/singer_sdk/streams/core.py Line 87 in 12679ff
|
@edgarrmondragon I think this may be the right mechanism for users specifying the
WDYT? |
@kgpayne I'm not a fan of changing this attribute at runtime, it feels clunky and is prone to error as you've seen (if you re-instantiate the streams, the value is reset to the class default). I'd prefer at-runtime control in the form of a parameter in tap.sync_all(max_records=tap.config.get("max_records")) wdyt? |
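The trade-off described here can be sketched with stand-in classes. Everything below is hypothetical: `SketchStream`, `SketchTap`, and the `max_records_limit` attribute are not the SDK's classes or names, and the `max_records` parameter on `sync_all` is the proposal from this comment, not an existing API.

```python
from itertools import islice
from typing import Dict, Iterator, List, Optional


class SketchStream:
    """Hypothetical stream; a stand-in, not the singer_sdk Stream class."""

    # Class-level default (hypothetical attribute name).
    max_records_limit: Optional[int] = None

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.size = size

    def get_records(self) -> Iterator[dict]:
        return ({"id": i} for i in range(self.size))

    def sync(self, max_records: Optional[int] = None) -> List[dict]:
        # An explicit argument wins; otherwise fall back to the attribute.
        limit = max_records if max_records is not None else self.max_records_limit
        return list(islice(self.get_records(), limit))


class SketchTap:
    """Hypothetical tap that re-creates its streams on every discovery."""

    def __init__(self, config: dict) -> None:
        self.config = config

    def discover_streams(self) -> List[SketchStream]:
        return [SketchStream("users", 500), SketchStream("orders", 500)]

    def sync_all(self, max_records: Optional[int] = None) -> Dict[str, int]:
        # Returns the number of records synced per stream.
        return {
            stream.name: len(stream.sync(max_records=max_records))
            for stream in self.discover_streams()
        }


tap = SketchTap({"max_records": 100})

# Approach 1: set the attribute on stream instances at runtime.
# sync_all() builds fresh instances, so the value falls back to the class default.
for stream in tap.discover_streams():
    stream.max_records_limit = 100
unlimited = tap.sync_all()

# Approach 2: pass the limit as a runtime parameter.
limited = tap.sync_all(max_records=tap.config.get("max_records"))

print(unlimited)
print(limited)
```

In this sketch the attribute-based limit is silently lost once the streams are re-instantiated, while the parameter is applied on every sync regardless of how the stream objects were created.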
@edgarrmondragon this makes sense to me 👍
By "This config sets the |
I was looking for a feature like this. We try to limit the ingestion in our CI pipeline, and we currently do this by altering the start_date (like Pat mentioned). However, this doesn't work for taps that are not incremental. It would be great to be able to configure this, either by using the tap config or using an environment variable:
environments:
- name: ci
  env:
    MAX_RECORDS: 1000 |
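One way the two sources could be combined is sketched below. The helper `resolve_max_records` is hypothetical, and the precedence (tap config first, then the `MAX_RECORDS` environment variable) is an assumption rather than anything decided in this thread.

```python
import os
from typing import Any, Mapping, Optional


def resolve_max_records(
    config: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
    env_var: str = "MAX_RECORDS",
) -> Optional[int]:
    """Return the record limit, or None when no limit is configured.

    Assumed precedence: the "max_records" config key, then the env var.
    """
    raw = config.get("max_records")
    if raw is None:
        raw = environ.get(env_var)
    if raw is None or raw == "":
        return None
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 0:
        raise ValueError(f"max_records must be non-negative, got {value}")
    return value


# Environment variable only, as in the CI environment above.
print(resolve_max_records({}, {"MAX_RECORDS": "1000"}))
# An explicit tap config value takes precedence in this sketch.
print(resolve_max_records({"max_records": 50}, {"MAX_RECORDS": "1000"}))
```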
@aaronsteers I think this is higher priority than first assumed, as it also blocks adopting the new standard tests framework with SQL taps 😬 |
There's more discussion around alternative implementations and details of this in #1366. |
Another user requested this: https://meltano.slack.com/archives/CMN8HELB0/p1685438218728059 |
I would also love something like this for our test CI pipelines. Any updates to this issue? |
Feature scope
Taps (catalog, state, stream maps, etc.)
Description
It would be really nice to have something like a max_records config that I can set for my taps in test/CI environments to limit the number of records synced (e.g. I want around 100 records). We usually recommend limiting the tap's start date to something like yesterday, but that could still replicate an excessive amount of data in some cases. If you only need 100 records but syncing from yesterday returns 1M, then it's a bad pattern. Ideally I could just configure my tap to pull 100 records, and once it knows it has exceeded that, it just exits. I think the test feature does something similar, exiting after it sees its first successful record.
My example from Squared CI: I set a DEFAULT_START_DATE for my extractor start dates in my CI/CD Meltano environment to yesterday's date. For cloudwatch, google analytics, slack, etc. I'm pulling way more data than I need to do simple CI tests. A max_records config would allow me to save on time and cost.