-
Just noticed the second request is already implemented as max_partitions_per_run for the multi_run BackfillPolicy.
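For anyone unfamiliar with the option, here is a rough plain-Python sketch of the batching idea behind a per-run partition cap: a list of partition keys is split into consecutive groups of at most n, and each group would become one run. It does not use Dagster and is not Dagster's implementation; `chunk_partitions` and the sample keys are hypothetical names chosen for illustration.

```python
from datetime import date, timedelta
from typing import List, Sequence


def chunk_partitions(keys: Sequence[str], max_per_run: int) -> List[List[str]]:
    """Split partition keys into consecutive batches of at most max_per_run keys."""
    if max_per_run < 1:
        raise ValueError("max_per_run must be at least 1")
    return [
        list(keys[start:start + max_per_run])
        for start in range(0, len(keys), max_per_run)
    ]


# Ten hypothetical daily partition keys, 2024-01-01 through 2024-01-10.
daily_keys = [(date(2024, 1, 1) + timedelta(days=i)).isoformat() for i in range(10)]

# With a cap of 4, the ten days are covered by three runs (4 + 4 + 2 partitions).
runs = chunk_partitions(daily_keys, max_per_run=4)
for batch in runs:
    print(batch[0], "to", batch[-1], f"({len(batch)} partitions)")
```

In Dagster itself, the cap is the max_partitions_per_run argument mentioned above, so no hand-written batching like this should be needed.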
-
Regarding the first request, it would be important that this is the default behaviour for a job, but when a manual run is started, only the selected days should be materialized.
-
When you launch a backfill, you can choose any range of partitions to target. It doesn't need to be all the partitions. Does that answer your question?
-
Hi Sandy, sorry, I wasn’t clear enough.
-
Exactly what I was looking for, thanks!
-
I have a few questions though:
-
Thank you very much for the clarification!
-
Hello,
I just looked at Dagster again after more than a year away, and I like the direction it is going.
Single-run backfills are one of the most important features now available.
But I wonder whether support for backfill policies on time-based partitions could be extended, especially for the following two use cases, which are probably pretty common:
When querying reporting data from third-party APIs, the data often keeps changing for some time after it first becomes available, so anything that could have changed needs to be reprocessed at every scheduled run. Depending on the source, this could be just the previous day or as much as a year. Would it be possible to add support for not only processing new partitions, but also reprocessing up to n partitions in the past (while still respecting backfill policies such as the single-run policy)?
Additionally, some APIs and other sources deliver a lot of data or take a long time to respond, so only a limited time range can be processed at once (due to API timeouts and memory limits). But processing every single day separately would be overkill. Would it be possible to add a middle ground between processing every single day and processing the whole range, something like "process up to n partitions in a single run"?
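To make the first use case concrete, here is a minimal plain-Python sketch of the selection being asked for: at each scheduled run, take the newest daily partition plus the n partitions before it. It is only an illustration of the request, not an existing Dagster feature or API; `keys_to_process`, `latest`, and `lookback` are hypothetical names.

```python
from datetime import date, timedelta
from typing import List


def keys_to_process(latest: date, lookback: int) -> List[str]:
    """Return the newest daily partition key plus the `lookback` keys before it, oldest first."""
    if lookback < 0:
        raise ValueError("lookback must not be negative")
    return [
        (latest - timedelta(days=offset)).isoformat()
        for offset in range(lookback, -1, -1)
    ]


# A scheduled run whose newest partition is 2024-03-10, with a 3-day lookback:
# the new day is processed and the three days before it are reprocessed.
window = keys_to_process(date(2024, 3, 10), lookback=3)
print(window)
```

With a lookback of 0 the function returns only the newest partition, which matches the usual "process new partitions only" behaviour.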