-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backfill retries #23679
Backfill retries #23679
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @jamiedemaria and the rest of your teammates on Graphite |
Deploy preview for dagit-core-storybook ready! ✅ Preview Built with commit 58456f5. |
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Outdated
Show resolved
Hide resolved
b0aa0c5
to
b250d4f
Compare
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Show resolved
Hide resolved
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Outdated
Show resolved
Hide resolved
moving this back to draft since i'm going to shift focus to status and filtering (see discussion in planning doc about retries not being high priority for mvp) |
b250d4f
to
c14ad4f
Compare
python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_partition_backfill.py
Outdated
Show resolved
Hide resolved
03be000
to
ddb3f7f
Compare
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Outdated
Show resolved
Hide resolved
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Outdated
Show resolved
Hide resolved
86437e4
to
96a36a0
Compare
28dea82
to
87f8c68
Compare
def mutate( | ||
self, | ||
graphene_info: ResolveInfo, | ||
reexecutionParams: GrapheneReexecutionParams, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it could make sense to make a backfill-specific version of GrapheneReexecutionParams that would take parentBackfillId
and strategy
where strategy
can be FROM_FAILURE
or ALL
. i will make that update
backfill = graphene_info.context.instance.get_backfill(backfill_id) | ||
from_failure = ReexecutionStrategy(strategy) == ReexecutionStrategy.FROM_FAILURE | ||
if not backfill: | ||
check.failed(f"No backfill found for id: {backfill_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might need to add a GrapheneBackfillNotFound
output type. will look into
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If backfills can only be retried from the UI hitting this seems unlikely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the other backfill actions have this too, and i figured it doesn't hurt to have here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be to make it more in line w the run re-execution types which have GrapheneRunNotFound as a potential return type. i dont think it's really that necessary here though
16a4546
to
db2f974
Compare
@prha @sryza @clairelin135 pinging for review for this one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks good to me!
Seems like this implementation will retry canceled partitions and failed partitions. Not sure if this is expected behavior, maybe we should communicate this in the UI somehow?
backfill = graphene_info.context.instance.get_backfill(backfill_id) | ||
from_failure = ReexecutionStrategy(strategy) == ReexecutionStrategy.FROM_FAILURE | ||
if not backfill: | ||
check.failed(f"No backfill found for id: {backfill_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If backfills can only be retried from the UI hitting this seems unlikely?
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Show resolved
Hide resolved
python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py
Show resolved
Hide resolved
Yeah the idea is that you can take a canceled or failed backfill and retry anything that didn't work the first time. one example of an issue that has come up is k8s pods being evicted and causing a run in a backfill to fail. we've had users request the ability to retry the just partitions that failed/didn't run so that they dont have to manually make the backfill that targets just those partitions themselves |
3cd9f4a
to
58456f5
Compare
Summary & Motivation
Enables reexecuting a backfill with either all partitions retried or only the failed partitions retried
Re-uses some of the graphene types for run re-execution. I could create different types that are backfill specific instead.
If reexecuting from failure:
For asset backfills, it will create a new backfill that will backfill the set of assets that were not successfully materialized in the first backfill. For job backfills, uses the existing
fromFailure
attribute that will retry a job backfill.Constraints:
When a retried backfill is created we add the parent backfill id and the root backfill id as tags like we do for run retries.
How I Tested These Changes
new tests