Backfill retries #23679

jamiedemaria · 2024-08-15T19:11:11Z

Summary & Motivation

Enables re-executing a backfill, retrying either all partitions or only the failed partitions.

Reuses some of the graphene types from run re-execution. I could create backfill-specific types instead.

If re-executing from failure:
For asset backfills, it creates a new backfill that targets the set of assets that were not successfully materialized in the first backfill. For job backfills, it uses the existing fromFailure attribute to retry the job backfill.
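For asset backfills, the from-failure target amounts to a set difference: what the parent backfill targeted minus what it actually materialized. The sketch below shows that idea with plain sets. It assumes each unit of work is a hypothetical `(asset_key, partition_key)` string pair. This is an illustrative stand-in, not the data structures the PR uses.

```python
from typing import Set, Tuple

# Hypothetical stand-in: one unit of backfill work is an (asset_key, partition_key) pair.
AssetPartition = Tuple[str, str]


def from_failure_subset(
    targeted: Set[AssetPartition], materialized: Set[AssetPartition]
) -> Set[AssetPartition]:
    """Return the asset partitions the parent backfill targeted but never materialized."""
    return targeted - materialized


targeted = {
    ("orders", "2024-08-01"),
    ("orders", "2024-08-02"),
    ("daily_summary", "2024-08-01"),
}
materialized = {("orders", "2024-08-01")}

print(sorted(from_failure_subset(targeted, materialized)))
# [('daily_summary', '2024-08-01'), ('orders', '2024-08-02')]
```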

Constraints:

the first backfill must be in a completed state before it can be retried
for asset backfills, if re-executing from failure, some assets must not have been materialized in the first backfill. This differs from another action.
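The two constraints above could be enforced as a pre-check before creating the retry, as in the sketch below. The status values, the `RetryNotAllowed` exception, and `validate_retry` are hypothetical names. Treating failed and canceled backfills as "completed" is an assumption, based on the later comment in this thread about retrying canceled or failed backfills.

```python
from enum import Enum
from typing import Set, Tuple

AssetPartition = Tuple[str, str]


class BackfillStatus(Enum):
    # Hypothetical status values for illustration; not Dagster's actual enum.
    REQUESTED = "REQUESTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


# Assumption: any status in which the backfill is no longer running counts as "completed".
FINISHED_STATUSES = {BackfillStatus.COMPLETED, BackfillStatus.FAILED, BackfillStatus.CANCELED}


class RetryNotAllowed(Exception):
    """Raised when a backfill does not satisfy the retry constraints."""


def validate_retry(
    status: BackfillStatus,
    is_asset_backfill: bool,
    from_failure: bool,
    retry_subset: Set[AssetPartition],
) -> None:
    # Constraint 1: the parent backfill must have finished.
    if status not in FINISHED_STATUSES:
        raise RetryNotAllowed("backfill must be completed before it can be retried")
    # Constraint 2: a from-failure asset retry needs something left to materialize.
    if is_asset_backfill and from_failure and not retry_subset:
        raise RetryNotAllowed("all targeted assets were already materialized")
```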

When a retried backfill is created we add the parent backfill id and the root backfill id as tags like we do for run retries.
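The parent/root tagging follows the run-retry pattern. The parent is always the backfill being retried. The root is inherited from the parent's tags if the parent was itself a retry; otherwise the parent is the root. The tag names below are assumptions patterned on the run tags `dagster/parent_run_id` and `dagster/root_run_id`. `retry_lineage_tags` is a hypothetical helper.

```python
from typing import Dict

# Assumed tag names, patterned on the dagster/parent_run_id and dagster/root_run_id
# tags used for run re-execution; check the merged code for the real constants.
PARENT_BACKFILL_ID_TAG = "dagster/parent_backfill_id"
ROOT_BACKFILL_ID_TAG = "dagster/root_backfill_id"


def retry_lineage_tags(parent_backfill_id: str, parent_tags: Dict[str, str]) -> Dict[str, str]:
    """Build lineage tags for a backfill that retries `parent_backfill_id`."""
    return {
        PARENT_BACKFILL_ID_TAG: parent_backfill_id,
        # If the parent was itself a retry, keep pointing at the original backfill;
        # otherwise the parent is the root.
        ROOT_BACKFILL_ID_TAG: parent_tags.get(ROOT_BACKFILL_ID_TAG, parent_backfill_id),
    }


first_retry = retry_lineage_tags("abcdefgh", {})
second_retry = retry_lineage_tags("ijklmnop", first_retry)
print(second_retry[ROOT_BACKFILL_ID_TAG])  # abcdefgh
```

Retrying a retry keeps the same root, so every backfill in a retry chain can be grouped by a single tag value.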

How I Tested These Changes

new tests

jamiedemaria · 2024-08-15T19:11:34Z

Backfill retries #23679 : 3 dependent PRs (#25137 , #25165 , #25167 ) 👈
master

github-actions · 2024-08-15T19:15:25Z

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-n05gsv92q-elementl.vercel.app
https://jamie-backfill-retries.core-storybook.dagster-docs.io

Built with commit 58456f5.
This pull request is being automatically deployed with vercel-action

python_modules/dagster/dagster/_core/execution/backfill.py

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py

jamiedemaria · 2024-08-19T14:32:47Z

moving this back to draft since i'm going to shift focus to status and filtering (see discussion in planning doc about retries not being high priority for mvp)

python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_partition_backfill.py

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py

jamiedemaria · 2024-10-09T16:59:26Z

python_modules/dagster-graphql/dagster_graphql/schema/roots/mutation.py

+    def mutate(
+        self,
+        graphene_info: ResolveInfo,
+        reexecutionParams: GrapheneReexecutionParams,


i think it could make sense to make a backfill-specific version of GrapheneReexecutionParams that would take parentBackfillId and strategy where strategy can be FROM_FAILURE or ALL. i will make that update
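The proposed backfill-specific params type could take roughly the shape below. It is a plain-Python stand-in rather than a graphene type. `BackfillReexecutionStrategy`, `BackfillReexecutionParams`, and `from_input` are hypothetical names. Only `parentBackfillId` and the `FROM_FAILURE`/`ALL` values come from the comment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class BackfillReexecutionStrategy(Enum):
    # Hypothetical enum with the two strategies proposed in the comment.
    FROM_FAILURE = "FROM_FAILURE"
    ALL = "ALL"


@dataclass(frozen=True)
class BackfillReexecutionParams:
    # Hypothetical plain-Python stand-in for a backfill-specific graphene input type.
    parent_backfill_id: str
    strategy: BackfillReexecutionStrategy

    @classmethod
    def from_input(cls, raw: Dict[str, Any]) -> "BackfillReexecutionParams":
        # Keys mirror the camelCase GraphQL field names from the comment.
        # BackfillReexecutionStrategy(...) raises ValueError on an unknown strategy.
        return cls(
            parent_backfill_id=raw["parentBackfillId"],
            strategy=BackfillReexecutionStrategy(raw["strategy"]),
        )


params = BackfillReexecutionParams.from_input(
    {"parentBackfillId": "abcdefgh", "strategy": "FROM_FAILURE"}
)
from_failure = params.strategy is BackfillReexecutionStrategy.FROM_FAILURE
```

A dedicated type keeps backfill-only fields such as `parentBackfillId` out of the run re-execution params, at the cost of maintaining a second, parallel type.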

python_modules/dagster-graphql/dagster_graphql/schema/roots/mutation.py

jamiedemaria · 2024-10-09T17:02:23Z

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py

+    backfill = graphene_info.context.instance.get_backfill(backfill_id)
+    from_failure = ReexecutionStrategy(strategy) == ReexecutionStrategy.FROM_FAILURE
+    if not backfill:
+        check.failed(f"No backfill found for id: {backfill_id}")


might need to add a GrapheneBackfillNotFound output type. will look into

If backfills can only be retried from the UI, hitting this seems unlikely?

yeah, the other backfill actions have this too, and i figured it doesn't hurt to have here

it'd be to make it more in line with the run re-execution types, which have GrapheneRunNotFound as a potential return type. i don't think it's really that necessary here though
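A not-found output type would have the mutation return a typed error instead of calling `check.failed`, mirroring how run re-execution can return `GrapheneRunNotFound`. The sketch below shows that shape in plain Python. `BackfillNotFound`, `BackfillRetryLaunched`, and `retry_backfill` are hypothetical names. A dict stands in for instance storage.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class BackfillNotFound:
    # Hypothetical error result, analogous to GrapheneRunNotFound for run re-execution.
    backfill_id: str

    @property
    def message(self) -> str:
        return f"No backfill found for id: {self.backfill_id}"


@dataclass(frozen=True)
class BackfillRetryLaunched:
    # Hypothetical success result carrying the id of the newly created backfill.
    backfill_id: str


def retry_backfill(
    backfills: Dict[str, dict], backfill_id: str
) -> Union[BackfillNotFound, BackfillRetryLaunched]:
    # `backfills` is a toy stand-in for instance storage.
    if backfill_id not in backfills:
        # Return a typed error instead of raising, so clients can branch on the result type.
        return BackfillNotFound(backfill_id)
    new_id = uuid.uuid4().hex[:8]
    backfills[new_id] = {"parent": backfill_id}
    return BackfillRetryLaunched(new_id)
```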

jamiedemaria · 2024-10-14T16:39:29Z

@prha @sryza @clairelin135 pinging for review for this one!

clairelin135

Implementation looks good to me!

Seems like this implementation will retry canceled partitions as well as failed partitions. Not sure if this is expected behavior; maybe we should communicate this in the UI somehow?

clairelin135 · 2024-10-14T18:31:57Z

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py

jamiedemaria · 2024-10-14T19:10:51Z

Seems like this implementation will retry canceled partitions and failed partitions.

Yeah, the idea is that you can take a canceled or failed backfill and retry anything that didn't work the first time. One example of an issue that has come up is k8s pods being evicted, causing a run in a backfill to fail. We've had users request the ability to retry just the partitions that failed or didn't run, so that they don't have to manually create a backfill targeting only those partitions themselves.
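To act on the suggestion of surfacing this in the UI, the retry set could be split by why each partition is missing. The sketch below assumes the set of failed asset partitions is available separately. Anything else that was not materialized is labeled as canceled or never run. `label_retry_targets` and the label strings are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-in: one unit of backfill work is an (asset_key, partition_key) pair.
AssetPartition = Tuple[str, str]


def label_retry_targets(
    targeted: Set[AssetPartition],
    materialized: Set[AssetPartition],
    failed: Set[AssetPartition],
) -> Dict[AssetPartition, str]:
    """Label each asset partition a from-failure retry would pick up, for display."""
    labels: Dict[AssetPartition, str] = {}
    for asset_partition in targeted - materialized:
        # Everything not materialized is retried; split it by why it is missing.
        if asset_partition in failed:
            labels[asset_partition] = "failed"
        else:
            labels[asset_partition] = "canceled or never ran"
    return labels
```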

jamiedemaria commented Aug 15, 2024

python_modules/dagster/dagster/_core/execution/backfill.py (outdated, resolved)

jamiedemaria commented Aug 15, 2024

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py (outdated, resolved)

jamiedemaria force-pushed the jamie/backfill-retries branch 2 times, most recently from b0aa0c5 to b250d4f, August 16, 2024 17:10

jamiedemaria marked this pull request as ready for review August 16, 2024 19:00

jamiedemaria requested review from sryza and prha August 16, 2024 19:00

jamiedemaria commented Aug 16, 2024

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py (resolved)

jamiedemaria commented Aug 16, 2024

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py (outdated, resolved)

jamiedemaria marked this pull request as draft August 19, 2024 14:32

jamiedemaria force-pushed the jamie/backfill-retries branch from b250d4f to c14ad4f, August 21, 2024 17:54

jamiedemaria commented Aug 22, 2024

python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_partition_backfill.py (outdated, resolved)

jamiedemaria force-pushed the jamie/backfill-retries branch 2 times, most recently from 03be000 to ddb3f7f, October 4, 2024 15:21

jamiedemaria commented Oct 7, 2024

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py (outdated, resolved)

jamiedemaria commented Oct 7, 2024

python_modules/dagster-graphql/dagster_graphql/implementation/execution/backfill.py (outdated, resolved)

jamiedemaria force-pushed the jamie/backfill-retries branch 4 times, most recently from 86437e4 to 96a36a0, October 8, 2024 19:46

jamiedemaria mentioned this pull request Oct 8, 2024

Backfill re-execution #25137

Closed

jamiedemaria force-pushed the jamie/backfill-retries branch from 28dea82 to 87f8c68, October 9, 2024 16:48

jamiedemaria marked this pull request as ready for review October 9, 2024 16:51

jamiedemaria requested a review from clairelin135 October 9, 2024 16:51

jamiedemaria commented Oct 9, 2024

python_modules/dagster-graphql/dagster_graphql/schema/roots/mutation.py (resolved)

jamiedemaria commented Oct 9, 2024

This was referenced Oct 9, 2024

storage endpoint for deleting backfills #25165

Closed

Graphene endpoint for deleting a backfill #25166

Closed

storage endpoint for deleting multiple runs in one query #25167

Closed

jamiedemaria force-pushed the jamie/backfill-retries branch 3 times, most recently from 16a4546 to db2f974, October 14, 2024 13:56

jamiedemaria mentioned this pull request Oct 14, 2024

[wip] delete backfills via daemon #25230

Closed

clairelin135 approved these changes Oct 14, 2024

jamiedemaria added 12 commits October 15, 2024 09:19

implementation (5a50671)
testing (879e238)
testing (f2b36c3)
fix missing parameter (564e1f1)
update retry subset (c357233)
update tags (e763339)
fixup (f3c3bf5)
cleanup (d563ba6)
update (cb2a20c)
update to use reexecution params (2323f92)
add testing for full retry (f4eeb06)
fix up tests (58456f5)

jamiedemaria force-pushed the jamie/backfill-retries branch from 3cd9f4a to 58456f5, October 15, 2024 13:45

jamiedemaria merged commit 6fdd972 into master Oct 16, 2024
2 checks passed

jamiedemaria deleted the jamie/backfill-retries branch October 16, 2024 13:28
