storage endpoint for deleting multiple runs in one query #25167
Conversation
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
How many runs at once do you imagine being deleted here?
The existing instance method that deletes a single run both deletes the run (which should generally be very fast) and deletes all the events for that run (which can potentially be quite slow / time out): https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_core/instance/__init__.py#L1861-L1868
I am worried that this problem will be exacerbated if we are deleting multiple runs at once in a single web request.
Hmm, it could be quite a lot of runs. The idea is to enable deleting backfills. I could scratch this approach and do some kind of batching delete in the
something like
I'm not sure what factors to consider when determining if that's a better approach, though.
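As a rough sketch of what a batching delete could look like (the helper and its names below are hypothetical, not code from this PR; it only assumes an object exposing `delete_run(run_id)`, as `DagsterInstance` does):

```python
# Hypothetical sketch: delete a backfill's runs in fixed-size batches, so no
# single storage operation has to remove every run (and its event logs) at once.
def delete_runs_in_batches(instance, run_ids, batch_size=25):
    for start in range(0, len(run_ids), batch_size):
        for run_id in run_ids[start : start + batch_size]:
            # delete_run removes the run row and all of its events;
            # the event deletion is the potentially slow part.
            instance.delete_run(run_id)
```

The batch boundaries would also be natural points to yield control or report progress if this ran somewhere other than a single web request.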
Is delete_backfill being called in a GraphQL endpoint? If so, I think that could still run into the same problem: ultimately we are subject to a 60-second timeout for an individual GraphQL request.

The way that we solved this for bulk terminating runs was to have the frontend manage the 'bulk' part and issue individual termination requests, each of which is fast.

Another, much more involved, solution would be to invest in some kind of async bulk action system where the request triggers work, that work happens asynchronously elsewhere (e.g. in a daemon or some other backend system), and the frontend can then poll for status updates until that work is finished.
Yeah.
For this specific case, I could imagine moving the backfill into a DELETING state; the backfill daemon then picks it up and performs the run deletions that need to happen before doing the actual deletion of the backfill itself?
That's an interesting idea.
Not a full impl, but it was easier to make the PR than write it out. Is this generally what you were thinking? Would deleting the runs one at a time (or in batches?) also run into constraints?
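The DELETING-state idea could be sketched like this (every name here is hypothetical and stands in for whatever the real storage layer exposes; it is not the actual Dagster API):

```python
from enum import Enum

class BackfillStatus(Enum):
    REQUESTED = "REQUESTED"
    DELETING = "DELETING"

def daemon_iteration(storage):
    # One iteration of a hypothetical daemon loop: find backfills that the
    # webserver marked DELETING, delete their runs first, then remove the
    # backfill record itself. The slow run/event deletion happens here,
    # outside any GraphQL request timeout.
    for backfill_id in storage.get_backfill_ids(status=BackfillStatus.DELETING):
        for run_id in storage.get_run_ids_for_backfill(backfill_id):
            storage.delete_run(run_id)
        storage.delete_backfill(backfill_id)
```

The GraphQL mutation would then only flip the status to DELETING and return immediately, and the frontend could poll the backfill's status until it disappears.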
Summary & Motivation
Allows for deleting multiple runs in a single query. This supports deleting backfills in the stacked PR, since we'll want to delete all associated runs as well.
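The "single query" shape is essentially one parameterized DELETE with an IN clause rather than one statement per run. A minimal standalone sketch using sqlite3 (the table and column names are illustrative, not Dagster's actual schema):

```python
import sqlite3

def delete_runs(conn, run_ids):
    # One parameterized DELETE covering all run ids, instead of issuing a
    # separate DELETE statement per run.
    placeholders = ", ".join("?" for _ in run_ids)
    conn.execute(f"DELETE FROM runs WHERE run_id IN ({placeholders})", run_ids)
    conn.commit()
```

Placeholders are generated rather than interpolating the ids directly, so the values still go through parameter binding.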
Associated internal PR: https://github.com/dagster-io/internal/pull/12089
How I Tested These Changes
New unit tests.