Improved visibility/workflow for run failures and re-executions #28049

chazmo03 · 2025-02-25T18:10:40Z

What's the use case?

As an operator of a Dagster deployment, I have not found a reliable way to confirm that every failed run has been addressed. This is especially difficult when many runs fail at once, e.g., because a database was down or a bug was deployed.

The failures are easy to see in the Runs > Failures tab. However, once the underlying issue is fixed, there is no reliable way to re-execute all of them: as far as I can tell, you can only re-execute the 30 runs shown on each page. And after the runs are re-executed, nothing marks the original failed run as having been re-executed, because attention has shifted to the new run.

Ideas of implementation

One way to improve this would be an additional status for re-executed runs: when you re-execute a failed run, its status changes from Failure to something like Re-executed, and that run then drops out of the Runs > Failures tab.
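A rough sketch of the proposed behavior, using a toy in-memory model rather than Dagster itself. Every name here (`RunStatus.REEXECUTED`, `Run`, `reexecute`, `failures_tab`, `reexecute_all_failures`, `parent_run_id`) is hypothetical and is not Dagster's API. It shows two things: re-executing a failed run moves the original from FAILURE to REEXECUTED, so it leaves the Failures view; and a bulk action operates on every failure rather than one page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class RunStatus(Enum):
    # Hypothetical statuses; REEXECUTED is the proposed addition.
    QUEUED = "QUEUED"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    REEXECUTED = "REEXECUTED"


@dataclass
class Run:
    run_id: str
    status: RunStatus
    parent_run_id: Optional[str] = None  # links a re-execution to the run it retries


def failures_tab(runs: Dict[str, Run]) -> List[Run]:
    """Runs still needing attention: only those whose status is FAILURE."""
    return [r for r in runs.values() if r.status is RunStatus.FAILURE]


def reexecute(runs: Dict[str, Run], failed_run_id: str, new_run_id: str) -> Run:
    """Queue a re-execution and mark the original run as REEXECUTED."""
    original = runs[failed_run_id]
    if original.status is not RunStatus.FAILURE:
        raise ValueError(f"run {failed_run_id} is not in FAILURE state")
    original.status = RunStatus.REEXECUTED  # drops out of failures_tab
    new_run = Run(new_run_id, RunStatus.QUEUED, parent_run_id=failed_run_id)
    runs[new_run_id] = new_run
    return new_run


def reexecute_all_failures(
    runs: Dict[str, Run], new_id: Callable[[], str]
) -> List[Run]:
    """Re-execute every failed run, not just one page of them."""
    # failures_tab returns a list snapshot, so adding new runs is safe here.
    return [reexecute(runs, r.run_id, new_id()) for r in failures_tab(runs)]
```

If a re-execution itself fails, it shows up in the Failures tab as its own entry, while its `parent_run_id` still records which original run it was retrying.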

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.


chazmo03 added the type: feature-request label Feb 25, 2025
