Improved visibility/workflow for run failures and re-executions #28049

Open
chazmo03 opened this issue Feb 25, 2025 · 0 comments

Comments

@chazmo03
Contributor

What's the use case?

As an operator of a Dagster deployment, I have no reliable way to confirm that all of my failed runs have been addressed. This is particularly challenging when there are many failures at once, e.g., if a database was down or a bug was deployed.

The Runs > Failures tab clearly shows that there are a bunch of failures. But after fixing the underlying issue that caused them, there is no reliable way to re-execute all of the failed runs (as far as I can tell, you can only re-execute the 30 shown per page). And after re-executing the runs, nothing indicates that the original failed run was re-executed, even though at that point the concern has shifted to the re-executed run.
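To illustrate the paging limitation, here is a minimal sketch of walking every page of failures instead of stopping at the first 30. The `fetch_page` callable and the in-memory `make_fake_fetcher` are hypothetical stand-ins for a paginated runs query; they are not Dagster's API, and the page size of 30 is simply the number I observed in the UI.

```python
from typing import Callable, Iterator, List, Optional

PAGE_SIZE = 30  # page size observed in the Runs > Failures tab


def iter_all_failures(
    fetch_page: Callable[[Optional[str], int], List[str]],
    page_size: int = PAGE_SIZE,
) -> Iterator[str]:
    """Yield every failed run ID by walking pages until one comes back short."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        cursor = page[-1]  # resume after the last run ID already seen


def make_fake_fetcher(run_ids: List[str]) -> Callable[[Optional[str], int], List[str]]:
    """Hypothetical in-memory stand-in for a cursor-paginated runs query."""

    def fetch_page(cursor: Optional[str], limit: int) -> List[str]:
        start = 0 if cursor is None else run_ids.index(cursor) + 1
        return run_ids[start:start + limit]

    return fetch_page
```

Each yielded run ID could then be handed to whatever re-execution mechanism the deployment uses, so that a bulk re-execution covers all failures rather than one page at a time.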

Ideas of implementation

One way to improve this could be an additional status for re-executed runs: when you re-execute a failed run, its status changes from Failure to something like Re-executed, and the original run then drops out of the Runs > Failures tab.
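A rough sketch of how that status could be derived, assuming each re-executed run records the ID of the run it was launched from. The `Run` record, its `parent_run_id` field, and the `RE_EXECUTED` label are assumptions made for illustration only, not Dagster's actual data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Run:
    """Hypothetical run record; field names are assumptions for this sketch."""
    run_id: str
    status: str  # e.g. "SUCCESS" or "FAILURE"
    parent_run_id: Optional[str] = None  # set when this run re-executes another


def display_status(run: Run, reexecuted_ids: Set[str]) -> str:
    """Show a failed run as RE_EXECUTED once some other run re-executes it."""
    if run.status == "FAILURE" and run.run_id in reexecuted_ids:
        return "RE_EXECUTED"
    return run.status


def failures_tab(runs: Iterable[Run]) -> List[Run]:
    """Return only the failures that nobody has re-executed yet."""
    runs = list(runs)
    reexecuted_ids = {r.parent_run_id for r in runs if r.parent_run_id is not None}
    return [r for r in runs if display_status(r, reexecuted_ids) == "FAILURE"]
```

With this model, a re-execution that fails again still shows up in the tab under its own run ID, so the list only empties once every failure has either been retried or resolved.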

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
