What's the use case?
As an operator of a Dagster deployment, I have no reliable way to confirm that all of my failed runs have been addressed. This is particularly challenging when there are many failures at once, e.g., if a database was down or a bug was deployed.
The Runs > Failures tab clearly shows that there are a bunch of failures. But after fixing the underlying issue that caused them, there is no reliable way to re-execute all of the failed runs (as far as I can see, you can only re-execute the 30 shown per page). And once the runs have been restarted, nothing shows that the original failed run was re-executed, since at that point the concern has moved to the re-executed run.
Ideas of implementation
One way to improve this could be to add a status for re-executed runs: when you re-execute a failed run, its status changes from Failure to something like Re-executed, and the original run then drops out of the Runs > Failures tab.
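To make the idea concrete, here is a minimal sketch in plain Python. It does not use Dagster's API or its actual data model: the `Run` class, the `RE_EXECUTED` status, the `parent_run_id` field, and both helper functions are hypothetical names introduced only for illustration. It assumes each re-executed run records the ID of the run it was launched from.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RunStatus(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RE_EXECUTED = "RE_EXECUTED"  # hypothetical status proposed in this issue


@dataclass
class Run:
    run_id: str
    status: RunStatus
    # Assumption: a re-executed run points back at the run it was launched from.
    parent_run_id: Optional[str] = None


def mark_re_executed(runs: List[Run]) -> List[Run]:
    """Flip FAILURE to RE_EXECUTED for every failed run that another run re-executes."""
    re_executed_ids = {r.parent_run_id for r in runs if r.parent_run_id is not None}
    for run in runs:
        if run.status is RunStatus.FAILURE and run.run_id in re_executed_ids:
            run.status = RunStatus.RE_EXECUTED
    return runs


def unaddressed_failures(runs: List[Run]) -> List[Run]:
    """Return the runs that would still appear under Runs > Failures."""
    return [r for r in runs if r.status is RunStatus.FAILURE]


runs = [
    Run("a", RunStatus.FAILURE),                     # re-executed by "c"
    Run("b", RunStatus.FAILURE),                     # re-executed by "d"
    Run("e", RunStatus.FAILURE),                     # never re-executed
    Run("c", RunStatus.SUCCESS, parent_run_id="a"),
    Run("d", RunStatus.FAILURE, parent_run_id="b"),  # the retry failed again
]
mark_re_executed(runs)
remaining = [r.run_id for r in unaddressed_failures(runs)]
```

With this rule, the original runs "a" and "b" leave the failures list once they have been re-executed, while "e" (never retried) and "d" (a retry that failed again) stay visible, so the list only contains failures that still need attention.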
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.