[BUG] Aggregate metric per stage is missing filter for stage attempts #1552

Open
sayedbilalbari opened this issue Feb 19, 2025 · 0 comments · May be fixed by #1558
Assignees
Labels
bug Something isn't working core_tools Scope the core module (scala)

Comments

@sayedbilalbari
Collaborator

sayedbilalbari commented Feb 19, 2025

Currently, aggregateDiagnosticMetricByStage and aggregateSparkMetricsByStageInternal use the getAllStages method of the stageModelManager, which returns all the stages (failed, successful, incomplete, etc.).

This can lead to incorrect aggregation because different stageAttempts of the same stage can override each other. The behaviour is non-deterministic, since we cannot be sure which attempt will end up being processed last.
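
The override can be illustrated with a minimal sketch. It assumes a simplified model in which results are keyed by stage ID only; `StageAttempt`, `OverrideDemo`, and `durationMs` are hypothetical names for illustration, not the project's actual classes or fields.

```scala
// Hypothetical, simplified model of a stage attempt (not the project's class).
final case class StageAttempt(
    stageId: Int,
    attemptId: Int,
    failed: Boolean,
    durationMs: Long)

object OverrideDemo {
  // Results are keyed by stageId only, so when several attempts share a
  // stageId, whichever one is seen last silently replaces the others.
  def aggregateByStage(attempts: Seq[StageAttempt]): Map[Int, Long] =
    attempts.map(a => a.stageId -> a.durationMs).toMap

  // Two attempts of stage 1: attempt 0 failed, attempt 1 succeeded.
  val attempts: Seq[StageAttempt] = Seq(
    StageAttempt(stageId = 1, attemptId = 0, failed = true, durationMs = 100L),
    StageAttempt(stageId = 1, attemptId = 1, failed = false, durationMs = 250L))

  // Same input, different iteration order, different result.
  val inOrder: Map[Int, Long] = aggregateByStage(attempts)
  val reversed: Map[Int, Long] = aggregateByStage(attempts.reverse)
}
```

In this sketch the reported value for stage 1 depends purely on the order in which the attempts are visited, which is the kind of instability described above.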

Solution

We want to enforce behaviour that counts only the successful attempts of a stage, since we are already dumping the failed stages in a separate report.

  • When aggregating metrics, we should make sure that we do not mix and match between different attempts.
  • This can be done by picking only the attempts that have not failed. The same applies to incomplete attempts, since those can override each other as well.
  • An alternative is to aggregate per stage attempt, but this might not be ideal because failed stages do not have associated accumulables in the eventlog.
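
A minimal sketch of the proposed filtering, under stated assumptions: `StageAttemptInfo`, its `failed` and `completed` flags, and `taskDurationMs` are hypothetical stand-ins for whatever the stage model actually exposes, and the real fix may be structured differently.

```scala
// Hypothetical, simplified model of a stage attempt (not the project's class).
final case class StageAttemptInfo(
    stageId: Int,
    attemptId: Int,
    failed: Boolean,
    completed: Boolean,
    taskDurationMs: Long)

object SuccessfulAttemptAggregation {
  // An attempt counts only if it finished and did not fail.
  def isSuccessful(a: StageAttemptInfo): Boolean = a.completed && !a.failed

  // Drop failed and incomplete attempts before aggregating, then keep a single
  // attempt per stage so values from different attempts are never mixed.
  // Taking the highest attemptId is an assumption of this sketch.
  def aggregateByStage(attempts: Seq[StageAttemptInfo]): Map[Int, Long] =
    attempts
      .filter(isSuccessful)
      .groupBy(_.stageId)
      .map { case (stageId, ok) => stageId -> ok.maxBy(_.attemptId).taskDurationMs }

  val sample: Seq[StageAttemptInfo] = Seq(
    // Stage 1: the first attempt failed, the retry succeeded.
    StageAttemptInfo(1, 0, failed = true, completed = true, taskDurationMs = 100L),
    StageAttemptInfo(1, 1, failed = false, completed = true, taskDurationMs = 250L),
    // Stage 2: only an incomplete attempt exists, so it is left out.
    StageAttemptInfo(2, 0, failed = false, completed = false, taskDurationMs = 70L))
}
```

Because the filter runs before grouping, the result no longer depends on the order in which attempts are visited: both `sample` and `sample.reverse` yield the same map, and stages with no successful attempt are simply absent from it.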
@sayedbilalbari sayedbilalbari added ? - Needs Triage bug Something isn't working labels Feb 19, 2025
@sayedbilalbari sayedbilalbari changed the title [BUG] aggregateDiagnosticMetricByStage and aggregateSparkMetricsByStageInternal output do not contain filter for failed stages [BUG] aggregateDiagnosticMetricByStage and aggregateSparkMetricsByStageInternal - missing filter for failed stages Feb 19, 2025
@amahussein amahussein added the core_tools Scope the core module (scala) label Feb 19, 2025
@amahussein amahussein changed the title [BUG] aggregateDiagnosticMetricByStage and aggregateSparkMetricsByStageInternal - missing filter for failed stages [BUG] Aggregate metric per stage is missing filter for stage attempts Feb 20, 2025
@sayedbilalbari sayedbilalbari self-assigned this Feb 26, 2025