fixing thanos querier dedup issue causing incorrect/high values when … #8085
Thanos querier rate/increase function creating huge spikes/incorrect results when deduplication is enabled
Changes
The Next function of overlapSplitSet now filters out a Prometheus replica/instance counter sample whenever a sample at a lower timestamp has a higher value than the sample that follows it. The issue is explained in detail in #7623.
Most chunks from the different replicas are merged, but a few leftover chunks are returned separately from the loser tree. Feeding these separate, non-merged chunks to dedup.NewSeriesSet in querier.go has no effect on them. The changes are therefore made in dedup.NewOverlapSplit, to filter out samples that have higher values at lower timestamps.
Note: The current dedup bug was causing many false alerts by breaching the threshold, so it is very important to fix it as soon as possible.
Verification
Tests are in progress. Before this change, the rate function returned 1200 for the samples below. With the fix applied, the rate function returns only 0.46, which is accurate.
Dedup issue (deduplicated series; each line is the value followed by the timestamp)
304528 1731358720.447
304530 1731358725.97
304532 1731358750.447
304536 1731358780.447
304540 1731358810.447
304543 1731358816.021 -- This sample is filtered out after the fix
304542 1731358816.028
replica 0
304531 1731358726.028
304535 1731358756.028
304539 1731358786.028
304542 1731358816.028
replica 1
304530 1731358725.97
304534 1731358755.97
304538 1731358785.97
304543 1731358816.021
replica 2
304528 1731358720.447
304532 1731358750.447
304536 1731358780.447
304540 1731358810.447
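To illustrate why the one out-of-order sample matters, here is a minimal Go sketch using the deduplicated samples listed above. It is not the code from this PR: the sample type and both function names are hypothetical, the filter only compares each sample with its immediate predecessor, and increaseWithResets is a simplified version of counter-reset handling (no extrapolation, no range window), so its numbers are not expected to match the 1200 and 0.46 quoted above. A filter this simple would also drop the last sample before a genuine counter reset.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair used only for this sketch;
// Thanos itself works on chunk iterators, not on a slice like this.
type sample struct {
	t float64 // Unix timestamp in seconds
	v float64 // counter value
}

// dropHigherEarlierSample assumes in is sorted by timestamp and holds counter
// samples. When a sample has a lower value than its immediate predecessor,
// the predecessor (lower timestamp, higher value) is dropped.
func dropHigherEarlierSample(in []sample) []sample {
	out := make([]sample, 0, len(in))
	for _, s := range in {
		if n := len(out); n > 0 && out[n-1].v > s.v {
			out = out[:n-1] // discard the earlier, higher-valued sample
		}
		out = append(out, s)
	}
	return out
}

// increaseWithResets is a simplified counter increase: every decrease is
// treated as a counter reset, so the value seen before the drop is added back.
func increaseWithResets(in []sample) float64 {
	if len(in) < 2 {
		return 0
	}
	correction := 0.0
	for i := 1; i < len(in); i++ {
		if in[i].v < in[i-1].v {
			correction += in[i-1].v
		}
	}
	return in[len(in)-1].v - in[0].v + correction
}

func main() {
	// Deduplicated series from the "Dedup issue" listing above.
	merged := []sample{
		{1731358720.447, 304528},
		{1731358725.97, 304530},
		{1731358750.447, 304532},
		{1731358780.447, 304536},
		{1731358810.447, 304540},
		{1731358816.021, 304543}, // from replica 1
		{1731358816.028, 304542}, // from replica 0, lower value 7 ms later
	}

	fmt.Printf("increase before filtering: %.0f\n", increaseWithResets(merged)) // 304557
	filtered := dropHigherEarlierSample(merged)
	fmt.Printf("increase after filtering:  %.0f\n", increaseWithResets(filtered)) // 14
}
```

The drop from 304543 to 304542 looks like a counter reset, so the whole previous value is added to the increase. With that single sample removed, the increase is just the difference between the last and first values.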
Please review the changes