-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TraceQL: Fix lookups .attr != nil with special columns #4864
base: main
Are you sure you want to change the base?
Conversation
@@ -1981,6 +1981,9 @@ func createSpanIterator(makeIter makeIterFn, innerIterators []parquetquery.Itera | |||
if cond.Op == traceql.OpNone { | |||
addPredicate(entry.columnPath, nil) // No filtering | |||
columnSelectAs[entry.columnPath] = cond.Attribute.Name | |||
|
|||
// Look up generic columns as there may be attributes with the same name but of different types | |||
genericConditions = append(genericConditions, cond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the conditions where OpNone
is passed? { span.foo != nil }
{span.foo == span.bar}
, others ?
Also, the attribute conditions are joined in differently depending on whether allConditions is true or not. Does this work in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the conditions where
OpNone
is passed?{ span.foo != nil }
{span.foo == span.bar}
, others ?
The only other one is that comes to my mind is { span.foo }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the attribute conditions are joined in differently depending on whether allConditions is true or not. Does this work in all cases?
Good call. Queries with allConditions=true
were actually not working correctly. I now added a commit that disables allConditions
when this kind of lookup occurs. Now it works. Do you think this is OK, or should we try to keep the allConditions
optimization for this edge case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
others that came to mind:
{} | select(span.foo) // i think?
{ span.foo + 3 > 2 } // interesting that we could solve expressions like this for span.foo for increased perf
This change is quite similar to this one: #4391
Also Grafana has requested that we consider an "all types" mode where we attempt to coalesce data types for all comparisons in a query. This is a growing class of problems that we need a way to solve efficiently.
Do you think this is OK, or should we try to keep the allConditions optimization for this edge case?
I'm surprised our tests didn't catch that bug. Can you run some benches to get a sense of the regression? I think that will help us decide if it's worth optimizing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised our tests didn't catch that bug
I think two things happened here:
- We didn't have good test cases for existence checks in
TestBackendBlockSearchTraceQL
(I now added them here) - In the new test
TestBackendBlockSearchTraceQLNilSpecialColumns
I focused on the actual edge case where we have a special column and a colliding attribute with the same name in the generic attrs, but didn't check for cases without this kind of collision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the 40x increase in allocs looks wild. I'll have a closer look at the benchmark 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a good explanation for the 40x increase: the test case with the increase has the following query:
{ span.http.status_code.type = "string" && span.http.status_code != nil }
The test data I generated contains spans where http.status_code
has string values. The above query filters those spans and then looks for the existence of of http.status_code
, which matches all those strings. When benchmarking the same query on main, the iterators don't look at the generic attribute column at all and therefore the query does not match any spans.
In short: there were more allocations because there were much more matches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a second round of benchmarks with the same data but replaced the following test cases:
{"spanWellKnownAndNotNilMatch", `{ span.net.host.name = "auth-service.local" && span.http.status_code != nil }`},
{"spanWellKnownOrNotNilMatch", `{ span.net.host.name = "auth-service.local" || span.http.status_code != nil }`},
Now the benchmark on the main branch also finds spans with span.http.status_code
. Running the benchmark on this branch still matches more spans within the generic columns. Therefore the benchmarks still show an increase in allocations and sec/op, but on a much smaller scale:
Benchmark results
❯ benchstat bench-nil-special-cols-02-main.txt bench-nil-special-cols-02-fix.txt
goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics
│ bench-nil-special-cols-02-main.txt │ bench-nil-special-cols-02-fix.txt │
│ sec/op │ sec/op vs base │
BackendBlockTraceQL/spanAttrValMatch-16 684.0m ± 1% 670.4m ± 1% -1.98% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16 10.07m ± 1% 10.16m ± 2% ~ (p=0.083 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16 336.7m ± 1% 337.2m ± 2% ~ (p=0.959 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16 1.208m ± 2% 1.227m ± 1% ~ (p=0.130 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16 718.1m ± 1% 715.8m ± 1% ~ (p=0.234 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16 1.226m ± 2% 1.208m ± 1% -1.44% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16 1.033 ± 1% 1.038 ± 2% ~ (p=0.328 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16 847.4m ± 0% 844.5m ± 1% ~ (p=0.065 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16 2.536 ± 1% 2.535 ± 1% ~ (p=0.574 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16 2.106m ± 2% 2.070m ± 1% ~ (p=0.083 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16 1.994 ± 1% 3.216 ± 1% +61.24% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16 1.997 ± 1% 3.205 ± 1% +60.49% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16 1.271 ± 1% 3.050 ± 1% +139.99% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16 2.894 ± 1% 3.782 ± 1% +30.67% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16 726.1m ± 1% 1566.0m ± 1% +115.66% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16 1.841 ± 0% 2.726 ± 1% +48.06% (p=0.000 n=8)
geomean 249.5m 305.3m +22.34%
│ bench-nil-special-cols-02-main.txt │ bench-nil-special-cols-02-fix.txt │
│ B/s │ B/s vs base │
BackendBlockTraceQL/spanAttrValMatch-16 115.4Mi ± 0% 117.7Mi ± 1% +2.02% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16 786.4Mi ± 1% 779.6Mi ± 6% ~ (p=0.083 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16 84.55Mi ± 1% 84.43Mi ± 2% ~ (p=0.959 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16 983.9Mi ± 3% 969.0Mi ± 1% ~ (p=0.130 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16 110.8Mi ± 1% 111.1Mi ± 1% ~ (p=0.234 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16 584.6Mi ± 2% 593.1Mi ± 1% +1.46% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16 77.30Mi ± 2% 76.89Mi ± 3% ~ (p=0.328 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16 94.19Mi ± 1% 94.51Mi ± 1% ~ (p=0.065 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16 39.84Mi ± 1% 39.85Mi ± 1% ~ (p=0.556 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16 986.7Mi ± 2% 1004.0Mi ± 3% ~ (p=0.083 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16 14.46Mi ± 1% 31.95Mi ± 1% +121.01% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16 14.43Mi ± 1% 32.06Mi ± 1% +122.10% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16 63.24Mi ± 1% 33.69Mi ± 1% -46.73% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16 27.87Mi ± 1% 27.17Mi ± 1% -2.50% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16 109.57Mi ± 1% 65.09Mi ± 3% -40.59% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16 43.35Mi ± 2% 37.39Mi ± 1% -13.75% (p=0.000 n=8)
geomean 105.7Mi 107.7Mi +1.86%
│ bench-nil-special-cols-02-main.txt │ bench-nil-special-cols-02-fix.txt │
│ MB_io/op │ MB_io/op vs base │
BackendBlockTraceQL/spanAttrValMatch-16 82.75 ± 0% 82.75 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrValNoMatch-16 8.308 ± 0% 8.308 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanIntrinsicMatch-16 29.85 ± 0% 29.85 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanIntrinsicNoMatch-16 1.246 ± 0% 1.246 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrAndMatch-16 83.42 ± 0% 83.42 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrAndNoMatch-16 751.3m ± 0% 751.3m ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrOrMatch-16 83.69 ± 0% 83.69 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrOrNoMatch-16 83.69 ± 0% 83.69 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrNotNilMatch-16 105.9 ± 0% 105.9 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrNotNilNoMatch-16 2.179 ± 0% 2.179 ± 0% ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanWellKnownNotNilMatch-16 30.23 ± 0% 107.80 ± 0% +256.60% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16 30.23 ± 0% 107.80 ± 0% +256.60% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16 84.29 ± 0% 107.80 ± 0% +27.89% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16 84.57 ± 0% 107.80 ± 0% +27.47% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16 83.42 ± 0% 106.90 ± 0% +28.15% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16 83.69 ± 0% 106.90 ± 0% +27.73% (p=0.000 n=8)
geomean 27.67 34.48 +24.64%
¹ all samples are equal
│ bench-nil-special-cols-02-main.txt │ bench-nil-special-cols-02-fix.txt │
│ B/op │ B/op vs base │
BackendBlockTraceQL/spanAttrValMatch-16 154.6Mi ± 1% 154.5Mi ± 1% ~ (p=0.878 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16 8.555Mi ± 2% 8.555Mi ± 4% ~ (p=0.878 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16 134.5Mi ± 1% 133.9Mi ± 0% -0.49% (p=0.050 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16 876.5Ki ± 5% 883.1Ki ± 4% ~ (p=0.382 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16 156.9Mi ± 3% 156.1Mi ± 3% ~ (p=0.328 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16 818.5Ki ± 3% 826.0Ki ± 2% ~ (p=0.505 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16 285.4Mi ± 1% 285.4Mi ± 1% ~ (p=0.878 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16 155.6Mi ± 1% 155.7Mi ± 1% ~ (p=0.798 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16 948.1Mi ± 1% 948.3Mi ± 1% ~ (p=0.645 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16 1.418Mi ± 3% 1.412Mi ± 2% ~ (p=0.878 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16 1.380Gi ± 0% 1.573Gi ± 0% +14.03% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16 1.380Gi ± 0% 1.571Gi ± 0% +13.85% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16 428.6Mi ± 0% 1047.7Mi ± 0% +144.44% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16 1.646Gi ± 0% 1.724Gi ± 0% +4.74% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16 156.7Mi ± 1% 165.5Mi ± 2% +5.56% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16 809.3Mi ± 0% 937.5Mi ± 0% +15.84% (p=0.000 n=8)
geomean 104.7Mi 114.3Mi +9.17%
│ bench-nil-special-cols-02-main.txt │ bench-nil-special-cols-02-fix.txt │
│ allocs/op │ allocs/op vs base │
BackendBlockTraceQL/spanAttrValMatch-16 1.699M ± 0% 1.699M ± 0% +0.00% (p=0.019 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16 10.97k ± 0% 10.97k ± 0% ~ (p=0.545 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16 1.826M ± 0% 1.826M ± 0% ~ (p=0.896 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16 10.81k ± 0% 10.81k ± 0% ~ (p=0.631 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16 1.832M ± 0% 1.832M ± 0% ~ (p=0.700 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16 10.92k ± 0% 10.92k ± 0% ~ (p=0.608 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16 3.324M ± 0% 3.324M ± 0% ~ (p=0.798 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16 1.722M ± 0% 1.722M ± 0% ~ (p=0.554 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16 11.03M ± 0% 11.03M ± 0% ~ (p=0.940 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16 10.91k ± 0% 10.91k ± 0% ~ (p=0.345 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16 14.76M ± 0% 16.11M ± 0% +9.16% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16 14.76M ± 0% 16.11M ± 0% +9.16% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16 6.083M ± 0% 11.937M ± 0% +96.25% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16 16.93M ± 0% 18.33M ± 0% +8.25% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16 1.832M ± 0% 2.191M ± 0% +19.61% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16 9.547M ± 0% 11.225M ± 0% +17.58% (p=0.000 n=8)
geomean 1.044M 1.130M +8.26%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth it? Should we wait for a new block format to fix this?
It's similar to the struggles we had in the int vs float comparisons. It's so much more costly to scan the main attributes column and you lose all the benefits of having the well known/dedicated column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joe-elliott Good point. I agree that scanning the generic attributes column is more costly and cancels out some of the benefits of well-known/dedicated columns. For most queries like { span.http.status_code != nil }
, the overhead isn't too large if there are no matching values in the generic attribute columns.
It still might make sense to shelve this for now. The new block format will likely provide a cleaner fix, so revisiting this later seems like the better approach.
What this PR does:
Improves attribute existence checks using
{ .attr != nil }
. Queries involving well-known attributes and attributes stored in dedicated columns currently don't check generic attribute columns for attributes with the same name but a different type.For example, the query
{ span.http.status_code != nil }
will not match spans wherehttp.status_code
exists as a string (e.g."200"
).Which issue(s) this PR fixes:
Fixes #2968
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]