Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TraceQL: Fix lookups .attr != nil with special columns #4864

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

stoewer
Copy link
Contributor

@stoewer stoewer commented Mar 17, 2025

What this PR does:
Improves attribute existence checks using { .attr != nil }. Queries involving well-known attributes and attributes stored in dedicated columns currently don't check generic attribute columns for attributes with the same name but a different type.

For example, the query { span.http.status_code != nil } will not match spans where http.status_code exists as a string (e.g. "200").

Which issue(s) this PR fixes:
Fixes #2968

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stoewer stoewer changed the title Fix lookups .attr != nil with special columns TraceQL: Fix lookups .attr != nil with special columns Mar 17, 2025
@@ -1981,6 +1981,9 @@ func createSpanIterator(makeIter makeIterFn, innerIterators []parquetquery.Itera
if cond.Op == traceql.OpNone {
addPredicate(entry.columnPath, nil) // No filtering
columnSelectAs[entry.columnPath] = cond.Attribute.Name

// Look up generic columns as there may be attributes with the same name but of different types
genericConditions = append(genericConditions, cond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the conditions where OpNone is passed? { span.foo != nil } {span.foo == span.bar}, others ?

Also, the attribute conditions are joined in differently depending on whether allConditions is true or not. Does this work in all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the conditions where OpNone is passed? { span.foo != nil } {span.foo == span.bar}, others ?

The only other one is that comes to my mind is { span.foo }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the attribute conditions are joined in differently depending on whether allConditions is true or not. Does this work in all cases?

Good call. Queries with allConditions=true were actually not working correctly. I now added a commit that disables allConditions when this kind of lookup occurs. Now it works. Do you think this is OK, or should we try to keep the allConditions optimization for this edge case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

others that came to mind:

{} | select(span.foo) // i think?
{ span.foo + 3 > 2 } // interesting that we could solve expressions like this for span.foo for increased perf

This change is quite similar to this one: #4391

Also Grafana has requested that we consider an "all types" mode where we attempt to coalesce data types for all comparisons in a query. This is a growing class of problems that we need a way to solve efficiently.

Do you think this is OK, or should we try to keep the allConditions optimization for this edge case?

I'm surprised our tests didn't catch that bug. Can you run some benches to get a sense of the regression? I think that will help us decide if it's worth optimizing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised our tests didn't catch that bug

I think two things happened here:

  • We didn't have good test cases for existence checks in TestBackendBlockSearchTraceQL (I now added them here)
  • In the new test TestBackendBlockSearchTraceQLNilSpecialColumns I focused on the actual edge case where we have a special column and a colliding attribute with the same name in the generic attrs, but didn't check for cases without this kind of collision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the 40x increase in allocs looks wild. I'll have a closer look at the benchmark 👍

Copy link
Contributor Author

@stoewer stoewer Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a good explanation for the 40x increase: the test case with the increase has the following query:

{ span.http.status_code.type = "string" && span.http.status_code != nil }

The test data I generated contains spans where http.status_code has string values. The above query filters those spans and then looks for the existence of of http.status_code, which matches all those strings. When benchmarking the same query on main, the iterators don't look at the generic attribute column at all and therefore the query does not match any spans.

In short: there were more allocations because there were much more matches

Copy link
Contributor Author

@stoewer stoewer Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran a second round of benchmarks with the same data but replaced the following test cases:

{"spanWellKnownAndNotNilMatch", `{ span.net.host.name = "auth-service.local" && span.http.status_code != nil }`},
{"spanWellKnownOrNotNilMatch", `{ span.net.host.name = "auth-service.local" || span.http.status_code != nil }`},

Now the benchmark on the main branch also finds spans with span.http.status_code. Running the benchmark on this branch still matches more spans within the generic columns. Therefore the benchmarks still show an increase in allocations and sec/op, but on a much smaller scale:

Benchmark results
❯ benchstat bench-nil-special-cols-02-main.txt bench-nil-special-cols-02-fix.txt
goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                                   │ bench-nil-special-cols-02-main.txt │  bench-nil-special-cols-02-fix.txt   │
                                                   │               sec/op               │    sec/op     vs base                │
BackendBlockTraceQL/spanAttrValMatch-16                                     684.0m ± 1%    670.4m ± 1%    -1.98% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16                                   10.07m ± 1%    10.16m ± 2%         ~ (p=0.083 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16                                   336.7m ± 1%    337.2m ± 2%         ~ (p=0.959 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16                                 1.208m ± 2%    1.227m ± 1%         ~ (p=0.130 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16                                     718.1m ± 1%    715.8m ± 1%         ~ (p=0.234 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16                                   1.226m ± 2%    1.208m ± 1%    -1.44% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16                                       1.033 ± 1%     1.038 ± 2%         ~ (p=0.328 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16                                    847.4m ± 0%    844.5m ± 1%         ~ (p=0.065 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16                                   2.536 ± 1%     2.535 ± 1%         ~ (p=0.574 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16                                2.106m ± 2%    2.070m ± 1%         ~ (p=0.083 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16                              1.994 ± 1%     3.216 ± 1%   +61.24% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16                              1.997 ± 1%     3.205 ± 1%   +60.49% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16                           1.271 ± 1%     3.050 ± 1%  +139.99% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16                            2.894 ± 1%     3.782 ± 1%   +30.67% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16                          726.1m ± 1%   1566.0m ± 1%  +115.66% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16                            1.841 ± 0%     2.726 ± 1%   +48.06% (p=0.000 n=8)
geomean                                                                     249.5m         305.3m        +22.34%

                                                   │ bench-nil-special-cols-02-main.txt │   bench-nil-special-cols-02-fix.txt   │
                                                   │                B/s                 │      B/s       vs base                │
BackendBlockTraceQL/spanAttrValMatch-16                                    115.4Mi ± 0%    117.7Mi ± 1%    +2.02% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16                                  786.4Mi ± 1%    779.6Mi ± 6%         ~ (p=0.083 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16                                  84.55Mi ± 1%    84.43Mi ± 2%         ~ (p=0.959 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16                                983.9Mi ± 3%    969.0Mi ± 1%         ~ (p=0.130 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16                                    110.8Mi ± 1%    111.1Mi ± 1%         ~ (p=0.234 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16                                  584.6Mi ± 2%    593.1Mi ± 1%    +1.46% (p=0.000 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16                                     77.30Mi ± 2%    76.89Mi ± 3%         ~ (p=0.328 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16                                   94.19Mi ± 1%    94.51Mi ± 1%         ~ (p=0.065 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16                                 39.84Mi ± 1%    39.85Mi ± 1%         ~ (p=0.556 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16                               986.7Mi ± 2%   1004.0Mi ± 3%         ~ (p=0.083 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16                            14.46Mi ± 1%    31.95Mi ± 1%  +121.01% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16                            14.43Mi ± 1%    32.06Mi ± 1%  +122.10% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16                         63.24Mi ± 1%    33.69Mi ± 1%   -46.73% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16                          27.87Mi ± 1%    27.17Mi ± 1%    -2.50% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16                        109.57Mi ± 1%    65.09Mi ± 3%   -40.59% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16                          43.35Mi ± 2%    37.39Mi ± 1%   -13.75% (p=0.000 n=8)
geomean                                                                    105.7Mi         107.7Mi         +1.86%

                                                   │ bench-nil-special-cols-02-main.txt │   bench-nil-special-cols-02-fix.txt   │
                                                   │              MB_io/op              │  MB_io/op    vs base                  │
BackendBlockTraceQL/spanAttrValMatch-16                                      82.75 ± 0%    82.75 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrValNoMatch-16                                    8.308 ± 0%    8.308 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanIntrinsicMatch-16                                    29.85 ± 0%    29.85 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanIntrinsicNoMatch-16                                  1.246 ± 0%    1.246 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrAndMatch-16                                      83.42 ± 0%    83.42 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrAndNoMatch-16                                   751.3m ± 0%   751.3m ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrOrMatch-16                                       83.69 ± 0%    83.69 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrOrNoMatch-16                                     83.69 ± 0%    83.69 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrNotNilMatch-16                                   105.9 ± 0%    105.9 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanAttrNotNilNoMatch-16                                 2.179 ± 0%    2.179 ± 0%         ~ (p=1.000 n=8) ¹
BackendBlockTraceQL/spanWellKnownNotNilMatch-16                              30.23 ± 0%   107.80 ± 0%  +256.60% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16                              30.23 ± 0%   107.80 ± 0%  +256.60% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16                           84.29 ± 0%   107.80 ± 0%   +27.89% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16                            84.57 ± 0%   107.80 ± 0%   +27.47% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16                           83.42 ± 0%   106.90 ± 0%   +28.15% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16                            83.69 ± 0%   106.90 ± 0%   +27.73% (p=0.000 n=8)
geomean                                                                      27.67         34.48        +24.64%
¹ all samples are equal

                                                   │ bench-nil-special-cols-02-main.txt │   bench-nil-special-cols-02-fix.txt   │
                                                   │                B/op                │     B/op       vs base                │
BackendBlockTraceQL/spanAttrValMatch-16                                    154.6Mi ± 1%    154.5Mi ± 1%         ~ (p=0.878 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16                                  8.555Mi ± 2%    8.555Mi ± 4%         ~ (p=0.878 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16                                  134.5Mi ± 1%    133.9Mi ± 0%    -0.49% (p=0.050 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16                                876.5Ki ± 5%    883.1Ki ± 4%         ~ (p=0.382 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16                                    156.9Mi ± 3%    156.1Mi ± 3%         ~ (p=0.328 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16                                  818.5Ki ± 3%    826.0Ki ± 2%         ~ (p=0.505 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16                                     285.4Mi ± 1%    285.4Mi ± 1%         ~ (p=0.878 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16                                   155.6Mi ± 1%    155.7Mi ± 1%         ~ (p=0.798 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16                                 948.1Mi ± 1%    948.3Mi ± 1%         ~ (p=0.645 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16                               1.418Mi ± 3%    1.412Mi ± 2%         ~ (p=0.878 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16                            1.380Gi ± 0%    1.573Gi ± 0%   +14.03% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16                            1.380Gi ± 0%    1.571Gi ± 0%   +13.85% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16                         428.6Mi ± 0%   1047.7Mi ± 0%  +144.44% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16                          1.646Gi ± 0%    1.724Gi ± 0%    +4.74% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16                         156.7Mi ± 1%    165.5Mi ± 2%    +5.56% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16                          809.3Mi ± 0%    937.5Mi ± 0%   +15.84% (p=0.000 n=8)
geomean                                                                    104.7Mi         114.3Mi         +9.17%

                                                   │ bench-nil-special-cols-02-main.txt │  bench-nil-special-cols-02-fix.txt  │
                                                   │             allocs/op              │  allocs/op    vs base               │
BackendBlockTraceQL/spanAttrValMatch-16                                     1.699M ± 0%    1.699M ± 0%   +0.00% (p=0.019 n=8)
BackendBlockTraceQL/spanAttrValNoMatch-16                                   10.97k ± 0%    10.97k ± 0%        ~ (p=0.545 n=8)
BackendBlockTraceQL/spanIntrinsicMatch-16                                   1.826M ± 0%    1.826M ± 0%        ~ (p=0.896 n=8)
BackendBlockTraceQL/spanIntrinsicNoMatch-16                                 10.81k ± 0%    10.81k ± 0%        ~ (p=0.631 n=8)
BackendBlockTraceQL/spanAttrAndMatch-16                                     1.832M ± 0%    1.832M ± 0%        ~ (p=0.700 n=8)
BackendBlockTraceQL/spanAttrAndNoMatch-16                                   10.92k ± 0%    10.92k ± 0%        ~ (p=0.608 n=8)
BackendBlockTraceQL/spanAttrOrMatch-16                                      3.324M ± 0%    3.324M ± 0%        ~ (p=0.798 n=8)
BackendBlockTraceQL/spanAttrOrNoMatch-16                                    1.722M ± 0%    1.722M ± 0%        ~ (p=0.554 n=8)
BackendBlockTraceQL/spanAttrNotNilMatch-16                                  11.03M ± 0%    11.03M ± 0%        ~ (p=0.940 n=8)
BackendBlockTraceQL/spanAttrNotNilNoMatch-16                                10.91k ± 0%    10.91k ± 0%        ~ (p=0.345 n=8)
BackendBlockTraceQL/spanWellKnownNotNilMatch-16                             14.76M ± 0%    16.11M ± 0%   +9.16% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedNotNilMatch-16                             14.76M ± 0%    16.11M ± 0%   +9.16% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownAndNotNilMatch-16                          6.083M ± 0%   11.937M ± 0%  +96.25% (p=0.000 n=8)
BackendBlockTraceQL/spanWellKnownOrNotNilMatch-16                           16.93M ± 0%    18.33M ± 0%   +8.25% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedAndNotNilMatch-16                          1.832M ± 0%    2.191M ± 0%  +19.61% (p=0.000 n=8)
BackendBlockTraceQL/spanDedicatedOrNotNilMatch-16                           9.547M ± 0%   11.225M ± 0%  +17.58% (p=0.000 n=8)
geomean                                                                     1.044M         1.130M        +8.26%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this worth it? Should we wait for a new block format to fix this?

It's similar to the struggles we had in the int vs float comparisons. It's so much more costly to scan the main attributes column and you lose all the benefits of having the well known/dedicated column.

Copy link
Contributor Author

@stoewer stoewer Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joe-elliott Good point. I agree that scanning the generic attributes column is more costly and cancels out some of the benefits of well-known/dedicated columns. For most queries like { span.http.status_code != nil }, the overhead isn't too large if there are no matching values in the generic attribute columns.

It still might make sense to shelve this for now. The new block format will likely provide a cleaner fix, so revisiting this later seems like the better approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

!= nil only supports special columns
2 participants