POC: Sketch out cached filter result API #7513

Draft
wants to merge 9 commits into
base: main

Conversation

alamb
Contributor

@alamb alamb commented May 15, 2025

Draft until:

  • Pull out StringViewBuilder::concat_array into its own PR
  • Avoid double buffering of intermediate results
  • Add memory limit for results cache

Which issue does this PR close?

Rationale for this change

I am trying to sketch out enough of a cached filter result API to show performance improvements. Once I have done that, I will start proposing how to break it up into smaller PRs.

What changes are included in this PR?

  1. Add code to cache columns which are reused in filter and scan
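
To illustrate the idea in item 1, here is a minimal, std-only sketch (not the code in this PR). `Column`, `decode`, `read_without_cache`, and `read_with_cache` are hypothetical names, and a plain `Vec<i64>` stands in for an Arrow array. When the filter column is also part of the projection, keeping the rows that pass the predicate means the column only has to be decoded once.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for an encoded Parquet column chunk.
struct Column {
    encoded: Vec<i64>,
    /// Counts how many times the column was decoded.
    decode_calls: Cell<usize>,
}

impl Column {
    fn new(encoded: Vec<i64>) -> Self {
        Self { encoded, decode_calls: Cell::new(0) }
    }

    /// Pretend decoding is expensive; here it only copies the values.
    fn decode(&self) -> Vec<i64> {
        self.decode_calls.set(self.decode_calls.get() + 1);
        self.encoded.clone()
    }
}

/// Without a cache: decode once to evaluate the predicate, then decode
/// again to produce the selected output rows.
fn read_without_cache(col: &Column, predicate: impl Fn(i64) -> bool) -> Vec<i64> {
    let mask: Vec<bool> = col.decode().into_iter().map(|v| predicate(v)).collect();
    col.decode()
        .into_iter()
        .zip(mask)
        .filter_map(|(v, keep)| keep.then_some(v))
        .collect()
}

/// With a cache: the values that pass the predicate are kept while the
/// predicate is evaluated, so no second decode is needed.
fn read_with_cache(col: &Column, predicate: impl Fn(i64) -> bool) -> Vec<i64> {
    col.decode().into_iter().filter(|v| predicate(*v)).collect()
}
```

Both functions return the same rows; the difference in this model is only the number of `decode` calls (two versus one).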

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label May 15, 2025
@alamb alamb force-pushed the alamb/cache_filter_result branch from 78f96d1 to 31f2fa1 Compare May 15, 2025 19:39
@alamb alamb force-pushed the alamb/cache_filter_result branch from 31f2fa1 to 244e187 Compare May 15, 2025 20:33
filters: Vec<BooleanArray>,
}

impl CachedPredicateResultBuilder {
Contributor

Nice, this is very clear to get the cached result!

@alamb alamb force-pushed the alamb/cache_filter_result branch 2 times, most recently from 8961196 to 9e91e9f Compare May 16, 2025 12:48
/// TODO: potentially incrementally build the result of the predicate
/// evaluation without holding all the batches in memory. See
/// <https://github.com/apache/arrow-rs/issues/6692>
in_progress_arrays: Vec<Box<dyn InProgressArray>>,
Contributor

Thank you @alamb,

Does this mean that in_progress_arrays is not the final result from which we generate the final batch?

For example:
Predicate a > 1 => in_progress_array_a is filtered by a > 1
Predicate b > 2 => in_progress_array_b is filtered by b > 2 and also by a > 1, but in_progress_array_a is not updated

Contributor Author

This is an excellent question

What I was thinking is that CachedPredicateResult would manage the "currently" applied predicate

So in the case where there are multiple predicates, I was thinking of a method like CachedPredicateResult::merge, which could take the result of filtering by a and apply the result of filtering by b to it

We can then put heuristics / logic for if/when we materialize the filters into CachedPredicateResult

But that is sort of speculation at this point -- I don't have it all worked out yet

My plan is to get far enough to show this structure works and can improve performance, and then I'll work on the trickier logic of applying multiple filters
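
A rough sketch of how such a `merge` could behave is shown below. This is an illustrative, std-only model and not the PR's implementation: `CachedPredicateResult` here is a simplified hypothetical struct that holds one cached column as a `Vec<i64>`, and `merge` is assumed to take a boolean mask from a later predicate that was evaluated only on the rows that survived the earlier ones.

```rust
/// Simplified, hypothetical model of a cached predicate result: the values
/// of one column that passed all predicates applied so far.
#[derive(Debug, PartialEq)]
struct CachedPredicateResult {
    values: Vec<i64>,
}

impl CachedPredicateResult {
    /// Cache the rows of `column` that pass the first predicate.
    fn new(column: &[i64], mask: &[bool]) -> Self {
        assert_eq!(column.len(), mask.len());
        let values = column
            .iter()
            .zip(mask)
            .filter_map(|(&v, &keep)| keep.then_some(v))
            .collect();
        Self { values }
    }

    /// Apply a later predicate's mask. The mask has one entry per row that
    /// is still cached (that is, per row that passed the earlier predicates).
    fn merge(self, later_mask: &[bool]) -> Self {
        assert_eq!(self.values.len(), later_mask.len());
        let values = self
            .values
            .into_iter()
            .zip(later_mask)
            .filter_map(|(v, &keep)| keep.then_some(v))
            .collect();
        Self { values }
    }
}
```

For example, with `a = [0, 2, 5, 1, 7]` and predicate `a > 1`, the cache holds `[2, 5, 7]`; if `b > 2` then passes only the last two surviving rows, `merge(&[false, true, true])` leaves `[5, 7]`. Whether to materialize eagerly like this or defer it is exactly the heuristic question raised above.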

Contributor

> CachedPredicateResult::merge method which could take the result of filtering a and apply the result of filtering by b

Great idea!

> But that is sort of speculation at this point -- I don't have it all worked out yet

Sure, I will continue to review. Thank you @alamb!

@alamb alamb force-pushed the alamb/cache_filter_result branch from 9e91e9f to 147c7a7 Compare May 16, 2025 14:50
@alamb
Contributor Author

alamb commented May 16, 2025

I tested this branch using a query that filters and selects the same column. (NOTE: it is critical NOT to use --all-features, because all features turns on force_validate.)

cargo bench --features="arrow async" --bench arrow_reader_clickbench -- Q24

Here are the benchmark results (30ms --> 22ms, about 25% faster):

Gnuplot not found, using plotters backend
Looking for ClickBench files starting in current_dir and all parent directories: "/Users/andrewlamb/Software/arrow-rs/parquet"
arrow_reader_clickbench/sync/Q24
                        time:   [22.532 ms 22.604 ms 22.682 ms]
                        change: [-27.751% -27.245% -26.791%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

arrow_reader_clickbench/async/Q24
                        time:   [24.043 ms 24.171 ms 24.308 ms]
                        change: [-26.223% -25.697% -25.172%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

I realize this branch currently uses more memory (to buffer the filter results), but I think the additional memory growth can be limited with a setting.
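
One way such a setting could work is sketched below: a byte budget after which the cache stops buffering, so the reader would fall back to decoding the column again. This is only an illustration under that assumption; `BoundedCache`, `max_bytes`, and `try_push` are hypothetical names, not options that exist in the parquet crate.

```rust
/// Hypothetical cache of filtered values with a byte budget.
struct BoundedCache {
    max_bytes: usize,
    used_bytes: usize,
    /// `None` once the budget was exceeded and the cache was dropped.
    batches: Option<Vec<Vec<i64>>>,
}

impl BoundedCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, batches: Some(Vec::new()) }
    }

    /// Try to buffer one batch of filtered values. Returns `false` (and
    /// frees everything buffered so far) if the budget would be exceeded;
    /// the caller would then fall back to decoding the column again.
    fn try_push(&mut self, batch: Vec<i64>) -> bool {
        if self.batches.is_none() {
            return false; // cache was already disabled
        }
        let size = batch.len() * std::mem::size_of::<i64>();
        if self.used_bytes + size > self.max_bytes {
            self.batches = None;
            self.used_bytes = 0;
            return false;
        }
        self.used_bytes += size;
        if let Some(batches) = self.batches.as_mut() {
            batches.push(batch);
        }
        true
    }
}
```

Dropping the whole cache on overflow keeps the model simple; a real implementation might instead keep a prefix of the results or spill selectively.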

@alamb alamb force-pushed the alamb/cache_filter_result branch from 147c7a7 to f1f7103 Compare May 16, 2025 15:08
@zhuqi-lucas
Contributor

Amazing result! I think this is a better approach than a page cache: a page cache can have cache misses, but this PR always caches the result!

@alamb
Contributor Author

alamb commented May 16, 2025

> Amazing result! I think this is a better approach than a page cache: a page cache can have cache misses, but this PR always caches the result!

Thanks -- I think one potential problem is that the cached results may consume too much memory (but I will try and handle that shortly)

I think we should proceed with starting to merge some refactorings; I left some suggestions here:

@zhuqi-lucas
Contributor

It makes sense! Thank you @alamb.

@alamb
Contributor Author

alamb commented May 16, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result (f1f7103) to 1a5999a diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 16, 2025

🤖: Benchmark completed

Details

group                                alamb_cache_filter_result              main
-----                                -------------------------              ----
arrow_reader_clickbench/async/Q1     1.00      2.0±0.03ms        ? ?/sec    1.15      2.4±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     12.9±0.06ms        ? ?/sec    1.08     13.9±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     14.9±0.16ms        ? ?/sec    1.06     15.8±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     24.4±0.26ms        ? ?/sec    1.59     38.8±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     37.6±0.33ms        ? ?/sec    1.39     52.3±0.32ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     35.5±0.24ms        ? ?/sec    1.41     50.0±0.37ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.01      5.1±0.05ms        ? ?/sec    1.00      5.0±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    114.6±0.51ms        ? ?/sec    1.42    162.8±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    132.0±0.61ms        ? ?/sec    1.59    209.4±0.69ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    200.4±0.94ms        ? ?/sec    2.12    425.7±1.52ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00   414.9±12.61ms        ? ?/sec    1.18   491.6±11.23ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     41.9±0.46ms        ? ?/sec    1.38     57.7±0.51ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    105.5±0.37ms        ? ?/sec    1.58    166.9±1.13ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    103.1±0.51ms        ? ?/sec    1.59    164.1±0.89ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     64.2±0.58ms        ? ?/sec    1.00     64.2±0.51ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.38    234.5±1.60ms        ? ?/sec    1.00    169.6±0.96ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.58    162.6±0.64ms        ? ?/sec    1.00    102.6±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     38.9±0.26ms        ? ?/sec    1.00     39.1±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     48.5±0.26ms        ? ?/sec    1.00     48.6±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     47.8±0.32ms        ? ?/sec    1.11     53.2±0.48ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     40.0±0.30ms        ? ?/sec    1.00     39.9±0.42ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.3±0.07ms        ? ?/sec    1.00     14.4±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00  1848.0±17.20µs        ? ?/sec    1.19      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     12.0±0.08ms        ? ?/sec    1.05     12.6±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     13.8±0.07ms        ? ?/sec    1.05     14.4±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     25.5±1.92ms        ? ?/sec    1.59     40.6±0.46ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     36.2±1.26ms        ? ?/sec    1.49     54.0±2.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     33.9±0.35ms        ? ?/sec    1.52     51.4±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.04      4.4±0.11ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    138.2±1.41ms        ? ?/sec    1.29    178.9±1.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    135.0±1.00ms        ? ?/sec    1.75    236.5±1.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    198.4±4.34ms        ? ?/sec    2.47    490.2±2.72ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00    375.7±8.92ms        ? ?/sec    1.16   433.9±10.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     38.5±0.44ms        ? ?/sec    1.42     54.8±0.67ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00     95.3±0.41ms        ? ?/sec    1.64    156.6±0.97ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00     93.2±0.54ms        ? ?/sec    1.65    153.8±0.74ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.01     62.3±0.39ms        ? ?/sec    1.00     61.8±0.38ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     5.27   835.6±11.42ms        ? ?/sec    1.00    158.6±0.82ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     5.89    561.1±3.23ms        ? ?/sec    1.00     95.2±0.51ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     31.6±0.24ms        ? ?/sec    1.00     31.7±0.32ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.01     35.0±0.32ms        ? ?/sec    1.00     34.7±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     44.2±0.28ms        ? ?/sec    1.12     49.3±0.33ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.01     37.1±0.29ms        ? ?/sec    1.00     36.8±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     13.6±0.06ms        ? ?/sec    1.00     13.5±0.06ms        ? ?/sec

@zhuqi-lucas
Contributor

It seems there is a regression for Q36/Q37.

@alamb
Contributor Author

alamb commented May 20, 2025

> It seems there is a regression for Q36/Q37.

Yes, I agree -- I will figure out why

@alamb alamb force-pushed the alamb/cache_filter_result branch from f1f7103 to a0e4b29 Compare May 20, 2025 17:12
@alamb
Contributor Author

alamb commented May 20, 2025

I did some profiling:

samply record target/release/deps/arrow_reader_clickbench-aef15514767c9665 --bench arrow_reader_clickbench/sync/Q36

Basically, the issue is that calling slice() takes a non-trivial amount of the time for Q36.

[Image: Screenshot 2025-05-20 at 1 23 25 PM]

I added some printlns and it seems like we have 181k rows in total that pass but the number of buffers is crazy (I think this is related to concat not compacting the ByteViewArray). Working on this...

ByteViewArray::slice offset=8192 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=16384 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=24576 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=32768 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=40960 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=49152 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=57344 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=65536 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=73728 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=81920 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=90112 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=98304 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=106496 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=114688 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=122880 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=131072 length=8192, total_rows: 181198 buffer_count: 542225
ByteViewArray::slice offset=139264 length=8192, total_rows: 181198 buffer_count: 542225
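
A rough model of why `slice()` could show up in the profile: assuming that every slice of a view-style array clones the array's full list of reference-counted data buffers, the cost of each slice grows with `buffer_count` rather than with the slice length. The sketch below uses only the standard library; `ViewArray` is a hypothetical stand-in, not the arrow `ByteViewArray` type, and the numbers plugged in afterwards are taken from the log above.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a view-style string array: each view points
/// into one of many shared data buffers.
struct ViewArray {
    /// (buffer index, offset, length) per row.
    views: Vec<(usize, usize, usize)>,
    buffers: Vec<Arc<[u8]>>,
}

impl ViewArray {
    /// Take a sub-range of the views (copied here for simplicity). The
    /// *whole* buffer list is cloned: one reference-count increment per
    /// buffer, whether or not the slice refers to it.
    fn slice(&self, offset: usize, length: usize) -> ViewArray {
        ViewArray {
            views: self.views[offset..offset + length].to_vec(),
            buffers: self.buffers.clone(),
        }
    }
}

/// Number of buffer handles cloned when cutting `total_rows` into slices of
/// `batch_size` rows from an array that holds `buffer_count` buffers.
fn buffer_clones(total_rows: usize, batch_size: usize, buffer_count: usize) -> usize {
    let slices = (total_rows + batch_size - 1) / batch_size; // ceiling division
    slices * buffer_count
}
```

Under this model, `buffer_clones(181_198, 8_192, 542_225)` comes to 23 slices and about 12.5 million buffer-handle clones, which is why reducing the buffer count (for example by compacting during concat) looks like the thing to fix.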

@alamb
Contributor Author

alamb commented May 20, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result (c0c3eb4) to 45bda04 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 20, 2025

🤖: Benchmark completed

Details

group                                alamb_cache_filter_result              main
-----                                -------------------------              ----
arrow_reader_clickbench/async/Q1     1.00      2.0±0.01ms        ? ?/sec    1.16      2.4±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     14.2±0.16ms        ? ?/sec    1.03     14.7±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     16.1±0.14ms        ? ?/sec    1.03     16.5±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     27.4±0.33ms        ? ?/sec    1.39     38.0±0.30ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     39.9±0.33ms        ? ?/sec    1.29     51.6±0.41ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     38.3±0.34ms        ? ?/sec    1.30     49.7±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.01      5.2±0.07ms        ? ?/sec    1.00      5.1±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    114.5±0.73ms        ? ?/sec    1.38    158.5±0.67ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    131.5±0.79ms        ? ?/sec    1.68    220.4±1.03ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    234.3±8.23ms        ? ?/sec    2.07    486.1±2.04ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00   440.6±13.11ms        ? ?/sec    1.11   489.2±17.69ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     45.0±0.37ms        ? ?/sec    1.29     58.1±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    119.0±0.58ms        ? ?/sec    1.36    161.5±0.80ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    115.4±0.73ms        ? ?/sec    1.39    160.0±0.95ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.01     65.7±0.48ms        ? ?/sec    1.00     64.8±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    129.4±0.83ms        ? ?/sec    1.29    167.2±0.84ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     99.2±0.68ms        ? ?/sec    1.00     98.9±0.53ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.01     39.9±0.27ms        ? ?/sec    1.00     39.5±0.30ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.01     49.4±0.40ms        ? ?/sec    1.00     49.0±0.38ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     49.1±0.66ms        ? ?/sec    1.09     53.5±0.43ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     41.2±0.47ms        ? ?/sec    1.00     41.0±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.7±0.18ms        ? ?/sec    1.00     14.6±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00   1843.8±9.23µs        ? ?/sec    1.20      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     13.0±0.07ms        ? ?/sec    1.03     13.3±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     14.8±0.11ms        ? ?/sec    1.02     15.2±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     32.4±0.50ms        ? ?/sec    1.25     40.6±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     44.3±0.42ms        ? ?/sec    1.21     53.7±0.46ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     42.9±0.51ms        ? ?/sec    1.22     52.3±0.46ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.02      4.4±0.02ms        ? ?/sec    1.00      4.3±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00   121.8±10.69ms        ? ?/sec    1.44    175.5±1.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    137.2±9.68ms        ? ?/sec    1.70    233.1±1.71ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    214.2±9.00ms        ? ?/sec    2.22    475.1±3.54ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00   383.2±15.35ms        ? ?/sec    1.16   442.7±15.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     41.7±0.48ms        ? ?/sec    1.31     54.5±0.58ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.13   172.6±10.81ms        ? ?/sec    1.00    152.3±1.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.06    158.6±6.71ms        ? ?/sec    1.00    150.2±0.76ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.03     64.3±0.70ms        ? ?/sec    1.00     62.5±0.48ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    119.8±0.89ms        ? ?/sec    1.31    157.5±0.88ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.01     93.6±0.71ms        ? ?/sec    1.00     92.3±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.02     32.3±0.25ms        ? ?/sec    1.00     31.7±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.02     35.1±0.39ms        ? ?/sec    1.00     34.3±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     45.5±0.48ms        ? ?/sec    1.11     50.5±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.01     38.2±0.32ms        ? ?/sec    1.00     37.9±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.01     13.7±0.07ms        ? ?/sec    1.00     13.6±0.06ms        ? ?/sec

@alamb
Contributor Author

alamb commented May 20, 2025

> 🤖: Benchmark completed

Well, that is looking quite a bit better :bowtie:

I am now working on a way to reduce buffering requirements (will require incremental concat'ing)

@zhuqi-lucas
Contributor

Amazing result @alamb, it looks pretty cool!

@github-actions github-actions bot added the arrow Changes to the arrow crate label May 22, 2025
@alamb alamb force-pushed the alamb/cache_filter_result branch 2 times, most recently from 0d358f2 to 5be48ac Compare May 27, 2025 16:05
@alamb
Contributor Author

alamb commented May 27, 2025

Ok, I reworked a bunch of the code in this PR so it is now structured to use an IncrementalRecordBatchBuilder, which builds output in batch_size batches from filtered results. In and of itself this doesn't improve performance, but I am finally set up now to add a way to incrementally build up Arrays from filters.

I will continue working on this tomorrow. Now I need to go do other things and reviews, etc.

@alamb alamb force-pushed the alamb/cache_filter_result branch 2 times, most recently from 2dd6bf2 to 8e3737a Compare May 28, 2025 13:10
@alamb alamb force-pushed the alamb/cache_filter_result branch from 8e3737a to e93da91 Compare May 28, 2025 13:13
@alamb
Contributor Author

alamb commented May 28, 2025

Status update

Current State of this PR

  1. Caches the results of the most recent filter which is applied during parquet decode
  2. Contains an initial implementation of ArrayBuilderExtFilter and ArrayBuilderExtConcat which permit incrementally building arrays without materializing the intermediate results (prototype API from Optimize take/filter/concat from multiple input arrays to a single large output array #6692)
  3. Contains IncrementalRecordBatchBuilder that incrementally builds record batches from filtered results.

The use of the incremental builders saves at least one memory copy during filtering and reduces the buffering required (which might also increase speed). It will also reduce the number of times we have to rewrite StringView, which will help.
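
The incremental approach can be sketched without Arrow at all. In the std-only model below, `IncrementalBuilder` is a hypothetical, single-column stand-in for `IncrementalRecordBatchBuilder`: filtered values are appended directly into the in-progress output, and a batch is emitted each time `batch_size` rows have accumulated, so no intermediate filtered array is materialized and concatenated afterwards. The method names are illustrative and are not the PR's API.

```rust
use std::collections::VecDeque;

/// Hypothetical single-column model of an incremental batch builder.
struct IncrementalBuilder {
    batch_size: usize,
    in_progress: Vec<i64>,
    completed: VecDeque<Vec<i64>>,
}

impl IncrementalBuilder {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0);
        Self {
            batch_size,
            in_progress: Vec::with_capacity(batch_size),
            completed: VecDeque::new(),
        }
    }

    /// Append the rows of `values` selected by `mask` straight into the
    /// in-progress batch, emitting a completed batch whenever it fills up.
    fn append_filtered(&mut self, values: &[i64], mask: &[bool]) {
        assert_eq!(values.len(), mask.len());
        for (&v, &keep) in values.iter().zip(mask) {
            if keep {
                self.in_progress.push(v);
                if self.in_progress.len() == self.batch_size {
                    let full = std::mem::replace(
                        &mut self.in_progress,
                        Vec::with_capacity(self.batch_size),
                    );
                    self.completed.push_back(full);
                }
            }
        }
    }

    /// Next completed batch of exactly `batch_size` rows, if any.
    fn next_batch(&mut self) -> Option<Vec<i64>> {
        self.completed.pop_front()
    }

    /// Flush the final, possibly short, batch at end of input.
    fn finish(&mut self) -> Option<Vec<i64>> {
        if self.in_progress.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.in_progress))
        }
    }
}
```

Compared with filtering each input into its own small array and concatenating those later, each selected value is written once, directly into its final output batch.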

Next Steps

I next plan to:

  1. Run arrow-rs benchmarks to show it helping
  2. Do a POC in DataFusion using the IncrementalRecordBatchBuilder in FilterExec to see if it makes a difference there

If those tests look good, I will begin breaking this PR up into smaller pieces for review

Major items I know are needed:

  1. Memory limiting for cached results in the parquet reader
  2. Updating previous cached results with subsequent filters
  3. Benchmarks showing the effect of using incremental filtering / append compared to filter and concat

@alamb
Contributor Author

alamb commented May 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result (76fcb56) to 0a4ffa5 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result
Results will be posted here when complete

@zhuqi-lucas
Contributor

Great work, thank you @alamb! I will study and review the code in detail tomorrow!

@alamb
Contributor Author

alamb commented May 28, 2025

🤖: Benchmark completed

Details

group                                alamb_cache_filter_result              main
-----                                -------------------------              ----
arrow_reader_clickbench/async/Q1     1.00  1994.2±19.90µs        ? ?/sec    1.18      2.4±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     13.6±0.12ms        ? ?/sec    1.08     14.7±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     15.4±0.11ms        ? ?/sec    1.08     16.6±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     25.5±0.29ms        ? ?/sec    1.53     39.1±0.23ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     38.1±0.40ms        ? ?/sec    1.38     52.5±0.49ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     36.3±0.24ms        ? ?/sec    1.39     50.5±0.34ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.02      5.0±0.05ms        ? ?/sec    1.00      4.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    108.4±0.62ms        ? ?/sec    1.49    161.6±0.62ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    124.3±0.64ms        ? ?/sec    1.68    209.3±1.12ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    202.2±0.97ms        ? ?/sec    2.41    486.6±2.50ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00   433.0±12.51ms        ? ?/sec    1.14   493.3±10.74ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     42.1±0.36ms        ? ?/sec    1.36     57.4±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    107.6±0.50ms        ? ?/sec    1.53    164.5±0.93ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    107.2±0.48ms        ? ?/sec    1.52    162.6±1.03ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     64.0±0.44ms        ? ?/sec    1.01     64.5±0.32ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    118.9±0.68ms        ? ?/sec    1.43    170.2±2.12ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     92.7±0.56ms        ? ?/sec    1.10    102.3±0.53ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     38.8±0.41ms        ? ?/sec    1.00     38.7±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     48.0±0.42ms        ? ?/sec    1.00     48.1±0.44ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     48.3±0.37ms        ? ?/sec    1.08     52.3±0.33ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.02     40.0±0.26ms        ? ?/sec    1.00     39.4±0.23ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.01     14.3±0.07ms        ? ?/sec    1.00     14.1±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00   1804.8±8.57µs        ? ?/sec    1.22      2.2±0.00ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     12.4±0.07ms        ? ?/sec    1.09     13.5±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     14.2±0.06ms        ? ?/sec    1.08     15.4±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     24.4±0.73ms        ? ?/sec    1.69     41.3±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     35.4±0.37ms        ? ?/sec    1.53     54.3±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     34.3±0.33ms        ? ?/sec    1.54     52.7±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.02      4.3±0.01ms        ? ?/sec    1.00      4.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    113.0±1.50ms        ? ?/sec    1.58    179.0±0.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    124.7±3.12ms        ? ?/sec    1.90    237.4±2.48ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    172.4±2.06ms        ? ?/sec    2.83    487.4±2.89ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00   356.1±11.22ms        ? ?/sec    1.23   439.1±14.62ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     39.5±0.56ms        ? ?/sec    1.39     54.9±0.54ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    100.1±4.19ms        ? ?/sec    1.56    155.9±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00     96.3±0.48ms        ? ?/sec    1.59    152.9±1.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     61.8±0.45ms        ? ?/sec    1.02     62.9±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    108.0±0.96ms        ? ?/sec    1.48    159.3±1.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     87.6±0.70ms        ? ?/sec    1.08     95.0±0.41ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     31.3±0.27ms        ? ?/sec    1.01     31.5±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     33.6±0.23ms        ? ?/sec    1.03     34.6±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     44.7±0.45ms        ? ?/sec    1.09     48.8±0.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.01     36.8±0.22ms        ? ?/sec    1.00     36.5±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.02     13.5±0.08ms        ? ?/sec    1.00     13.3±0.04ms        ? ?/sec
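A note on reading the table above, assuming the usual layout of this kind of comparison output: the first number in each column is the timing relative to the fastest entry in that row (so 1.00 marks the faster side), followed by mean ± deviation. The helper name `relative_to_fastest` below is made up for this sketch, which reproduces the ratio for sync/Q22 from the two reported means.

```rust
/// Ratio of each timing to the fastest timing in the same row.
/// The fastest entry gets 1.0; larger values mean proportionally slower.
fn relative_to_fastest(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}

fn main() {
    // sync/Q22 from the table above: 172.4 ms on the branch, 487.4 ms on main.
    let ratios = relative_to_fastest(&[172.4, 487.4]);
    println!("{:.2} {:.2}", ratios[0], ratios[1]); // prints 1.00 2.83
}
```

On that row, main takes about 2.83 times as long as the branch, which matches the value printed in the table.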

@alamb
Contributor Author

alamb commented May 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result (f2b2c1b) to 0a4ffa5 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 28, 2025

🤖: Benchmark completed

Details

group                                                                         alamb_cache_filter_result              main
-----                                                                         -------------------------              ----
filter context decimal128 (kept 1/2)                                          1.73     71.1±1.27µs        ? ?/sec    1.00     41.2±3.77µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     50.5±0.62µs        ? ?/sec    1.03     51.9±1.35µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.42    366.6±0.46ns        ? ?/sec    1.00    257.9±0.28ns        ? ?/sec
filter context f32 (kept 1/2)                                                 2.09    145.2±0.24µs        ? ?/sec    1.00     69.6±0.08µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.38     18.8±0.54µs        ? ?/sec    1.00     13.6±0.54µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.72    781.9±8.58ns        ? ?/sec    1.00    453.6±0.63ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.67     70.7±0.32µs        ? ?/sec    1.00     42.4±0.09µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.67     70.7±0.13µs        ? ?/sec    1.00     42.4±0.07µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.66     70.6±0.08µs        ? ?/sec    1.00     42.4±0.08µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.66     70.6±0.07µs        ? ?/sec    1.00     42.5±0.07µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.67     70.7±0.34µs        ? ?/sec    1.00     42.4±0.05µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.67     70.7±0.12µs        ? ?/sec    1.00     42.4±0.04µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.66     70.7±0.10µs        ? ?/sec    1.00     42.5±0.05µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.67     70.7±0.09µs        ? ?/sec    1.00     42.4±0.09µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.67     70.7±0.08µs        ? ?/sec    1.00     42.4±0.06µs        ? ?/sec
filter context i32 (kept 1/2)                                                 3.13     70.8±0.10µs        ? ?/sec    1.00     22.6±0.04µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      6.5±0.33µs        ? ?/sec    1.00      6.4±0.42µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.48    369.9±0.35ns        ? ?/sec    1.00    250.1±1.49ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         2.21    145.0±0.17µs        ? ?/sec    1.00     65.7±0.49µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.38     18.4±0.86µs        ? ?/sec    1.00     13.3±0.40µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.47    662.7±1.02ns        ? ?/sec    1.00    449.5±1.11ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   3.57    299.8±4.47µs        ? ?/sec    1.00     84.0±2.28µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            6.02    347.7±3.65µs        ? ?/sec    1.00     57.8±0.35µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                58.28    37.1±0.94µs        ? ?/sec    1.00    636.7±1.52ns        ? ?/sec
filter context short string view (kept 1/2)                                   2.37    202.6±6.85µs        ? ?/sec    1.00     85.4±6.08µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            2.48    144.3±2.58µs        ? ?/sec    1.00     58.2±1.10µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                73.63    34.2±0.36µs        ? ?/sec    1.00    464.1±0.57ns        ? ?/sec
filter context string (kept 1/2)                                              1.09   591.7±13.39µs        ? ?/sec    1.00   541.6±11.06µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   3.06     71.9±0.27µs        ? ?/sec    1.00     23.5±0.04µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.03      7.6±0.36µs        ? ?/sec    1.00      7.3±0.32µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.17    955.6±4.56ns        ? ?/sec    1.00    813.9±2.12ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           2.20    146.0±0.42µs        ? ?/sec    1.00     66.3±0.13µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.34     19.1±0.47µs        ? ?/sec    1.00     14.2±0.46µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.22   1262.2±6.05ns        ? ?/sec    1.00   1036.4±6.22ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.05    659.1±6.55µs        ? ?/sec    1.00   629.7±15.86µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.11   1163.7±6.77ns        ? ?/sec    1.00   1046.3±2.05ns        ? ?/sec
filter context u8 (kept 1/2)                                                  3.63     68.4±0.11µs        ? ?/sec    1.00     18.9±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.08  1982.0±12.73ns        ? ?/sec    1.00  1829.0±10.22ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.54    364.8±0.32ns        ? ?/sec    1.00    237.5±0.35ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          2.32    143.3±0.30µs        ? ?/sec    1.00     61.7±0.09µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.56     13.5±0.02µs        ? ?/sec    1.00      8.6±0.02µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.59    855.3±2.14ns        ? ?/sec    1.00    537.2±3.81ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.10    106.1±0.45µs        ? ?/sec    1.00     96.4±0.44µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.01     53.8±1.74µs        ? ?/sec    1.00     53.4±1.52µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.02      3.1±0.01µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    226.6±0.40µs        ? ?/sec    1.02    232.2±0.78µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    134.2±0.51µs        ? ?/sec    1.04    140.0±0.36µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.00     69.6±1.78µs        ? ?/sec    1.01     70.6±1.68µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.01      3.2±0.01µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    131.8±0.14µs        ? ?/sec    1.02    135.0±0.43µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.04     11.1±0.54µs        ? ?/sec    1.00     10.7±0.42µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      3.1±0.01µs        ? ?/sec    1.01      3.1±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    181.3±2.07µs        ? ?/sec    1.04    188.4±6.88µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.12    221.0±8.77µs        ? ?/sec    1.00    198.2±6.30µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.01      3.2±0.01µs        ? ?/sec    1.00      3.2±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.12    102.8±0.41µs        ? ?/sec    1.00     92.0±0.10µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      8.7±0.37µs        ? ?/sec    1.00      8.7±0.46µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.01      3.1±0.01µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.01     86.6±0.20µs        ? ?/sec    1.00     85.8±0.22µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.05      2.8±0.01µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      2.8±0.01µs        ? ?/sec    1.00      2.8±0.01µs        ? ?/sec
filter run array (kept 1/2)                                                   1.08    389.3±1.20µs        ? ?/sec    1.00    359.4±2.47µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.16    360.1±1.95µs        ? ?/sec    1.00    310.7±1.57µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    246.6±0.86µs        ? ?/sec    1.00    247.0±0.86µs        ? ?/sec
filter single record batch                                                    1.28    118.1±0.47µs        ? ?/sec    1.00     92.6±0.08µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.14    104.4±1.84µs        ? ?/sec    1.00     91.9±0.08µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.08      4.0±0.02µs        ? ?/sec    1.00      3.8±0.01µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.03      3.1±0.01µs        ? ?/sec    1.00      3.0±0.03µs        ? ?/sec

@alamb
Contributor Author

alamb commented May 28, 2025

🤖: Benchmark completed

Looks like the filter code needs some optimization. I will rerun to see if I can reproduce it.

@alamb
Contributor Author

alamb commented May 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result (f2b2c1b) to 0a4ffa5 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 28, 2025

🤖: Benchmark completed

Details

group                                                                         alamb_cache_filter_result              main
-----                                                                         -------------------------              ----
filter context decimal128 (kept 1/2)                                          1.78     86.0±1.28µs        ? ?/sec    1.00     48.4±9.28µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.04     50.8±1.24µs        ? ?/sec    1.00     49.1±1.02µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.56    401.4±0.64ns        ? ?/sec    1.00    257.0±0.33ns        ? ?/sec
filter context f32 (kept 1/2)                                                 2.18    151.3±0.24µs        ? ?/sec    1.00     69.5±0.16µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.38     18.6±0.41µs        ? ?/sec    1.00     13.5±0.53µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.61    901.7±8.87ns        ? ?/sec    1.00    558.8±1.31ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.67     70.7±0.43µs        ? ?/sec    1.00     42.4±0.03µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.67     70.7±0.09µs        ? ?/sec    1.00     42.4±0.06µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.66     70.6±0.11µs        ? ?/sec    1.00     42.4±0.19µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.67     70.6±0.08µs        ? ?/sec    1.00     42.4±0.07µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.67     70.7±0.12µs        ? ?/sec    1.00     42.4±0.08µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.67     70.7±0.09µs        ? ?/sec    1.00     42.4±0.11µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.66     70.7±0.10µs        ? ?/sec    1.00     42.5±0.04µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.67     70.6±0.08µs        ? ?/sec    1.00     42.4±0.06µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.66     70.6±0.09µs        ? ?/sec    1.00     42.4±0.11µs        ? ?/sec
filter context i32 (kept 1/2)                                                 3.43     77.6±0.24µs        ? ?/sec    1.00     22.6±0.12µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.04      6.6±0.34µs        ? ?/sec    1.00      6.4±0.41µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.59    399.9±0.57ns        ? ?/sec    1.00    251.3±0.37ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         2.31    151.4±0.23µs        ? ?/sec    1.00     65.6±0.13µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.36     18.5±0.51µs        ? ?/sec    1.00     13.6±0.49µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.57    691.1±0.96ns        ? ?/sec    1.00    440.3±0.73ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   3.43    299.4±3.49µs        ? ?/sec    1.00     87.2±3.85µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            6.08    352.2±5.58µs        ? ?/sec    1.00     57.9±1.43µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                55.30    35.7±1.34µs        ? ?/sec    1.00    646.2±0.92ns        ? ?/sec
filter context short string view (kept 1/2)                                   2.48    200.0±5.99µs        ? ?/sec    1.00     80.7±0.56µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            2.58    149.3±8.41µs        ? ?/sec    1.00     57.9±2.37µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                73.82    34.4±0.99µs        ? ?/sec    1.00    466.7±0.68ns        ? ?/sec
filter context string (kept 1/2)                                              1.09   586.0±13.67µs        ? ?/sec    1.00   537.4±11.90µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   3.09     72.1±0.25µs        ? ?/sec    1.00     23.4±0.06µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.01      7.6±0.57µs        ? ?/sec    1.00      7.6±0.40µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.19    976.8±7.43ns        ? ?/sec    1.00    819.2±1.14ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           2.20    145.8±0.28µs        ? ?/sec    1.00     66.2±0.11µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.34     19.2±0.47µs        ? ?/sec    1.00     14.3±0.29µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.25   1300.6±8.98ns        ? ?/sec    1.00   1044.0±4.94ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00   648.8±10.65µs        ? ?/sec    1.04   672.0±17.07µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.11  1090.9±10.47ns        ? ?/sec    1.00    982.7±1.69ns        ? ?/sec
filter context u8 (kept 1/2)                                                  4.01     75.7±0.12µs        ? ?/sec    1.00     18.9±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.11      2.0±0.01µs        ? ?/sec    1.00  1837.7±12.42ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.61    390.2±0.45ns        ? ?/sec    1.00    241.6±0.55ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          2.41    148.5±0.24µs        ? ?/sec    1.00     61.7±0.13µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.53     13.6±0.09µs        ? ?/sec    1.00      8.8±0.03µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.84    977.7±1.50ns        ? ?/sec    1.00    532.0±2.82ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.08    104.2±0.31µs        ? ?/sec    1.00     96.3±0.20µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.03     54.4±1.14µs        ? ?/sec    1.00     52.8±1.93µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.02      3.1±0.00µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    226.4±0.34µs        ? ?/sec    1.02    232.1±0.46µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    134.7±0.44µs        ? ?/sec    1.04    140.3±0.42µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.01     72.8±2.00µs        ? ?/sec    1.00     71.8±2.01µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00      3.2±0.01µs        ? ?/sec    1.00      3.2±0.01µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    131.6±0.22µs        ? ?/sec    1.03    135.2±0.18µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00     10.8±0.55µs        ? ?/sec    1.04     11.2±0.46µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      3.1±0.01µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.01    183.5±6.15µs        ? ?/sec    1.00    182.2±7.87µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.00    205.8±6.49µs        ? ?/sec    1.10    225.9±5.31µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.03      3.2±0.01µs        ? ?/sec    1.00      3.1±0.02µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.12    102.7±0.19µs        ? ?/sec    1.00     92.0±0.14µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.07      9.1±0.35µs        ? ?/sec    1.00      8.6±0.47µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00      3.1±0.01µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.01     86.7±0.23µs        ? ?/sec    1.00     85.7±0.12µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.04      2.8±0.01µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      2.8±0.00µs        ? ?/sec    1.00      2.8±0.01µs        ? ?/sec
filter run array (kept 1/2)                                                   1.09    389.6±0.82µs        ? ?/sec    1.00    358.0±0.78µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.16    359.0±1.13µs        ? ?/sec    1.00    310.2±1.14µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    247.6±7.21µs        ? ?/sec    1.00    246.5±0.78µs        ? ?/sec
filter single record batch                                                    1.27    117.9±0.20µs        ? ?/sec    1.00     92.7±0.28µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.15    105.4±1.01µs        ? ?/sec    1.00     91.9±0.16µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.09      4.1±0.02µs        ? ?/sec    1.00      3.7±0.02µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.04      3.1±0.01µs        ? ?/sec    1.00      3.0±0.00µs        ? ?/sec

@alamb
Contributor Author

alamb commented May 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result (3374c03) to 0a4ffa5 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 28, 2025

🤖: Benchmark completed

Details

group                                                                         alamb_cache_filter_result              main
-----                                                                         -------------------------              ----
filter context decimal128 (kept 1/2)                                          1.71     72.0±2.25µs        ? ?/sec    1.00     42.1±3.49µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     51.0±1.64µs        ? ?/sec    1.00     50.9±1.11µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.33    352.4±0.28ns        ? ?/sec    1.00    265.9±0.27ns        ? ?/sec
filter context f32 (kept 1/2)                                                 2.07    144.5±0.19µs        ? ?/sec    1.00     69.8±0.19µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.29     17.3±0.46µs        ? ?/sec    1.00     13.4±0.38µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.39    772.2±1.18ns        ? ?/sec    1.00    556.1±0.59ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.67     70.6±0.08µs        ? ?/sec    1.00     42.4±0.12µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.67     70.6±0.09µs        ? ?/sec    1.00     42.4±0.06µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.66     70.7±0.14µs        ? ?/sec    1.00     42.4±0.07µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.66     70.7±0.07µs        ? ?/sec    1.00     42.5±0.05µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.67     70.7±0.11µs        ? ?/sec    1.00     42.4±0.07µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.67     70.7±0.12µs        ? ?/sec    1.00     42.4±0.08µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.66     70.7±0.08µs        ? ?/sec    1.00     42.5±0.07µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.67     70.6±0.11µs        ? ?/sec    1.00     42.4±0.06µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.67     70.7±0.14µs        ? ?/sec    1.00     42.4±0.04µs        ? ?/sec
filter context i32 (kept 1/2)                                                 3.12     71.8±0.35µs        ? ?/sec    1.00     23.0±0.53µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.06      6.7±0.47µs        ? ?/sec    1.00      6.3±0.27µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.36    352.3±0.43ns        ? ?/sec    1.00    258.5±0.72ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         2.21    145.0±0.42µs        ? ?/sec    1.00     65.5±0.10µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.26     17.4±0.43µs        ? ?/sec    1.00     13.8±0.57µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.50    673.4±0.55ns        ? ?/sec    1.00    449.1±0.51ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   7.35    679.9±3.50µs        ? ?/sec    1.00     92.5±7.66µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            5.62    334.1±4.83µs        ? ?/sec    1.00     59.4±2.48µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                57.87    37.5±1.24µs        ? ?/sec    1.00    647.8±0.66ns        ? ?/sec
filter context short string view (kept 1/2)                                   6.10    588.8±6.15µs        ? ?/sec    1.00     96.5±4.12µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            2.65    151.7±6.64µs        ? ?/sec    1.00     57.2±1.83µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                78.77    36.8±1.51µs        ? ?/sec    1.00    466.8±0.38ns        ? ?/sec
filter context string (kept 1/2)                                              1.04   574.1±10.97µs        ? ?/sec    1.00   550.3±14.13µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   3.36     78.5±0.26µs        ? ?/sec    1.00     23.4±0.06µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00      7.3±0.38µs        ? ?/sec    1.01      7.4±0.31µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.15    947.6±1.33ns        ? ?/sec    1.00    821.9±1.22ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           2.29    152.0±0.42µs        ? ?/sec    1.00     66.4±0.10µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.25     18.0±0.44µs        ? ?/sec    1.00     14.4±0.46µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.24   1275.0±2.93ns        ? ?/sec    1.00   1024.8±3.51ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.05   647.8±13.77µs        ? ?/sec    1.00   619.0±11.27µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00    926.4±7.03ns        ? ?/sec    1.26   1163.1±5.12ns        ? ?/sec
filter context u8 (kept 1/2)                                                  3.72     70.3±0.25µs        ? ?/sec    1.00     18.9±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.07  1977.6±14.43ns        ? ?/sec    1.00  1849.5±10.23ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.41    346.6±0.51ns        ? ?/sec    1.00    245.6±0.45ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          2.32    143.3±0.31µs        ? ?/sec    1.00     61.7±0.10µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.41     12.4±0.03µs        ? ?/sec    1.00      8.8±0.02µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.64    896.2±1.60ns        ? ?/sec    1.00    546.0±7.43ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.06    102.7±0.36µs        ? ?/sec    1.00     97.0±0.85µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.00     52.5±1.59µs        ? ?/sec    1.03     53.8±1.71µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00      2.4±0.00µs        ? ?/sec    1.26      3.0±0.02µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    222.8±0.47µs        ? ?/sec    1.04    232.1±0.52µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    133.4±0.44µs        ? ?/sec    1.05    139.9±0.37µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.03     71.5±3.47µs        ? ?/sec    1.00     69.5±1.98µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00      2.5±0.01µs        ? ?/sec    1.25      3.2±0.00µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    129.7±0.19µs        ? ?/sec    1.04    134.8±0.17µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.02     11.5±0.51µs        ? ?/sec    1.00     11.3±0.71µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      2.4±0.00µs        ? ?/sec    1.26      3.1±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.03    193.8±8.66µs        ? ?/sec    1.00    187.3±9.88µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.00    205.7±6.17µs        ? ?/sec    1.02    209.1±5.13µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00      2.5±0.01µs        ? ?/sec    1.26      3.2±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.10    101.4±0.33µs        ? ?/sec    1.00     92.1±0.16µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.08      9.0±0.43µs        ? ?/sec    1.00      8.3±0.30µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00      2.4±0.01µs        ? ?/sec    1.27      3.1±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.00     81.7±0.18µs        ? ?/sec    1.05     85.7±0.19µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.17      3.1±0.01µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      2.1±0.01µs        ? ?/sec    1.31      2.8±0.00µs        ? ?/sec
filter run array (kept 1/2)                                                   1.25    449.6±2.05µs        ? ?/sec    1.00    358.6±0.75µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.32    410.5±1.29µs        ? ?/sec    1.00    310.8±1.29µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.27    314.5±0.77µs        ? ?/sec    1.00    247.7±0.80µs        ? ?/sec
filter single record batch                                                    1.26    116.3±0.16µs        ? ?/sec    1.00     92.6±0.08µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.10    100.9±0.25µs        ? ?/sec    1.00     92.0±0.13µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.11      4.1±0.01µs        ? ?/sec    1.00      3.7±0.02µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      2.5±0.01µs        ? ?/sec    1.21      3.0±0.01µs        ? ?/sec

Labels: arrow (Changes to the arrow crate), parquet (Changes to the parquet crate)