Create a simple JMH benchmark to measure FST compilation / traversal times #12884

mikemccand · 2023-12-07T10:40:38Z

Description

Over in #12543 we are struggling to measure the performance cost of different ways of creating an on-heap reader/writer. We have been using the "rough" numbers coming out of Test2BFST runs but this is non-ideal -- it is test code, running with assertions, perhaps doing sub-optimal FST usage (not matching what, say, block tree would do to the terms index).

Let's create a simple micro-benchmark to more readily benchmark FST changes?


dungba88 · 2023-12-10T14:59:06Z

We can use #12879 as a benchmark candidate to compare against the current baseline.

dungba88 · 2023-12-13T11:43:21Z

I can look into this. Is this place https://github.com/apache/lucene/tree/main/lucene/benchmark/src/java/org/apache/lucene/benchmark the correct path to add the benchmark, or is https://github.com/mikemccand/luceneutil a better place?

I think we can try to create an FST with terms from some corpus, like Wikipedia, then repeatedly call Util.get() and measure the ops/sec, etc. (JMH should give us that). How does that sound?
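To make the shape of that measurement concrete, here is a minimal stdlib-only sketch, not the actual benchmark. It swaps in stand-ins throughout: a `TreeMap` plays the role of the FST, `index.get(term)` plays the role of `Util.get()`, a hand-rolled `System.nanoTime()` loop replaces JMH, and the terms are synthetic rather than extracted from Wikipedia. The class and method names (`LookupThroughputSketch`, `buildIndex`, `lookupAll`, `opsPerSecond`) are hypothetical. A real version would presumably use JMH's warm-up and forking rather than this loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: TreeMap is a stand-in for the FST, not Lucene code.
final class LookupThroughputSketch {

    /** Builds the stand-in index: each term (in sorted order) maps to its ordinal. */
    static TreeMap<String, Long> buildIndex(List<String> sortedTerms) {
        TreeMap<String, Long> index = new TreeMap<>();
        long ord = 0;
        for (String term : sortedTerms) {
            index.put(term, ord++);
        }
        return index;
    }

    /**
     * Looks up every term `rounds` times. Returns the sum of all outputs so the
     * JIT cannot treat the lookups as dead code and remove them.
     */
    static long lookupAll(TreeMap<String, Long> index, List<String> terms, int rounds) {
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            for (String term : terms) {
                checksum += index.get(term);
            }
        }
        return checksum;
    }

    /** Converts an operation count and elapsed nanoseconds into ops/sec. */
    static double opsPerSecond(long ops, long elapsedNanos) {
        return ops * 1e9 / Math.max(1, elapsedNanos);
    }

    public static void main(String[] args) {
        // Synthetic terms; the real benchmark would load terms from a corpus.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            terms.add("term" + i);
        }
        Collections.sort(terms);

        TreeMap<String, Long> index = buildIndex(terms);

        lookupAll(index, terms, 5); // crude warm-up; JMH handles this properly

        int rounds = 20;
        long start = System.nanoTime();
        long checksum = lookupAll(index, terms, rounds);
        long elapsed = System.nanoTime() - start;

        long ops = (long) rounds * terms.size();
        System.out.printf("checksum=%d, ops/sec=%.0f%n", checksum, opsPerSecond(ops, elapsed));
    }
}
```

Returning and printing the checksum keeps the lookups observable, which is the same job JMH's `Blackhole` or a benchmark method's return value would do.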

mikemccand · 2023-12-14T11:24:26Z

> I can look into this. Is this place https://github.com/apache/lucene/tree/main/lucene/benchmark/src/java/org/apache/lucene/benchmark the correct path to add the benchmark, or is https://github.com/mikemccand/luceneutil a better place?

I think maybe luceneutil? We would want to feed it a fairly large set of terms, ideally just extracted from a Lucene index using something like what IndexToFST (in luceneutil) does?

> I think we can try to create an FST with terms from some corpus, like Wikipedia, then repeatedly call Util.get() and measure the ops/sec, etc. (JMH should give us that). How does that sound?

+1, sounds great! I can help fold this into the nightly charts...

mikemccand · 2023-12-14T11:24:41Z

Thanks @dungba88.

mikemccand added the type:enhancement label Dec 7, 2023

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Dec 7, 2023

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Dec 7, 2023

dungba88 mentioned this issue Dec 10, 2023

Optimize FST on-heap BytesReader #12879

Merged
