Create a simple JMH benchmark to measure FST compilation / traversal times #12884

Open
mikemccand opened this issue Dec 7, 2023 · 4 comments

Comments

@mikemccand
Member

Description

Over in #12543 we are struggling to measure the performance cost of different ways of creating an on-heap reader/writer. We have been using the "rough" numbers coming out of Test2BFST runs but this is non-ideal -- it is test code, running with assertions, perhaps doing sub-optimal FST usage (not matching what, say, block tree would do to the terms index).

Let's create a simple micro-benchmark to more readily benchmark FST changes?

@dungba88
Contributor

We can use #12879 as a benchmark candidate to compare against the current baseline.

@dungba88
Contributor

dungba88 commented Dec 13, 2023

I can look into this. Is this place https://github.com/apache/lucene/tree/main/lucene/benchmark/src/java/org/apache/lucene/benchmark the correct path to add the benchmark, or is https://github.com/mikemccand/luceneutil a better place?

I think we can try to create an FST with terms from some corpus, like Wikipedia, then repeatedly call Util.get() and measure the ops/sec, etc. (JMH should give us that). How does that sound?
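The shape of that measurement (build the dictionary once, then look terms up repeatedly and report throughput) can be sketched in plain Java. This is only an illustration under loud assumptions: it uses neither JMH nor Lucene, a `TreeMap` stands in for the FST, `dict.get(term)` stands in for `Util.get()`, and the class and method names are hypothetical. A real benchmark would leave warmup, forking, and dead-code protection to JMH rather than hand-rolling them as below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical stand-in harness: not JMH, not Lucene. */
public class LookupThroughputSketch {

  /** Builds the term -> ordinal dictionary once, outside the timed region. */
  static TreeMap<String, Long> buildDictionary(List<String> terms) {
    TreeMap<String, Long> dict = new TreeMap<>();
    long ord = 0;
    for (String term : terms) {
      dict.put(term, ord++);
    }
    return dict;
  }

  /** Looks up every term `rounds` times; returns a checksum of the results. */
  static long lookupAll(TreeMap<String, Long> dict, List<String> terms, int rounds) {
    long sink = 0;
    for (int r = 0; r < rounds; r++) {
      for (String term : terms) {
        Long value = dict.get(term); // stand-in for the FST lookup
        sink += (value == null) ? -1 : value;
      }
    }
    return sink;
  }

  /** Runs untimed warmup rounds, then returns lookups per second over the timed rounds. */
  static double measureOpsPerSecond(
      TreeMap<String, Long> dict, List<String> terms, int warmupRounds, int timedRounds) {
    long sink = lookupAll(dict, terms, warmupRounds);
    long start = System.nanoTime();
    sink += lookupAll(dict, terms, timedRounds);
    long elapsedNanos = Math.max(1, System.nanoTime() - start);
    if (sink == Long.MIN_VALUE) {
      System.out.println(sink); // consume the checksum so the lookups stay live
    }
    return (double) terms.size() * timedRounds * 1e9 / elapsedNanos;
  }

  public static void main(String[] args) {
    // Synthetic terms; a real run would load terms extracted from a corpus or index.
    List<String> terms = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      terms.add("term" + i);
    }
    TreeMap<String, Long> dict = buildDictionary(terms);
    double opsPerSec = measureOpsPerSecond(dict, terms, 5, 20);
    System.out.printf("%.0f lookups/sec%n", opsPerSec);
  }
}
```

Keeping construction and lookup in separate methods mirrors the two things the issue title asks about: compilation time and traversal time can then be timed independently.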

@mikemccand
Member Author

> I can look into this. Is this place https://github.com/apache/lucene/tree/main/lucene/benchmark/src/java/org/apache/lucene/benchmark the correct path to add the benchmark, or is https://github.com/mikemccand/luceneutil a better place?

I think maybe luceneutil? We would want to feed it a fairly large set of terms, ideally just extracted from a Lucene index using something like what IndexToFST (in luceneutil) does?

> I think we can try to create an FST with terms from some corpus, like Wikipedia, then repeatedly call Util.get() and measure the ops/sec, etc. (JMH should give us that). How does that sound?

+1, sounds great! I can help fold this into the nightly charts...

@mikemccand
Member Author

Thanks @dungba88.


2 participants