Make flatVectorsFormat injectable in Lucene99HnswVectorsFormat to allow custom format and scorers #15090
Signed-off-by: ManasviGoyal <[email protected]>
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
@ChrisHegarty @benwtrent Please have a look. Thanks!
All Lucene formats are name-based, meaning no arguments can actually be passed to the reader. How does this work in practice? I would expect a new format class to be required for any different flat format used with HNSW, because of the named format loader.
Yes, it is correct that each distinct flat format still needs its own named This PR just lets
This isn't how it works: you should create an outer format for each variation. We can't support backwards compatibility for custom formats.
More context: Currently we do have outer HNSW format wrappers for each variation of flat vectors format. But in order to do so we are creating multiple duplicates of Can you please elaborate on your point regarding backward compatibility concerns?
I understand the desire to get new changes out of the box. However, all formats are name-based. If there was a substantial change to the HNSW format that required a new name, you would need a new inherited class that provides a NEW name for your format that utilizes a new flat format. The SPI loader cannot provide any parameters. When constructing the reader, it's done with the default ctor (e.g. This API change, with how things are now, just will not work. Further complexity in the named format loader to handle recursively named things just seems way too complicated to justify the 20-30 lines of code saved.
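To make the point about name-based loading concrete, here is a minimal, self-contained sketch. Every class and name in it (`NamedFormat`, `DefaultHnswFormat`, `CustomFlatHnswFormat`, `FormatRegistry`, and the string format names) is a hypothetical stand-in invented for illustration, not a Lucene class, and the registry is only a simplified model of a loader that looks formats up by name and builds them with a no-arg constructor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
abstract class NamedFormat {
    private final String name;

    protected NamedFormat(String name) {
        this.name = name;
    }

    // The name is the only thing assumed to be recorded at write time.
    String getName() {
        return name;
    }

    abstract String flatFormatName();
}

// One format name, with an extra constructor that injects the flat format.
class DefaultHnswFormat extends NamedFormat {
    private final String flat;

    DefaultHnswFormat() {
        this("DefaultFlat");
    }

    DefaultHnswFormat(String flat) {
        super("DefaultHnsw");
        this.flat = flat;
    }

    @Override
    String flatFormatName() {
        return flat;
    }
}

// An outer format with its own name, fully described by its no-arg constructor.
class CustomFlatHnswFormat extends NamedFormat {
    CustomFlatHnswFormat() {
        super("CustomFlatHnsw");
    }

    @Override
    String flatFormatName() {
        return "CustomFlat";
    }
}

// Simplified model of a name-based loader: it can only call no-arg constructors.
final class FormatRegistry {
    private static final Map<String, Supplier<NamedFormat>> BY_NAME = new HashMap<>();

    static {
        BY_NAME.put("DefaultHnsw", DefaultHnswFormat::new);
        BY_NAME.put("CustomFlatHnsw", CustomFlatHnswFormat::new);
    }

    private FormatRegistry() {}

    static NamedFormat forName(String name) {
        Supplier<NamedFormat> supplier = BY_NAME.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown format: " + name);
        }
        return supplier.get();
    }
}
```

In this model, a writer built as `new DefaultHnswFormat("CustomFlat")` still records only the name `DefaultHnsw`, so `FormatRegistry.forName("DefaultHnsw")` hands back an instance using `DefaultFlat`. The injected argument is lost at read time, whereas the separately named `CustomFlatHnswFormat` round-trips correctly.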
I can understand the reasoning behind this request - I encountered a somewhat similar situation in the past and had considered making a similar change, but didn't (for the same reasons as given by @benwtrent and @rmuir). That said, I'm not sure this PR is addressing the core issue. If there's a meaningful, reusable piece here, it might make sense to refactor it out of the format so that custom formats can take advantage of it - but it's unclear to me what that reusable part would be.
This PR makes Lucene99HnswVectorsFormat accept an injected FlatVectorsFormat to allow a custom flat vectors format to be used with the existing codec. For testing and BWC, the original package-private 5-arg constructor is retained. No on-disk format or runtime behavior changes occur unless a custom format/scorer is provided.