Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve hash mixing in FST's double-barrel LRU hash #12704

Open
mikemccand opened this issue Oct 20, 2023 · 4 comments
Open

Improve hash mixing in FST's double-barrel LRU hash #12704

mikemccand opened this issue Oct 20, 2023 · 4 comments

Comments

@mikemccand
Copy link
Member

Description

Spinoff from this cool comment, thanks to hashing guru @bruno-roustant:

Instead, we should multiply with the gold constant BitMixer#PHI_C64 (make it public).
This really makes a difference in the evenness of the value distribution. This is one of the secrets of the HPPC hashing. By applying this, we get multiple advantages:

  * lookup should be improved (less hash collision)
  * we can try to rehash at 3/4 occupancy because the performance should not be impacted until this point.
  * in case of hash collision, we can lookup linearly with a pos = pos + 1 instead of quadratic probe (lines 95 and 327); this may avoid some mem cache miss.
  * 
(same for the other hash method)

This is a simple change, we just need to test on some real FST building cases to confirm good mixing "in practice". The new IndexToFST tool in luceneutil is helpful for this.

@bruno-roustant
Copy link
Contributor

@dweiss will probably say more than me about the awesome BitMixer#PHI_C64 constant!

@dweiss
Copy link
Contributor

dweiss commented Oct 22, 2023

I borrowed that constant in BitMixer from Sebastiano Vigna, I believe. Here is a nice overview of its origin/ rationale:

https://softwareengineering.stackexchange.com/questions/402542/where-do-magic-hashing-constants-like-0x9e3779b9-and-0x9e3779b1-come-from

I can only confirm that a good hash redistribution function, along with linear probing, give very good results in most hash/index redistribution problems I've seen.

@shubhamvishu
Copy link
Contributor

I opened a PR to make use of this constant : #12716

Also, I was thinking if this constant could be utilised in other hash function implementations as well in the codebase?

shubhamvishu added a commit to shubhamvishu/lucene that referenced this issue Oct 24, 2023
@dweiss
Copy link
Contributor

dweiss commented Oct 25, 2023

If you'd like to do so, I'd suggest moving such a "scattering remix" utility to a separate class and reusing it elsewhere, much like here:
https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java#L96-L99

Makes it easier to change the remixing strategy everywhere at once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants