Description
@fulmicoton (Tantivy creator) reached out to me after our fun discussion about how to tap into branchless CPU instructions (CMOVcc on x86-64) from way up in javaland, far above the bare metal.
Because Lucene (and Tantivy) encode postings in blocks of 128 docids (+freqs) at once, skipping always has two phases: first the skiplist finds the block that may or may not contain the target doc, and then there is inevitably a "within block" scan (of up to 128 docs) to find it.
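To make the "within block" scan concrete, here is a minimal sketch of a straightforward linear scan over one decoded block. The names `BlockScan`, `linearScan`, and `BLOCK_SIZE` are hypothetical and this is not Lucene's actual code; it only assumes the block has already been decoded into a sorted `int[]` of docids.

```java
public final class BlockScan {
    // Assumed block size, matching the 128-docid blocks described above.
    static final int BLOCK_SIZE = 128;

    /**
     * Returns the index of the first docid >= target in docs[0..len),
     * or len if every docid in the block is smaller than target.
     * The loop exit depends on the data, so this compiles to a conditional branch.
     */
    static int linearScan(int[] docs, int len, int target) {
        int i = 0;
        while (i < len && docs[i] < target) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] block = new int[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block[i] = 10 + 3 * i; // sorted docids: 10, 13, 16, 19, 22, ...
        }
        System.out.println(linearScan(block, BLOCK_SIZE, 20));
    }
}
```

The early exit is what makes this version branchy: the CPU has to predict when `docs[i] < target` stops being true, and that point differs on every skip.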
@fulmicoton pointed out that the linear scan phase of Lucene's skipping could maybe be rewritten "just so" in a way that Hotspot would recognize and compile to CMOVcc. We could turn on "print assembly" from Hotspot, iterate on the code until it does (or does not) produce CMOVcc, and then measure which way is "typically" more performant.
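As a starting point for that experiment, here is a hedged sketch of two candidate rewrites. Both rely on the block being sorted, so the number of docids smaller than the target equals the index of the first docid >= target. The class and method names are hypothetical, and whether Hotspot actually emits CMOVcc (or SETcc, or vectorizes the loop) for either form is exactly what would need checking, for example by running with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, which requires the hsdis disassembler plugin to be installed.

```java
public final class BranchlessScan {
    /**
     * Candidate 1: always visits all len entries and adds 0 or 1 per entry.
     * There is no data-dependent loop exit; the ternary is the part Hotspot
     * may or may not turn into a conditional move.
     */
    static int countLessThan(int[] docs, int len, int target) {
        int count = 0;
        for (int i = 0; i < len; i++) {
            count += (docs[i] < target) ? 1 : 0;
        }
        return count;
    }

    /**
     * Candidate 2: pure arithmetic, no conditional at all in the source.
     * Assumes docids and target are non-negative, so docs[i] - target cannot
     * overflow and its sign bit is set exactly when docs[i] < target.
     */
    static int countLessThanArithmetic(int[] docs, int len, int target) {
        int count = 0;
        for (int i = 0; i < len; i++) {
            count += (docs[i] - target) >>> 31;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] block = new int[128];
        for (int i = 0; i < block.length; i++) {
            block[i] = 10 + 3 * i; // sorted, non-negative docids
        }
        System.out.println(countLessThan(block, block.length, 20));
        System.out.println(countLessThanArithmetic(block, block.length, 20));
    }
}
```

Both candidates trade the early exit for a fixed 128 iterations, so they do more work per block than a scan that stops early; whether the lack of branch mispredictions wins overall is the thing to measure.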