Can we improve the linear scan part of skipping to possibly compile to CMOVcc? #12476

mikemccand · 2023-07-31T11:21:23Z

Description

@fulmicoton (Tantivy creator) reached out to me after our fun discussion about how to tap into branchless CPU instructions (CMOVcc on x86-64) from way up in javaland, far above the bare metal.

Because Lucene (and Tantivy) encode postings in blocks of 128 docids (+freqs) at once, when skipping, after using the skiplist to find the block that may or may not contain the target doc, there is inevitably a "within block" scan (of up to 128 docs) that is needed to find it.

@fulmicoton pointed out that the linear scan phase of Lucene's skipping could maybe be rewritten "just so", in a way that HotSpot would recognize and compile to CMOVcc. We could turn on "print assembly" in HotSpot and iterate on the code until we know whether it can be made to produce CMOVcc, and then measure which version is "typically" more performant.
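
To make the idea concrete, here is a rough sketch of the two shapes of the within-block scan. It is not Lucene's actual code: the class name `BlockScan`, the method names, and the plain `int[]` of sorted docids are all made up for illustration. The second variant relies on the block being sorted, so the number of docs smaller than the target equals the index of the first doc >= target. Whether HotSpot turns the ternary into CMOVcc (or SETcc, or something vectorized) is exactly what would need checking in the assembly; nothing here guarantees it.

```java
final class BlockScan {
    // Assumed block size, matching the 128 docids per block mentioned above.
    static final int BLOCK_SIZE = 128;

    // Branchy variant: exits at the first doc >= target.
    // The exit branch is data-dependent and can be hard to predict.
    static int scanBranchy(int[] docs, int target) {
        int i = 0;
        while (i < BLOCK_SIZE && docs[i] < target) {
            i++;
        }
        return i;
    }

    // Branchless candidate: always visits all 128 docs and counts how many
    // are < target. Because docs is sorted ascending, that count is the
    // index of the first doc >= target (or BLOCK_SIZE if there is none).
    static int scanBranchless(int[] docs, int target) {
        int count = 0;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            // The hope is that the JIT compiles this without a conditional jump.
            count += docs[i] < target ? 1 : 0;
        }
        return count;
    }
}
```

To look at what the JIT actually emits, one option is running with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, which needs the hsdis disassembler plugin installed. Note the trade-off: the branchless variant does a fixed 128 comparisons even when the target is near the start of the block, so it only wins if avoiding mispredicted branches outweighs the extra work.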

mikemccand · 2023-07-31T11:21:42Z

Thank you for the pointer @fulmicoton!

mikemccand added the type:enhancement label Jul 31, 2023
