At read time, the FST APIs must read bytes in reverse, which is perverse and unnatural for modern CPU and IO stacks, whose read-ahead optimizations favor forward reading.
It's quite complex to change the underlying FST format to become fundamentally forward-only. It'd require rewriting node addresses, which may then take different numbers of `vInt` bytes, which shifts later addresses and causes more renumbering, etc.
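To illustrate why rewriting addresses cascades, here is a minimal sketch of how a variable-length int grows with its value, assuming the usual 7-bits-per-byte encoding with a continuation bit. The class and method names are hypothetical and are not taken from Lucene's code.

```java
public class VIntLengthSketch {

    // Number of bytes needed to encode a non-negative int as a vInt,
    // assuming 7 payload bits per byte (hypothetical helper).
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        // An address that moves from 127 to 128 needs one more byte,
        // which shifts every address stored after it.
        System.out.println(vIntLength(127));   // prints 1
        System.out.println(vIntLength(128));   // prints 2
        System.out.println(vIntLength(16384)); // prints 3
    }
}
```

A node address crossing one of these boundaries changes the size of every arc that points to it, which in turn moves other nodes, hence the renumbering.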
A simpler first step might be, at FST `freeze()` time (when the FST is done being compiled), reverse all bytes in the underlying storage (on disk or on heap), and at read time, pretend to the caller that they are still reading backwards, yet actually read forwards. We could even do this separately for each store, e.g. start by testing on-heap read-time double reversal, or maybe start with on-disk where the OS's readahead optimizations may matter more.
It should be relatively trivial to implement, if hard to think about, and we could then measure whether it helps performance.
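A rough sketch of the double reversal for the on-heap case might look like the following. All names here (`ReversedStoreSketch`, `reverseOnFreeze`, `PretendReverseReader`) are hypothetical and do not correspond to Lucene's actual FST classes. The idea is only that the caller keeps using logical, backwards-moving positions while the physical reads walk the array forwards.

```java
public class ReversedStoreSketch {

    // What a freeze step could do: return a reversed copy of the compiled bytes.
    static byte[] reverseOnFreeze(byte[] compiled) {
        byte[] reversed = new byte[compiled.length];
        for (int i = 0; i < compiled.length; i++) {
            reversed[i] = compiled[compiled.length - 1 - i];
        }
        return reversed;
    }

    // Looks like a reverse reader to the caller, but reads the reversed array forwards.
    static final class PretendReverseReader {
        private final byte[] reversed;
        private int physicalPos;

        PretendReverseReader(byte[] reversed) {
            this.reversed = reversed;
        }

        // The caller passes a position in the original (logical) address space.
        void setPosition(int logicalPos) {
            physicalPos = reversed.length - 1 - logicalPos;
        }

        // Reports the position in the logical address space.
        int getPosition() {
            return reversed.length - 1 - physicalPos;
        }

        // Returns the byte at the current logical position; the logical position
        // then decreases by one while the physical position increases by one.
        byte readByte() {
            return reversed[physicalPos++];
        }
    }

    public static void main(String[] args) {
        byte[] compiled = {10, 20, 30, 40, 50};
        PretendReverseReader in = new PretendReverseReader(reverseOnFreeze(compiled));
        in.setPosition(3);
        System.out.println(in.readByte());    // prints 40 (logical byte 3)
        System.out.println(in.readByte());    // prints 30 (logical byte 2)
        System.out.println(in.getPosition()); // prints 1
    }
}
```

Because the address translation lives entirely in the reader, the stored node addresses would not need to change, which is what keeps this simpler than a truly forward-only format.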