Speed up appending changes to a batch #6025
Open
When we finish processing a block, we put all the changes into a `Batch`; that batch is then added to the write queue, where it might be combined with existing batches for earlier blocks. An important aspect of speeding up writes is that block ranges for entities with changes in several queued batches are properly closed before the changes are written to the database, which makes it unnecessary to do that with much slower SQL queries.

The way this was done inside the `Batch` was fairly slow: we kept all changes for entities of a given type in a list, and when we needed to add a new change, we would search the entire list to find the last change for the same entity so that we could adjust its block range appropriately.

With this change, we keep a hash map from entity id to the most recent change, so looking up those changes no longer requires traversing a list.
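The difference between the two lookups can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Change`, `Changes`, `append_linear`, and the field names are hypothetical, and it assumes a simplified model in which each change only carries an entity id and a block range. Details such as deletions or several changes to one entity within the same block are left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one versioned change to an entity.
#[derive(Debug, Clone, PartialEq)]
pub struct Change {
    pub id: String,
    /// Block at which this version becomes valid.
    pub start: u32,
    /// Block at which this version stops being valid; `None` means still open.
    pub end: Option<u32>,
}

/// Old approach (sketch): scan the list backwards for the previous change
/// to the same entity, which costs O(n) per append.
pub fn append_linear(rows: &mut Vec<Change>, id: &str, block: u32) {
    if let Some(prev) = rows.iter_mut().rev().find(|c| c.id == id) {
        // Close the block range of the previous version.
        prev.end = Some(block);
    }
    rows.push(Change { id: id.to_string(), start: block, end: None });
}

/// New approach (sketch): remember the index of the most recent change per
/// entity id, so the previous version is found in O(1) on average.
#[derive(Default)]
pub struct Changes {
    rows: Vec<Change>,
    /// Maps an entity id to the index in `rows` of its most recent change.
    last: HashMap<String, usize>,
}

impl Changes {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn append(&mut self, id: &str, block: u32) {
        if let Some(&idx) = self.last.get(id) {
            // Close the block range of the previous version.
            self.rows[idx].end = Some(block);
        }
        // The new change will live at the current end of the list.
        self.last.insert(id.to_string(), self.rows.len());
        self.rows.push(Change { id: id.to_string(), start: block, end: None });
    }

    pub fn rows(&self) -> &[Change] {
        &self.rows
    }
}
```

Both variants produce the same list of changes; only the cost of finding the previous version differs. The map costs extra memory proportional to the number of distinct entities in the batch.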
The speedup can be seen with the `append_row` example. A fairly realistic run reports a time of about 110ms to construct the batch, whereas running the same with `GRAPH_STORE_WRITE_BATCH_MEMOIZE=true` takes only about 1.8ms.