Decouple trie updates from write transactions #1934
Comments
FYI: SQLite insert performance is significantly worse when using an externally generated integer primary key. The reason for this is that SQLite needs to check the uniqueness of the data we're passing in -- it can skip this check when generating a unique id on its own.
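A minimal sketch of the difference, assuming `rusqlite` and a hypothetical `trie_node` table: the first insert supplies the key (and pays for the uniqueness probe), the second lets SQLite assign it.

```rust
use rusqlite::{params, Connection};

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch("CREATE TABLE trie_node (idx INTEGER PRIMARY KEY, data BLOB)")?;

    // Externally generated key: SQLite has to probe the table for an
    // existing row with this key before accepting the insert.
    conn.execute(
        "INSERT INTO trie_node (idx, data) VALUES (?1, ?2)",
        params![42i64, vec![0u8; 32]],
    )?;

    // SQLite-generated key: per the observation above, the engine can
    // hand out a fresh rowid without any uniqueness check.
    conn.execute("INSERT INTO trie_node (data) VALUES (?1)", params![vec![1u8; 32]])?;
    println!("sqlite assigned rowid {}", conn.last_insert_rowid());
    Ok(())
}
```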
Hmm. That's a bummer, but it must surely be solvable somehow... `pragma_ignore_check_constraints` looks like the thing to use?
That pragma seems to affect only `CHECK` constraints, not the uniqueness check on primary keys.
Our trie update logic is currently tied directly to a database write transaction: updating the state trie for a given block can only occur in the same transaction that inserts the block into storage. It also means the next block's trie update can only begin once the current block has been inserted into storage. This prevents pipelining of block processing, because each block is waiting for its parent's state trie update to be committed to storage. Currently this is achieved by simply performing the trie update in the same process as inserting the rest of the block data - which kind of sucks because the trie update is also quite expensive.
Basically, for every other bit of data we can separate the processing from the writing to storage. Except for state trie updates. Your job is to fix this :)
Why is it like this
When a trie is updated, it has a set of new nodes that must be inserted into storage. Each requires a unique node ID from storage, which the trie graph uses to reference that node. The IDs are generated when the nodes are inserted into storage.
This means the next update must first wait for those IDs to be generated, and for those nodes to be stored - otherwise the next update might attempt to read one of these new nodes, which would fail.
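For contrast, a minimal sketch of the status quo, assuming `rusqlite` and a hypothetical `tree_global` table: the node's ID only comes into existence inside the write transaction, so nothing can reference the node before that transaction commits.

```rust
use rusqlite::{params, Transaction};

/// Sketch of the status quo: the node's ID is only known once the row
/// has been inserted, inside the same transaction that writes the
/// whole block.
fn insert_node(tx: &Transaction<'_>, data: &[u8]) -> rusqlite::Result<i64> {
    tx.execute("INSERT INTO tree_global (data) VALUES (?1)", params![data])?;
    // Generated by SQLite at insert time; the next trie update cannot
    // read this node until the enclosing transaction commits.
    Ok(tx.last_insert_rowid())
}
```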
Proposed solution
Add a new method to storage which returns a set of new node indices for the trie update to use. Importantly, this would not rely on a db transaction, but rather something like an `Arc<Mutex<u64>>` (for each trie table):

- seeded at startup via `SELECT MAX(index) FROM xxx_trie`
- `fn get_indices(N: usize) -> Vec<u64>` (or a range or whatever) returns the next `N` indices

This relies on indices only ever increasing, i.e. if things get deleted we never reuse the fragmented holes etc. - that would require db transactions, which is no bueno. A sketch of such an allocator follows.
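A minimal sketch of what this could look like, assuming `rusqlite` and a hypothetical `tree_global` table with an `idx` column; only the initial seed touches the database, and no transaction is held afterwards.

```rust
use std::ops::Range;
use std::sync::{Arc, Mutex};

use rusqlite::Connection;

/// Hands out monotonically increasing node indices without holding a
/// database transaction. One of these would exist per trie table.
#[derive(Clone)]
struct IndexAllocator {
    next: Arc<Mutex<u64>>,
}

impl IndexAllocator {
    /// Seed the counter once at startup from the highest index already
    /// in storage (table and column names are hypothetical).
    fn new(conn: &Connection) -> rusqlite::Result<Self> {
        let max: Option<u64> =
            conn.query_row("SELECT MAX(idx) FROM tree_global", [], |row| row.get(0))?;
        Ok(Self {
            next: Arc::new(Mutex::new(max.map_or(0, |m| m + 1))),
        })
    }

    /// Reserve the next `n` indices as a contiguous range.
    fn get_indices(&self, n: u64) -> Range<u64> {
        let mut next = self.next.lock().unwrap();
        let start = *next;
        *next += n;
        start..*next
    }
}
```

Because each range is reserved under the mutex, concurrent updaters can never be handed overlapping indices, which keeps the "indices only ever increase" invariant cheap to maintain.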
This will enable the following pattern:

1. perform the trie update, producing `N` new nodes
2. request the next `N` indices from storage
3. assign those indices to the new nodes
4. hand the nodes to the writer task for insertion

Crucially, steps 1-3 do not have to be part of the writer process. A sketch of this pipeline follows the list.
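A hedged sketch of that pipeline, reusing the hypothetical `IndexAllocator` from the sketch above; `NewNode`, `run_update`, and the channel are illustrative names, not anything from the codebase.

```rust
use std::sync::mpsc;

/// Hypothetical shape of a freshly computed trie node.
struct NewNode {
    idx: u64,
    data: Vec<u8>,
}

/// Steps 1-3 run in the updater; only the receiving end of `writer`
/// (the writer task) ever opens a database transaction.
fn run_update(
    alloc: &IndexAllocator,     // from the sketch above
    computed: Vec<Vec<u8>>,     // step 1: the N new nodes
    writer: &mpsc::Sender<NewNode>,
) {
    let indices = alloc.get_indices(computed.len() as u64); // step 2
    for (idx, data) in indices.zip(computed) {
        // step 3: bind each node to its reserved index, then hand it
        // off for insertion (step 4)
        writer.send(NewNode { idx, data }).expect("writer task alive");
    }
}
```

Nothing in `run_update` blocks on a transaction, so the next block's trie update can start while the previous block's nodes are still sitting in the writer's queue.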
Additional considerations
Because the updater is now decoupled from the writer task, it no longer knows when the new nodes will enter storage. It therefore has to keep a local cache of the nodes it has emitted for storage, until it is certain they are actually stored. One way to do this is to check the database at the start of each update: if the cached nodes are in storage by then, they can be dropped from the cache.
This caching gets a bit trickier if you consider that contract tries can be updated concurrently - so they must share a cache.
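One possible shape for that shared cache, again assuming `rusqlite` and the hypothetical `tree_global` table: updaters park emitted nodes in a map shared across all contract tries, reads check the cache before the database, and a prune pass at the start of each update drops whatever has since been persisted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use rusqlite::{params, Connection, OptionalExtension};

/// Nodes emitted for storage but not yet confirmed written, shared by
/// all concurrent contract-trie updaters.
#[derive(Clone, Default)]
struct PendingNodes {
    nodes: Arc<Mutex<HashMap<u64, Vec<u8>>>>,
}

impl PendingNodes {
    /// Park a node until it is confirmed to be in storage.
    fn insert(&self, idx: u64, data: Vec<u8>) {
        self.nodes.lock().unwrap().insert(idx, data);
    }

    /// Read-through lookup: prefer the cache, fall back to the database.
    fn get(&self, conn: &Connection, idx: u64) -> rusqlite::Result<Option<Vec<u8>>> {
        if let Some(data) = self.nodes.lock().unwrap().get(&idx) {
            return Ok(Some(data.clone()));
        }
        conn.query_row(
            "SELECT data FROM tree_global WHERE idx = ?1",
            params![idx],
            |row| row.get(0),
        )
        .optional()
    }

    /// Called at the start of each update: drop cached nodes that have
    /// since been persisted.
    fn prune(&self, conn: &Connection) -> rusqlite::Result<()> {
        let mut nodes = self.nodes.lock().unwrap();
        let mut stmt = conn.prepare("SELECT 1 FROM tree_global WHERE idx = ?1")?;
        let mut persisted = Vec::new();
        for &idx in nodes.keys() {
            if stmt.exists(params![idx])? {
                persisted.push(idx);
            }
        }
        for idx in persisted {
            nodes.remove(&idx);
        }
        Ok(())
    }
}
```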
... there was another point but it has slipped my mind...