Better LRU and Fix for Deadlock #120
base: main
Conversation
let mut block_hash_to_payload_ids = self.block_hash_to_payload_ids.lock().await;
let mut payload_id_to_span = self.payload_id_to_span.lock().await;
Deadlock can occur here: the locks are acquired in the opposite order from fn store above.
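For illustration, here is a minimal, self-contained sketch of the hazard (the task and variable names are hypothetical, not from rollup-boost): two tasks take the same pair of tokio mutexes in opposite orders, so each can end up waiting on the lock the other holds. Running it will typically hang.

use std::sync::Arc;
use tokio::sync::Mutex;

#[tokio::main]
async fn main() {
    let a = Arc::new(Mutex::new(()));
    let b = Arc::new(Mutex::new(()));

    let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = tokio::spawn(async move {
        let _guard_a = a2.lock().await;   // locks a first...
        tokio::task::yield_now().await;   // give the other task a chance to run
        let _guard_b = b2.lock().await;   // ...then b
    });
    let t2 = tokio::spawn(async move {
        let _guard_b = b.lock().await;    // locks b first...
        tokio::task::yield_now().await;
        let _guard_a = a.lock().await;    // ...then a: opposite order
    });

    // Each task now holds the lock the other is waiting on: deadlock.
    // The fix is to acquire locks in one consistent order everywhere,
    // or to remove the outer locks entirely, as this PR does.
    let _ = tokio::join!(t1, t2);
}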
Thanks for the call-out and contribution! Would you be willing to open an issue for any other lock usage you see as problematic? A general lock-audit issue would also work.
@dmarzzz Yeah, no problem! I'll probably just push another commit to this branch with a general lock cleanup. Already fixed a few, so may as well keep it going :)
Okay, I think I've cleaned up all the locks excluding those used in tests. Those should probably be cleaned up at some point as well, but let's leave that for another PR. @dmarzzz lemme know if this looks good to you :)
payload_id_to_span: Arc<Mutex<LruCache<PayloadId, Arc<BoxedSpan>>>>,
local_to_external_payload_ids: Arc<Mutex<LruCache<PayloadId, PayloadId>>>,
tracer: BoxedTracer,
block_hash_to_payload_ids: Cache<B256, Vec<PayloadId>>,
curious how this Cache differs from LruCache?
It's just lock-free, which means we don't need to worry about locking ourselves. Helps avoid errors! It should be significantly faster as well, not that it really matters in this case haha.
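The Cache here looks like a concurrent cache along the lines of moka's sync::Cache, though the diff doesn't name the crate, so treat that as an assumption. A minimal sketch of the difference:

// Sketch only: assumes the `Cache` in this PR is moka's concurrent cache
// (moka crate with the `sync` feature). Its eviction is size-bounded
// (TinyLFU) rather than strict LRU, which is close enough here.
use moka::sync::Cache;

fn main() {
    // Before: Arc<Mutex<LruCache<K, V>>>, where every access serializes on
    // the lock and callers must acquire locks in a consistent order.
    // After: an internally synchronized cache with bounded capacity.
    let cache: Cache<u64, String> = Cache::new(10_000);

    cache.insert(1, "payload".to_string()); // no explicit lock anywhere
    assert_eq!(cache.get(&1), Some("payload".to_string()));
}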
LGTM! Thank you for the PR; it makes the trace logic much cleaner.
Branch updated from fc13a4d to 4e1b965, then to 716ff7f.
@avalonche feel free to merge this in when you're ready!
I discovered a deadlock in the rollup-boost server. I've posted a fix here using a concurrent, lock-free LRU cache. In general this should make the code more readable and less error-prone.
As a general note, I've noticed that the way mutex locks are handled in this codebase is pretty fast and loose. There were several cases where locks were being held far longer than necessary.
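As an illustration of that last point, here is a hypothetical sketch (the function and field names are made up, not from this codebase) of scoping a tokio mutex guard so it is dropped before any slow work begins:

use std::sync::Arc;
use tokio::sync::Mutex;

async fn process(ids: Arc<Mutex<Vec<u64>>>) {
    // Anti-pattern: the guard lives across the slow call, blocking every
    // other task that needs the lock for the whole duration:
    //
    //   let guard = ids.lock().await;
    //   expensive_io(&guard).await;
    //
    // Better: copy out what's needed inside a narrow scope, then release.
    let snapshot = {
        let guard = ids.lock().await;
        guard.clone()
    }; // guard dropped here, before any slow work
    expensive_io(&snapshot).await;
}

// Stand-in for network or disk work that shouldn't hold a lock.
async fn expensive_io(_ids: &[u64]) {}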