Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better LRU and Fix for Deadlock #120

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

0xForerunner
Copy link

I discovered a deadlock in the rollup boost server. I've posted a fix here using a concurrent, lock free LRU cache. In general this should make the code here more readable and less prone to errors.

As a general note, I've noticed that the way mutex locks are being handled in this code base seems to be pretty fast and loose. There were several cases where locks were being held far longer than they need to be.

Comment on lines -67 to -68
let mut block_hash_to_payload_ids = self.block_hash_to_payload_ids.lock().await;
let mut payload_id_to_span = self.payload_id_to_span.lock().await;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deadlock can occur here, as locks are acquired in opposite order as in fn store above.

@dmarzzz
Copy link
Member

dmarzzz commented Feb 22, 2025

Thanks for the call out and contribution! Would you be willing to open up an issue for any other lock behavior usage you see as problematic? An audit locks issue would also work

@0xForerunner
Copy link
Author

@dmarzzz Yeah no problem! I'll probably just push another commit to this branch with a general lock cleanup. Already fixed a few so may as well keep it going :)

@0xForerunner
Copy link
Author

Okay so I think I've cleaned up all the locks excluding those used in tests. Those should probably be cleaned up as some point as well, but let's leave that for another PR.

@dmarzzz lemme know if this looks good to you :)

payload_id_to_span: Arc<Mutex<LruCache<PayloadId, Arc<BoxedSpan>>>>,
local_to_external_payload_ids: Arc<Mutex<LruCache<PayloadId, PayloadId>>>,
tracer: BoxedTracer,
block_hash_to_payload_ids: Cache<B256, Vec<PayloadId>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious how this Cache differs from LruCache?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just lock free, which means we don't need to worry about locking ourselves. Helps avoid errors! Should be significantly faster as well, not that it really matters in this case haha.

Copy link
Collaborator

@avalonche avalonche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for the PR, makes the trace logic much cleaner

@0xForerunner 0xForerunner force-pushed the forerunner/deadlock-fix branch from fc13a4d to 4e1b965 Compare March 6, 2025 10:02
@0xForerunner 0xForerunner force-pushed the forerunner/deadlock-fix branch from 4e1b965 to 716ff7f Compare March 6, 2025 10:05
@0xForerunner
Copy link
Author

@avalonche feel free to merge this in when you're ready!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants