-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in object_by_id_cache #20450
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Skipped Deployments
|
Suppose that a reader thread is trying to cache an object that it just read, while a writer thread is trying to cache an object that it just wrote. The writer thread definitionally has the latest version. The reader thread may be out of date. While we previously took some care to not replace a new version with an old version, this did not take into account evictions, and so the following bug was possible: READER WRITER read object_by_id_cache (miss) read dirty set (miss) write to dirty read db (old version) write to cache (while holding dirty lock) cache entry is evicted write to cache There is no way for the reader to tell that the value is is caching is out of date, because the update to date entry is already gone from the cache. We fix this by requiring reader threads to obtain a ticket before they read from the dirty set and/or db. Tickets are expired by writers. Then, the above case looks like this: READER WRITER get ticket read cache (miss) read dirty (miss) write dirty read db (old version) expire ticket write cache (while holding dirty lock) cache eviction no write to cache (ticket expired) Any interleaving of the above either results in the reader seeing a recent version, or else having an expired ticket.
e8d07f3
to
98b7edd
Compare
// new ticket after this point, then it will read the new value from | ||
// the dirty set (or db). | ||
if matches!(ticket, Ticket::Write) { | ||
gen.fetch_add(1, std::sync::atomic::Ordering::Release); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this scenario possible?
Key 1 Gen 0 is not in cache
Writer calls insert Key 1 and fetch_add Gen 1
Reader obtains ticket Gen 1 for Key 1
Reader misses cache read and reads old value for Key 1 from DB
Writer writes new value for Key 1 to dirty cache
Cache value for Key 1 gets evicted
Reader caches old value in cache with valid Gen 1 ticket
If so, would swapping the generation fetch_add to after the cache update fix it? Or maybe cache eviction also updates generation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not possible because the writer holds the lock on the dirty cache while it calls fetch_add. reader waits for that lock before checking in the dirty cache. So reader will either complete its read before the fetch_add (in which case it will get an old ticket), or it will wait until after the fetch_add, in which case it is guaranteed to read the newest value from the dirty set (or possibly the db if the reader waits so long that the value has been flushed to the db).
Suppose that a reader thread is trying to cache an object that it just read, while a writer thread is trying to cache an object that it just wrote. The writer thread definitionally has the latest version. The reader thread may be out of date. While we previously took some care to not replace a new version with an old version, this did not take into account evictions, and so the following bug was possible:
There is no way for the reader to tell that the value it is caching is out of date, because the up to date entry is already gone from the cache.
We fix this by requiring reader threads to obtain a ticket before they read from the dirty set and/or db. Tickets are expired by writers. Then, the above case looks like this:
Any interleaving of the above either results in the reader seeing a recent version, or else having an expired ticket.