Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRUCache.__copy__ is unsound #3852

Open
asottile-sentry opened this issue Dec 4, 2024 · 8 comments
Open

LRUCache.__copy__ is unsound #3852

asottile-sentry opened this issue Dec 4, 2024 · 8 comments
Labels
Component: SDK Core Dealing with the core of the SDK Type: Bug Something isn't working

Comments

@asottile-sentry
Copy link
Member

How do you use Sentry?

Sentry Saas (sentry.io)

Version

latest git revision

Steps to Reproduce

unfortunately I haven't completely narrowed down a "minimal" reproduction but I originally found this from reading the implementation and tracing the output -- attached is a script and data which produces a crash due to a particular ordering of copys and sets.

the root cause of the problem is that __copy__ is incorrect as written -- it shallow copies the internal datastructures which incorrectly pollutes newly created caches

in the ~best case this produces incorrect values poisoning the parent copy:

>>> cache1 = LRUCache(max_size=2)
>>> cache1.set(1, True)
>>> cache2 = cache1.__copy__()
>>> cache2.set(1, False)
>>> cache1.get(1)
False

at worst case it causes crashes (see attached data)

from sentry_sdk._lru_cache import LRUCache


def main() -> int:
    caches = {}

    with open('lru_cache.log') as f:
        for line in f:
            line = line.replace(': ', ' ')

            match line.split():
                case [cid, '__init__']:
                    caches[cid] = LRUCache(max_size=100)
                case [cid, 'set', k, *v]:
                    caches[cid].set(k, ' '.join(v))
                case [cid, '__copy__', '->', cid2]:
                    caches[cid2] = caches[cid].__copy__()
                case unreachable:
                    raise AssertionError(unreachable)


if __name__ == '__main__':
    raise SystemExit(main())

lru_cache.log

cc @cmanallen -- introduced in dd1117d

Expected Result

should not crash, should not have internal data integrity problems

Actual Result

the script eventually crashes with:

$ ./venvnew/bin/python3 t3.py 
Traceback (most recent call last):
  File "/Users/asottile/workspace/sentry/t3.py", line 23, in <module>
    raise SystemExit(main())
                     ^^^^^^
  File "/Users/asottile/workspace/sentry/t3.py", line 15, in main
    caches[cid].set(k, ' '.join(v))
  File "/Users/asottile/workspace/sentry/venvnew/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py", line 132, in set
    del self.cache[old_key]
        ~~~~~~~~~~^^^^^^^^^
KeyError: 'organizations:continuous-profiling-beta'
@asottile-sentry
Copy link
Member Author

fwiw the original trace was generated via this diff:

$ diff -u {venvnew,.venv}/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py 
--- venvnew/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py	2024-12-04 17:17:44
+++ .venv/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py	2024-12-04 17:16:18
@@ -62,6 +62,7 @@
 
 """
 
+from uuid import uuid4
 from copy import copy
 
 SENTINEL = object()
@@ -74,8 +75,16 @@
 VALUE = 3
 
 
+
 class LRUCache:
-    def __init__(self, max_size):
+    def _log(self, m):
+        with open('lru_cache.log', 'a+') as f:
+            f.write(f'{self._u}: {m}\n')
+
+    def __init__(self, max_size, _log=True):
+        self._u = uuid4()
+        if _log:
+            self._log('__init__')
         assert max_size > 0
 
         self.max_size = max_size
@@ -92,13 +101,15 @@
         self.hits = self.misses = 0
 
     def __copy__(self):
-        cache = LRUCache(self.max_size)
+        cache = LRUCache(self.max_size, _log=False)
+        self._log(f'__copy__ -> {cache._u}')
         cache.full = self.full
         cache.cache = copy(self.cache)
         cache.root = copy(self.root)
         return cache
 
     def set(self, key, value):
+        self._log(f'set {key} {value}')
         link = self.cache.get(key, SENTINEL)
 
         if link is not SENTINEL:
@@ -129,7 +140,11 @@
 
             self.root[KEY] = self.root[VALUE] = None
 
-            del self.cache[old_key]
+            try:
+                del self.cache[old_key]
+            except KeyError:
+                self._log('crash!!!')
+                raise
 
             self.cache[key] = old_root
 
@@ -141,6 +156,7 @@
             self.full = len(self.cache) >= self.max_size
 
     def get(self, key, default=None):
+        self._log(f'get {key}')
         link = self.cache.get(key, SENTINEL)
 
         if link is SENTINEL:

and then running the first 504 tests of shard 7 in the getsentry/sentry testsuite

@asottile-sentry
Copy link
Member Author

asottile-sentry commented Dec 4, 2024

assuming python3.6+ only you may be able to utilize builtin dict as the lru type:

_SENTINEL = object()

class LRUCache:
    def __init__(self, max_size: int):
        if max_size <= 0:
            raise AssertionError(f'invalid max_size: {max_size}')
        self.max_size = max_size
        self._data = {}
        self.hits = self.misses = 0
        self.full = False

    def __copy__(self):
        new = LRUCache(max_size=self.max_size)
        new.hits = self.hits
        new.misses = self.misses
        new.full = self.full
        new._data = self._data.copy()
        return new

    def set(self, key, value):
        current = self._data.pop(key, _SENTINEL)
        if current is not _SENTINEL:
            self._data[key] = value
        elif self.full:
            self._data.pop(next(iter(self._data)))
            self._data[key] = value
        else:
            self._data[key] = value
        self.full = len(self._data) >= self.max_size

    def get(self, key, default=None):
        try:
            ret = self._data.pop(key)
        except KeyError:
            self.misses += 1
            ret = default
        else:
            self.hits += 1
            self._data[key] = ret

        return ret

    def get_all(self):
        return list(self._data.items())

(this passes the current set of tests -- though I'd argue that the AssertionError should actually be ValueError)

@antonpirker antonpirker added Type: Bug Something isn't working Component: SDK Core Dealing with the core of the SDK labels Dec 5, 2024
@sl0thentr0py
Copy link
Member

cc @Zylphrex @cmanallen

@antonpirker
Copy link
Member

This is related to this: #3862

I will make a patch release with the other fix because this feature is going open beta today.

Then we can see the suggested solution above to maybe implement it as the long term fix of lru cache.

@cmanallen
Copy link
Member

Thanks for finding this @asottile-sentry. Sorry I didn't see it sooner. It was lost in my GitHub notifications.

@cmanallen
Copy link
Member

Fixed by: #3861

@asottile-sentry
Copy link
Member Author

I think there's probably still another bug of a similar flavor since self.cache is shallow copied but the backing data (self.root) is deep copied (meaning the cache has references to the original object's data)

@asottile-sentry
Copy link
Member Author

yeah in fact the reproduction in this issue still happens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: SDK Core Dealing with the core of the SDK Type: Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants