Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close/get race segfault #180

Open
jnwatson opened this issue May 18, 2018 · 4 comments
Open

Close/get race segfault #180

jnwatson opened this issue May 18, 2018 · 4 comments

Comments

@jnwatson
Copy link
Owner

Ubuntu lmdb version 0.94, cpython version.
This code added as a method to CrashTest in tests/crash_test.py crashes 90% of the time:

    def testCloseGetRace(self):
        def reader(self):
            for i in range(1000):
                try:
                    with self.env.begin() as txn:
                        txn.get(i.to_bytes(8, 'big'))
                except lmdb.Error:
                    pass
        ts = [threading.Thread(target=reader, args=(self,)) for _ in range(2)]
        [t.start() for t in ts]
        for i in range(1000):
            self.env.close()
            self.env = lmdb.open(self.path, max_dbs=10)

        [t.join() for t in ts]

At least one crash failure is that a transobject is living well past the lifetime of its envobject. This is caused because, in env_clear when it is calling clear on its children (https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L1116), trans_clear can release the GIL (https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L2941), allowing a new transaction to sneak in as a child that never gets cleared. The debug log confirms this.

Solution: I can remove the GIL release around mdb_txn_abort, or I can mark the env so a transaction creation won't get that far. Ideas?

@dw
Copy link
Collaborator

dw commented May 18, 2018

Yikes! This is a nice one :) I think with MDB_WRITEMAP abort() can be expensive, so holding GIL there might be problematic, but it's still a reasonable solution. How about a 'bool closing' flag in the Environment that causes an exception to be thrown in the Transaction constructor? That would work, if I'm understanding things correctly. Threading is hard!

@dw
Copy link
Collaborator

dw commented May 18, 2018

Re: writemap, for large transactions MDB starts spilling dirty pages to the map before the transaction has completed. I can't remember if it needs to perform any kind of rollback. Come to think of it, I don't think it does. So either of these solutions sound fine really

@jnwatson
Copy link
Owner Author

jnwatson commented Sep 5, 2018

I found a bunch more races, and I have a solution to them all, in cpython. Now to tackle cffi.py...

@jnwatson
Copy link
Owner Author

jnwatson commented Jul 4, 2019

Keeping this open. This is, in general, super problematic to solve because it incurs overhead on 99% of folks that don't do multithreading stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants