Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to build with Python 3.13 due to removal of PyObject_AsReadBuffer #362

Open
AdamWill opened this issue Jun 14, 2024 · 9 comments · May be fixed by #368
Open

Fails to build with Python 3.13 due to removal of PyObject_AsReadBuffer #362

AdamWill opened this issue Jun 14, 2024 · 9 comments · May be fixed by #368

Comments

@AdamWill
Copy link

Affected Operating Systems

All

Affected py-lmdb Version

All

py-lmdb Installation Method

Irrelevant

Using bundled or distribution-provided LMDB library?

Irrelevant

Distribution name and LMDB library version

Irrelevant

Machine "free -m" output

Irrelevant

Other important machine info

Irrelevant

Describe Your Problem

py-lmdb does not build against Python 3.13 because it uses PyObject_AsReadBuffer, which was removed in Python 3.13. The recommended replacement is PyObject_GetBuffer and PyBuffer_Release.

Errors/exceptions Encountered

    lmdb/cpython.c: In function ‘val_from_buffer’:
    lmdb/cpython.c:586:12: error: implicit declaration of function ‘PyObject_AsReadBuffer’; did you mean ‘PyObject_GetBuffer’? [-Wimplicit-function-declaration]
      586 |     return PyObject_AsReadBuffer(buf,
          |            ^~~~~~~~~~~~~~~~~~~~~
          |            PyObject_GetBuffer

Describe What You Expected To Happen

Successful build.

Describe What Happened Instead

Failed build.

@AdamWill
Copy link
Author

AdamWill commented Jun 14, 2024

So I can nearly fix this, but I'm struggling with the lifecycle of the view buffer PyObject_GetBuffer uses. This patch works, but obviously leaks those views:

diff --git a/lmdb/cpython.c b/lmdb/cpython.c
index ef30447..78838b9 100644
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
-        (Py_ssize_t *) &val->mv_size);
+    Py_buffer view;
+    int ret;
+    ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+    if(ret == 0) {
+        val->mv_data = view.buf;
+        val->mv_size = view.len;
+    }
+    //PyBuffer_Release (&view);
+    return ret;
 }
 
 /* ------------------- */

If you uncomment the PyBuffer_Release, it crashes, because we aren't actually reading the data here, just setting a pointer to it.

I can't figure out a good way to handle this, because there are a lot of things that call val_from_buffer and they tend not to read the result right away, there's a lot of nesting going on. e.g. parse_args calls parse_arg which calls val_from_buffer, and dozens of things call parse_args and use the data at varying points afterwards...so what? Do we have every single one of those pass in a view and free it when they're done, and pass that view all the way down to val_from_buffer? That feels very icky, but I don't know what would be better. Note I am not any kind of C expert at all, maybe this is obvious to someone else.

@jnwatson
Copy link
Owner

Thanks for the heads up about 3.13. I haven't looked at that yet.

I'm afraid val_from_buffer and its callers will have to be restructured. The first approach I'd try is for the new val_from_buffer to takes a Py_buffer ** parameter that the created buffer is passed back into. Then for every caller of val_from_buffer there'd need to be a val_from_buffer_free that calls PyBuffer_Release on it if it isn't null.

This is a hard API break, and it probably means that the version that implements this will be the first version that doesn't support Python 2.7.

You're welcome to submit a PR that implements this. If not, I'll work on it after I have my current 3.12 branch working.

@AdamWill
Copy link
Author

Thanks!

That's more or less what I was talking about at the end of my message, I think. But - if I'm understanding things correctly - it's difficult because it doesn't seem like we're always done using the data even within the scope of the immediate caller of val_from_buffer. Like in the parse_arg case - parse_arg calls val_from_buffer, but it doesn't really use the result itself, the thing that uses the result is two or more links further up the chain. So you wind up having to pass the Py_buffer an awfully long way to get it from the place it will ultimately need to be freed all the way to val_from_buffer.

That seemed like a lot of work that would be very easy to get wrong for me, since I don't know the codebase and I'm not at all a C hacker, so I passed on trying to write a PR, sorry!

@jnwatson
Copy link
Owner

NP at all. I appreciate the report.

@hroncok

This comment was marked as resolved.

@hroncok

This comment was marked as resolved.

@hroncok
Copy link

hroncok commented Sep 4, 2024

Taking the patch by @AdamWill and not makign it crash is a matter of never releasing the buffer when the ret is not 0:

--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
-        (Py_ssize_t *) &val->mv_size);
+    Py_buffer view;
+    int ret;
+    ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+    if(ret == 0) {
+        val->mv_data = view.buf;
+        val->mv_size = view.len;
+        PyBuffer_Release (&view);
+    }
+    return ret;
 }

It does exactly the same thing as Python, so calling PyBuffer_Release before reading the buf and len is as safe as it ever was (i.e. it is not safe, which is why Python removed the function in the first place, but it does not make things worse).

hroncok added a commit to hroncok/py-lmdb that referenced this issue Sep 4, 2024
Avoid using PyObject_AsReadBuffer, do what PyObject_AsReadBuffer does.

Note that this is not safe, and it has never been safe,
but the code does *exactly* what it used to do (at least on Python 3.8+).

Fixes jnwatson#362

Co-Authored-By: Miro Hrončok <[email protected]>
@hroncok hroncok linked a pull request Sep 4, 2024 that will close this issue
@hroncok
Copy link

hroncok commented Sep 4, 2024

Fix in #368

@Dobatymo
Copy link

Dobatymo commented Nov 13, 2024

For me the current pypi version 1.5.1 builds on Github actions with

  • ubuntu-20.04, 3.13
  • ubuntu-22.04, 3.13
  • macos-12, 3.13

but not with

  • macos-13, 3.13
  • macos-14, 3.13

I can see the API was removed. So I don't really understand why it still builds on Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants