From 64c88b37c9ae519b4775967142fb2ab5e6d98c93 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Thu, 1 May 2025 17:01:02 -0400 Subject: [PATCH 01/31] Clarify what 'native thread' means. --- peps/pep-0788.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 0f5b58e1804..a58a572b086 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -16,7 +16,8 @@ Abstract :c:func:`PyGILState_Ensure`, :c:func:`PyGILState_Release`, and other related functions in the ``PyGILState`` family are the most common way to create -native threads that interact with Python. They have been the standard for over +native threads (as in, created using the C API instead of :mod:`threading`) +that interact with Python. They have been the standard for over twenty years (:pep:`311`). But, over time, these functions have become problematic: From bda3db1ee8d788dcd7bd5ff922c829b9b14acbd4 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Thu, 1 May 2025 17:10:02 -0400 Subject: [PATCH 02/31] Add a section clarifying finalization and change up some wording. --- peps/pep-0788.rst | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index a58a572b086..d32233a9c68 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -85,6 +85,25 @@ like this: Py_RETURN_NONE; } +What does "finalization" really mean? +------------------------------------- + +Throughout this PEP, the phrase "finalization" is used in reference to the +"finalizing" state of an interpreter. But, there's different stages of how +Python shuts down, so there's some ambiguity in that term. + +There are two ways to "finalize" in the C API: + +1. :c:func:`Py_FinalizeEx`, which finalizes the main interpreter (and + subsequently the rest of the runtime). +2. :c:func:`Py_EndInterpreter`, which finalizes a subinterpreter. 
+ This does most of the same things that :c:func:`Py_FinalizeEx` + does to the main interpreter. + +So, "finalization" in this PEP refers to finalization of a specific +interpreter, *not* the entire runtime. (But, keep in mind that finalization +of the main interpreter and runtime are similiar states.) + Motivation ========== @@ -92,9 +111,10 @@ Native threads will always hang during finalization --------------------------------------------------- Many codebases might need to call Python code in highly-asynchronous -situations where the interpreter is already finalizing, or might finalize, and -want to continue running code after the Python call. This desire has been -`brought up by users `_. +situations where the desired interpreter +(:ref:`typically the main interpreter `) +could be finalizing or deleted, but want to continue running code after the +invoking the interpreter. This desire has been `brought up by users `_. For example, a callback that wants to call Python code might be invoked when: - A kernel has finished running on a GPU. @@ -102,7 +122,7 @@ For example, a callback that wants to call Python code might be invoked when: - A thread has quit, and a native library is executing static finalizers of thread local storage. -In the current C API, any non-Python thread (one not created via the +In the current C API, any "native" thread (one not created via the :mod:`threading` module) is considered to be "daemon", meaning that the interpreter won't wait on that thread to finalize. Instead, the interpreter will hang the thread when it goes to :term:`attach ` a :term:`thread state`, @@ -233,6 +253,8 @@ ackwards-compatible by simply removing that limitation: threads still need a thread state (and thus need to call :c:func:`PyGILState_Ensure`), but they don't need to wait on one another to do so. +.. 
_pep-788-subinterpreters-gilstate: + Subinterpreters don't work with ``PyGILState_Ensure`` ----------------------------------------------------- From a57686ccadc07598fd6a3ed8d2813cb176bb4c50 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 3 May 2025 08:38:11 -0400 Subject: [PATCH 03/31] Rewrite the abstract. --- peps/pep-0788.rst | 131 +++++++++++++++------------------------------- 1 file changed, 42 insertions(+), 89 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index d32233a9c68..17c45da6a71 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -14,95 +14,48 @@ Post-History: `10-Mar-2025 `__, Abstract ======== -:c:func:`PyGILState_Ensure`, :c:func:`PyGILState_Release`, and other related -functions in the ``PyGILState`` family are the most common way to create -native threads (as in, created using the C API instead of :mod:`threading`) -that interact with Python. They have been the standard for over -twenty years (:pep:`311`). But, over time, these functions have -become problematic: - -- They aren't safe for finalization, either causing the calling thread to hang or - crashing it with a segmentation fault, preventing further execution. -- When they're called before finalization, they force the thread to be - "daemon", meaning that an interpreter won't wait for it to reach any point - of execution. This is mostly frustrating for developers, but can lead to - deadlocks! -- Subinterpreters don't play nicely with them, because they all assume that - the main interpreter is the only one that exists. A fresh thread (that is, - has never had a thread state) that calls :c:func:`PyGILState_Ensure` will - always be for the main interpreter. -- The term "GIL" in the name is quite confusing for users of free-threaded - Python. There isn't a GIL, why do they still have to call it? 
- -This PEP intends to fix all of these issues by providing two new functions, -:c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Release`, as a more -correct and safer replacement for :c:func:`PyGILState_Ensure` and -:c:func:`PyGILState_Release`. For example: - -.. code-block:: c - - if (PyThreadState_Ensure(interp) < 0) { - fputs("Python is shutting down", stderr); - return; - } - - /* Interact with Python, without worrying about finalization. */ - // ... - - PyThreadState_Release(); - -This is achieved by introducing two concepts into the C API: - -- "Daemon" and "non-daemon" threads, similar to how it works in the - :mod:`threading` module. -- Interpreter reference counts which prevent an interpreter from finalizing. - -In :c:func:`PyThreadState_Ensure`, both of these ideas are applied. The -calling thread is to store a reference to an interpreter via -:c:func:`PyInterpreterState_Hold`. :c:func:`PyInterpreterState_Hold` -increases the reference count of an interpreter, requiring the thread -to finish (by eventually calling :c:func:`PyThreadState_Release`) before -beginning finalization. - -For example, creating a native thread with this API would look something -like this: - -.. code-block:: c - - static PyObject * - my_method(PyObject *self, PyObject *unused) - { - PyThread_handle_t handle; - PyThead_indent_t indent; - - PyInterpreterState *interp = PyInterpreterState_Hold(); - if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) { - PyInterpreterState_Release(interp); - return NULL; - } - /* The thread will always attach and finish, because we increased - the reference count of the interpreter. */ - Py_RETURN_NONE; - } - -What does "finalization" really mean? -------------------------------------- - -Throughout this PEP, the phrase "finalization" is used in reference to the -"finalizing" state of an interpreter. But, there's different stages of how -Python shuts down, so there's some ambiguity in that term. 
- -There are two ways to "finalize" in the C API: - -1. :c:func:`Py_FinalizeEx`, which finalizes the main interpreter (and - subsequently the rest of the runtime). -2. :c:func:`Py_EndInterpreter`, which finalizes a subinterpreter. - This does most of the same things that :c:func:`Py_FinalizeEx` - does to the main interpreter. - -So, "finalization" in this PEP refers to finalization of a specific -interpreter, *not* the entire runtime. (But, keep in mind that finalization -of the main interpreter and runtime are similiar states.) +In Python, threads are able to interact with an interpreter (e.g., invoke the +bytecode loop) through an :term:`attached thread state`. On with-GIL builds, +only one thread can hold an attached thread state at once, which means that +the thread holds the :term:`GIL`. On free-threaded builds, there can be +infinitely many thread states attached, allowing for parallelism (because +multiple threads can invoke the interpreter at once). + +With that in mind, attachment of thread states is a bit problematic in the C API. +The C API currently provides two ways to acquire and attach a thread state for +an interpreter: + +- :c:func:`PyGILState_Ensure` & :c:func:`PyGILState_Release`. +- :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap` (significantly + less common). + +The former, ``PyGILState``, are the most common way to do this and have been +the standard for over twenty years (:pep:`311`), but have a number of issues +that have arisen over time: + +- Subinterpreters tend to have trouble with them, because in threads that + haven't ever had an attached thread state, :c:func:`PyGILState_Ensure` + will assume that the main interpreter was requested. This makes it + impossible for the thread to interact with the subinterpreter! +- The phrase "GIL" is confusing for developers of free-threaded + extensions, because there's no GIL there, right? 
Even on free-threaded + builds, threads still need a thread state to interact with the interpreter, + it's just that they don't have to wait on one another to do so. These days, + the important thing that :c:func:`PyGILState_Ensure` does is attach a + thread state, and acquiring the GIL is somewhat incidental. + +The other option, :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, +do solve those issues, but come with an additional problem with how thread state +attachment works in the C API (that ``PyGILState`` also includes): if the +thread is not the main thread, then the interpreter will randomly hang the +thread during attachment if it starts finalizing. This can be frustrating, +especially if there was some additional work to be done alongside invoking +Python. + +This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure` +and :c:func:`PyThreadState_Release` as replacements for the existing functions, +accompanied by some interpreter reference counting APIs that let thread states +be acquired and attached in a thread-safe and predictable manner. Motivation ========== From 3387f814bfaed7f8ed254329f8fd5d0dd999e78b Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 3 May 2025 09:32:29 -0400 Subject: [PATCH 04/31] A bunch of changes to the motivation and rationale. --- peps/pep-0788.rst | 211 +++++++++++++++++++++++++++------------------- 1 file changed, 125 insertions(+), 86 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 17c45da6a71..8489d44c0e1 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -67,7 +67,8 @@ Many codebases might need to call Python code in highly-asynchronous situations where the desired interpreter (:ref:`typically the main interpreter `) could be finalizing or deleted, but want to continue running code after the -invoking the interpreter. This desire has been `brought up by users `_. +invoking the interpreter. This desire has been +`brought up by users `_.
For example, a callback that wants to call Python code might be invoked when: - A kernel has finished running on a GPU. @@ -75,15 +76,33 @@ For example, a callback that wants to call Python code might be invoked when: - A thread has quit, and a native library is executing static finalizers of thread local storage. +Generally, this pattern would look something like this: + +.. code-block:: c + + static void + some_callback(void *closure) + { + /* Do some work */ + /* ... */ + + PyGILState_STATE gstate = PyGILState_Ensure(); + /* Invoke the C API to do some computation */ + PyGILState_Release(gstate); + + /* ... */ + } + In the current C API, any "native" thread (one not created via the :mod:`threading` module) is considered to be "daemon", meaning that the interpreter won't wait on that thread to finalize. Instead, the interpreter will hang the thread when it goes to :term:`attach ` a :term:`thread state`, making it unusable past that point. Attaching a thread state can happen at -any point when invoking Python, such as releasing the GIL in-between bytecode -instructions, or when a C function exits a :c:macro:`Py_BEGIN_ALLOW_THREADS` -block. (Note that hanging the thread is relatively new behavior; in prior -versions, the thread would terminate, but the issue is the same.) +any point when invoking Python, such as releasing it in-between bytecode +instructions (to yield the GIL), or when a C function exits a +:c:macro:`Py_BEGIN_ALLOW_THREADS` block. (Note that hanging the thread is +relatively new behavior; in prior versions, the thread would terminate, but +the issue is the same.) 
This means that any non-Python thread may be terminated at any point, which is severely limiting for users who want to do more than just execute Python @@ -105,8 +124,8 @@ the thread: Unfortunately, this isn't correct, because of time-of-call to time-of-use issues; the interpreter might not be finalizing during the call to -:c:func:`Py_IsFinalizing`, but it might start finalizing immediately afterwards, which -would cause the attachment of a thread state (typically via +:c:func:`Py_IsFinalizing`, but it might start finalizing immediately +afterwards, which would cause the attachment of a thread state (typically via :c:func:`PyGILState_Ensure`) to hang the thread. Daemon threads can cause finalization deadlocks @@ -114,9 +133,16 @@ Daemon threads can cause finalization deadlocks When acquiring locks, it's extremely important to detach the thread state to prevent deadlocks. This is true on both the with-GIL and free-threaded builds. + When the GIL is enabled, a deadlock can occur pretty easily when acquiring a -lock if the GIL wasn't released, and lock-ordering deadlocks can still occur -free-threaded builds if the thread state wasn't detached. +lock if the GIL wasn't released; thread A grabs a lock, and starts waiting on +its thread state to attach, while thread B holds the GIL and is waiting on the +lock. + +On free-threaded builds, lock-ordering deadlocks are still possible +if thread A acquired the lock for object A and then object B, and then +another thread tried to acquire those locks in a reverse order. Free-threading +protects against this by releasing locks when the thread state is detached. So, all code that needs to work with locks need to detach the thread state. In C, this is almost always done via :c:macro:`Py_BEGIN_ALLOW_THREADS` and @@ -138,9 +164,7 @@ though. If any of those finalizers try to acquire the lock, deadlock ensues. This affects CPython itself, and there's not much that can be done to fix it. 
For example, `python/cpython#129536 `_ remarks that the :mod:`ssl` module can emit a fatal error when used at -finalization, because a daemon thread got hung while holding the lock. There -are workarounds for this for pure-Python code, but native threads don't have -such an option. +finalization, because a daemon thread got hung while holding the lock. .. _pep-788-hanging-compat: @@ -148,12 +172,12 @@ We can't change finalization behavior for ``PyGILState_Ensure`` *************************************************************** There will always have to be a point in a Python program where -:c:func:`PyGILState_Ensure` can no longer acquire the GIL (or more correctly, -attach a thread state). If the interpreter is long dead, then Python -obviously can't give a thread a way to invoke it. -:c:func:`PyGILState_Ensure` doesn't have any meaningful way to return a -failure, so it has no choice but to terminate the thread or emit a fatal -error, as noted in `python/cpython#124622 `_: +:c:func:`PyGILState_Ensure` can no longer attach a thread state. +If the interpreter is long dead, then Python obviously can't give a +thread a way to invoke it. :c:func:`PyGILState_Ensure` doesn't have any +meaningful way to return a failure, so it has no choice but to terminate +the thread or emit a fatal error, as noted in +`python/cpython#124622 `_: I think a new GIL acquisition and release C API would be needed. The way the existing ones get used in existing C code is not amenible to suddenly @@ -163,9 +187,7 @@ error, as noted in `python/cpython#124622 `_ -that could, in theory, be fixed in CPython, but it's definitely worth noting +that could be fixed in CPython, but it's definitely worth noting here. Incidentally, acceptance and implementation of this PEP will likely fix the existing crashes caused by :c:func:`PyGILState_Ensure`. 
@@ -198,13 +220,29 @@ created by the authors of this PEP: erroneously call the C API inside ``Py_BEGIN_ALLOW_THREADS`` blocks or omit ``PyGILState_Ensure`` in fresh threads. -Since Python 3.12, it is an :term:`attached thread state` that lets a thread -invoke the C API. On with-GIL builds, holding an attached thread state -implies holding the GIL, so only one thread can have one at a time. Free-threaded -builds achieve the effect of multi-core parallism while remaining -ackwards-compatible by simply removing that limitation: threads still need a -thread state (and thus need to call :c:func:`PyGILState_Ensure`), but they -don't need to wait on one another to do so. +Again, :c:func:`PyGILState_Ensure` gets an :term:`attached thread state` +for the thread on both with-GIL and free-threaded builds. Acquisition of the +GIL on with-GIL builds is incidental! :c:func:`PyGILState_Ensure` is very +roughly equivalent to the following: + +.. code-block:: c + + PyGILState_STATE + PyGILState_Ensure(void) + { + PyThreadState *existing = PyThreadState_GetUnchecked(); + if (existing == NULL) { + // Chooses the interpreter of the last attached thread state + // for this thread. If Python has never run in this thread, the + // main interpreter is used. + PyInterpreterState *interp = guess_interpreter(); + PyThreadState *tstate = PyThreadState_New(interp); + PyThreadState_Swap(tstate); + return opaque_tstate_handle(tstate); + } else { + return opaque_tstate_handle(existing); + } + } .. _pep-788-subinterpreters-gilstate: @@ -220,13 +258,20 @@ As noted in the :ref:`documentation `, ``Py_NewInterpreter()``), but mixing multiple interpreters and the ``PyGILState_*`` API is unsupported. -More technically, this is because ``PyGILState_Ensure`` doesn't have any way +This is because :c:func:`PyGILState_Ensure` doesn't have any way to know which interpreter created the thread, and as such, it has to assume that it was the main interpreter.
There isn't any way to detect this at runtime, so spurious races are bound to come up in threads created by subinterpreters, because synchronization for the wrong interpreter will be used on objects shared between the threads. +For example, if a thread had access to object A, which belongs to a +subinterpreter, but then called :c:func:`PyGILState_Ensure`, it would have an +attached thread state pointing to the main interpreter, not the subinterpreter. +This means that any GIL assumptions about the object are wrong! There isn't +any synchronization between the two GILs, so both the thread (which thinks it's +in the subinterpreter) and the main thread could try to increment the +reference count at the same time, causing a data race! Interpreters can concurrently shut down *************************************** @@ -234,22 +279,61 @@ Interpreters can concurrently shut down The other way of creating a native thread that can invoke Python, :c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an -explicit interpreter, rather than assuming that the main interpreter was intended), -but is still limited by the current API. +explicit interpreter, rather than assuming that the main interpreter was +requested), but is still limited by the current hanging problems in the C API. -In particular, subinterpreters typically have a much shorter lifetime than the -main interpreter, and as such, there's not necessarily a guarantee that a -:c:type:`PyInterpreterState` (acquired by :c:func:`PyInterpreterState_Get`) -passed to a fresh thread will still be alive. Similarly, a -:c:type:`PyInterpreterState` pointer could have been replaced with a *new* -interpreter, causing all sorts of unknown issues. They are also subject to -all the finalization related hanging mentioned previously.
+In addition, subinterpreters typically have a much shorter lifetime than the +main interpreter, so there's a much higher chance that an interpreter passed +to a thread will have already finished and have been deallocated. Passing that +interpreter to :c:func:`PyThreadState_New` will most likely crash the program. Rationale ========= -This PEP includes several new APIs that intend to fix all of the issues stated -above. +So, how do we address all of this? The best way seems to be starting from +scratch and "reimagining" how to acquire and attach thread states in the C API. + +As a summary, there are a few bases we want to cover in a new API: + +- Require the caller to specify which interpreter they want, to prevent those + pesky problems with interpreter guessing. +- Prevent the thread from being arbitrarily bricked by calling into Python. +- Protection against deallocation on interpreters with short lifetimes. +- Backwards-compatibility with the old APIs and ideas, such as "daemonness" + (but as opt-in). + +Preventing interpreter finalization with references +--------------------------------------------------- + +This PEP takes an approach where interpreters are given a reference count by +non-daemon threads that want to (or do) hold an attached thread state. When +the interpreter starts finalizing, it will wait until its reference count +reaches zero before proceeding to a point where threads will be hung. +Note that this *is not* the same as joining the thread; the interpreter will +only wait until the thread state has been released +(via :c:func:`PyThreadState_Release`) for all non-daemon threads. This isn't +the same as waiting for them to detach their thread state--it waits for them +to *destroy* it. Otherwise, this API wouldn't have any finalization benefits +over the existing ``PyThreadState`` functions.
+ +So, from a thread's perspective, holding a "strong reference" to the +interpreter will effectively prevent it from finalizing, making it safe to +invoke Python without worrying about the thread being hung. The strong +reference will be held as long as the thread state is "alive", even if it's +detached. + +This proposal also comes with weak references to an interpreter that don't +prevent it from finalizing, but can be promoted to a strong reference once +it's decided that a thread state can attach. Promotion of a weak reference to a +strong reference can fail if the interpreter has already finalized, or reached +a point during finalization where it can't be guaranteed that the thread won't +hang. + +If there's additional work after destroying the thread state, the thread +can continue running as normal. If that work needs to finish before the +program exits, it's still up to the user how to join the thread, for +example by using an :mod:`atexit` handler. +Again, this PEP isn't trying to reinvent how to create or join threads! Replacing the old APIs ---------------------- @@ -272,51 +356,6 @@ seamless, due to the new requirement of storing an interpreter state. The exact details of this deprecation are currently unclear, see :ref:`pep-788-deprecation`. -A light layer of magic ----------------------- - -The APIs proposed by this PEP intentionally have a layer of abstraction that is -hidden from the user and offloads complexity onto CPython. This is done -primarily to help ease the transition from ``PyGILState`` for existing -codebases, and for ease-of-use to those who provide wrappers the C API, such -as Cython or PyO3. - -In particular, the API hides details about the lifetime of the thread state -and most of the details with interpreter references. - -See also :ref:`pep-788-activate-deactivate-instead`.
- -Bikeshedding and the ``PyThreadState`` namespace ------------------------------------------------- - -To solve the issue with "GIL" terminology, the new functions described by this -PEP intended as replacements for ``PyGILState`` will go under the existing -``PyThreadState`` namespace. In Python 3.14, the documentation has been -updated to switch over to terms like -:term:`"attached thread state" ` instead of -:term:`"global interpreter lock" `, so this namespace -seems to fit well for this PEP. - -Preventing interpreter finalization with references ---------------------------------------------------- - -Several iterations of this API have taken an approach where -:c:func:`PyThreadState_Ensure` can return a failure based on the state of -the interpreter. Instead, this PEP takes an approach where an interpreter -keeps track of the number of non-daemon threads, which inherently prevents -it from beginning finalization. - -The main upside with this approach is that there's more consistency with -attaching threads. Using an interpreter reference from the calling thread -keeps the interpreter from finalizing before the thread starts, ensuring -that it always works. An approach that were to return a failure based on -the start-time of the thread could cause spurious issues. - -In the case where it is useful to let the interpreter finalize, such as in -an asynchronous callback where there's no guarantee that the thread will start, -strong references to an interpreter can be acquired through -:c:func:`PyInterpreterState_Lookup`. - Specification ============= From ceeefeaf5f9f4bd5a726e27c18f1afa94801cb95 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 3 May 2025 09:40:48 -0400 Subject: [PATCH 05/31] Add PyThreadState_GetDaemon() and reword the deprecation rationale. 
--- peps/pep-0788.rst | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 8489d44c0e1..b244fd1e887 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -338,12 +338,10 @@ Again, this PEP isn't trying to reinvent how to create or join threads! Replacing the old APIs ---------------------- -As made clear in Motivation_, ``PyGILState`` is already pretty buggy, and -even if it was magically fixed, the current behavior of hanging the thread is -beyond repair. In turn, this PEP intends to completely deprecate the existing -``PyGILState`` APIs and provide better alternatives. However, even if this PEP -is rejected, all of the APIs can be replaced with more correct ``PyThreadState`` -functions in the current C API: +Due to the plethora of issues with ``PyGILState``, this PEP intends to do away +with them entirely. In today's C API, all ``PyGILState`` functions are +replaceable with ``PyThreadState`` counterparts that are compatible with +subinterpreters: - :c:func:`PyGILState_Ensure`: :c:func:`PyThreadState_Swap` & :c:func:`PyThreadState_New` - :c:func:`PyGILState_Release`: :c:func:`PyThreadState_Clear` & :c:func:`PyThreadState_Delete` @@ -351,10 +349,12 @@ functions in the current C API: - :c:func:`PyGILState_Check`: ``PyThreadState_GetUnchecked() != NULL`` This PEP specifies a ten-year deprecation for these functions (while remaining -in the stable ABI), primarily because it's expected that the migration won't be -seamless, due to the new requirement of storing an interpreter state. The -exact details of this deprecation are currently unclear, see -:ref:`pep-788-deprecation`.
+in the stable ABI), mainly because it's expected that the migration will be a +little painful, because :c:func:`PyThreadState_Ensure` and +:c:func:`PyThreadState_Release` aren't drop-in replacements for +:c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`, due to the +requirement of a specific interpreter. The exact details of this deprecation +aren't too clear, see :ref:`pep-788-deprecation`. Specification ============= @@ -391,6 +391,15 @@ See :ref:`pep-788-hanging-compat`. Return zero on success, non-zero *without* an exception set on failure. +.. c:function:: int PyThreadState_GetDaemon(int is_daemon) + + Returns non-zero if the :term:`attached thread state` is daemon, + and zero otherwise. See also :c:func:`PyThreadState_SetDaemon` + and :attr:`threading.Thread.daemon`. + + This function cannot fail, other than with a fatal error if the caller + has no :term:`attached thread state`. + Interpreter reference counting ------------------------------ From 3cbfb261c385c4cf0f87deabf1eeb34afb66ed69 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 3 May 2025 11:21:30 -0400 Subject: [PATCH 06/31] Rewrite the entire damn specification. --- peps/pep-0788.rst | 322 +++++++++++++++++++++++++++------------------- 1 file changed, 187 insertions(+), 135 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index b244fd1e887..d68f729bac0 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -359,19 +359,115 @@ aren't too clear, see :ref:`pep-788-deprecation`. Specification ============= +Interpreter reference counting +------------------------------ + +An interpreter will keep track of the number of non-daemon threads through +a reference count. During finalization, the interpreter will wait until its +reference count reaches zero, and once that happens, threads can no longer +acquire a strong reference to the interpreter.
Threads can hold as many +references as they want, but in most cases, a thread will have one reference +at a time, typically through the :term:`attached thread state`. + +An attached thread state is made non-daemon by holding a strong reference +to the interpreter. When a non-daemon thread state is destroyed, it releases +the reference. + +A weak reference to the interpreter won't prevent it from finalizing, but can +be safely accessed after the interpreter no longer supports strong references, +and even after the interpreter has been deleted. But, at that point, the weak +reference can no longer be converted to a strong reference. + +Strong interpreter references +***************************** + +.. c:type:: PyInterpreterRef + + An opaque, strong reference to an interpreter. + The interpreter will wait until a strong reference has been released + before shutting down. + +.. c:function:: PyInterpreterRef PyInterpreterRef_Get(void) + + Acquire a strong reference to the current interpreter. + + This function is generally meant to be used in tandem with + :c:func:`PyThreadState_Ensure`. + + This function cannot fail, other than with a fatal error when the caller + doesn't hold an :term:`attached thread state`. + +.. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref) + + Duplicate a strong reference to an interpreter. + + This function is generally meant to be used in tandem with + :c:func:`PyThreadState_Ensure`. + + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. + +.. c:function:: void PyInterpreterRef_Close(PyInterpreterRef ref) + + Release a strong reference to an interpreter, allowing it to shut down + if there are no references left. + + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. + +Weak interpreter references +*************************** + +.. c:type:: PyInterpreterWeakRef + + An opaque, weak reference to an interpreter. 
+ The interpreter will *not* wait for the reference to be + released before shutting down. + +.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Get(void) + + Acquire a weak reference to the current interpreter. + + This function is generally meant to be used in tandem with + :c:func:`PyInterpreterWeakRef_AsStrong`. + + This function cannot fail, other than with a fatal error when the caller + doesn't hold an :term:`attached thread state`. + +.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref) + + Duplicate the weak reference *wref*. + + This function is generally meant to be used in tandem with + :c:func:`PyInterpreterWeakRef_AsStrong`. + + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. + +.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef *wref) + + Return a strong reference to an interpreter from a weak reference. + + If the interpreter no longer exists or has already finished waiting for + non-daemon threads, then this function returns ``NULL``. + + The caller does not need to hold an :term:`attached thread state`, but + this function is not safe to call in a re-entrant signal handler. + +.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef *wref) + + Release a weak reference, possibly deallocating it. + + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. + Daemon and non-daemon threads ----------------------------- -This PEP introduces the concept of non-daemon thread states. By default, all -threads created without the :mod:`threading` module will hang when trying to -attach a thread state for a finalizing interpreter (in fact, daemon threads -that *are* created with the :mod:`threading` module will hang in the same -way). This generally happens when a thread calls :c:func:`PyEval_RestoreThread` -or in between bytecode instructions, based on :func:`sys.setswitchinterval`.
- -A new, internal field will be added to the ``PyThreadState`` structure that -determines if the thread is daemon. Before finalization, an interpreter -will wait until all non-daemon threads call :c:func:`PyThreadState_Delete`. +A non-daemon thread state is a thread state that holds a strong reference to an +interpreter. The reference is released when the thread state is deleted, either +by :c:func:`PyThreadState_Release` or a different thread state deletion +function. For backwards compatibility, all thread states created by existing APIs, including :c:func:`PyGILState_Ensure`, will remain daemon by default. @@ -386,10 +482,12 @@ See :ref:`pep-788-hanging-compat`. :c:func:`PyThreadState_Ensure` are daemon by default. If the thread state is non-daemon, then the current interpreter will wait - for this thread to finish before shutting down. See also + for this thread to finish before shutting down by holding a strong + reference to the interpreter (see :c:func:`PyInterpreterRef_Get`). See also :attr:`threading.Thread.daemon`. Return zero on success, non-zero *without* an exception set on failure. + This function can only fail when setting the thread state to non-daemon. .. c:function:: int PyThreadState_GetDaemon(int is_daemon) @@ -400,102 +498,77 @@ See :ref:`pep-788-hanging-compat`. This function cannot fail, other than with a fatal error if the caller has no :term:`attached thread state`. -Interpreter reference counting ------------------------------- - -Internally, an interpreter will have to keep track of the number of -non-daemon native threads, which will determine when an interpreter can -finalize. This is done to prevent use-after-free crashes in -:c:func:`PyThreadState_Ensure` for interpreters with short lifetimes, and -to remove needless layers of synchronization between the calling thread and -the started thread. 
- -An interpreter state returned by :c:func:`Py_NewInterpreter` (or really, -:c:func:`PyInterpreterState_New`) will start with a native thread countdown. -For simplicity's sake, this will be referred to as a reference count. -A non-zero reference count prevents the interpreter from finalizing. - -.. c:function:: PyInterpreterState *PyInterpreterState_Hold(void) - - Similar to :c:func:`PyInterpreterState_Get`, but returns a strong - reference to the interpreter (meaning, it has its reference count - incremented by one, allowing the returned interpreter state to be safely - accessed by another thread, because it will be prevented from finalizing). - - This function is generally meant to be used in tandem with - :c:func:`PyThreadState_Ensure`. - - The caller must have an :term:`attached thread state`. This function - cannot return ``NULL``. Failures are always a fatal error. - -.. c:function:: PyInterpreterState *PyInterpreterState_Lookup(int64_t interp_id) - - Similar to :c:func:`PyInterpreterState_Hold`, but looks up an interpreter - based on an ID (see :c:func:`PyInterpreterState_GetID`). This has the - benefit of allowing the interpreter to finalize in cases where the thread - might not start, such as inside of an asynchronous callback. - - This function will return ``NULL`` without an exception set on failure. - If the return value is non-``NULL``, then the returned interpreter will be - prevented from finalizing until the reference is released by - :c:func:`PyThreadState_Release` or :c:func:`PyInterpreterState_Release`. - - Returning ``NULL`` typically means that the interpreter is at a point - where threads cannot start, or no longer exists. - - The caller does not need to have an :term:`attached thread state`. - -.. c:function:: void PyInterpreterState_Release(PyInterpreterState *interp) - - Decrement the reference count of the interpreter, as was incremented by - :c:func:`PyInterpreterState_Hold` or :c:func:`PyInterpreterState_Lookup`. 
- - This function cannot fail, other than with a fatal error. The caller does - not need to have an :term:`attached thread state` for *interp*. - Ensuring and releasing thread states ------------------------------------ This proposal includes two new high-level threading APIs that intend to replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. -.. c:function:: int PyThreadState_Ensure(PyInterpreterState *interp) +.. c:function:: int PyThreadState_Ensure(PyInterpreterRef ref) + + Ensure that the thread has an :term:`attached thread state` for the + interpreter denoted by *ref*, and thus can safely invoke that + interpreter. It is OK to call this function if the thread already has an + attached thread state, as long as there is a subsequent call to + :c:func:`PyThreadState_Release` that matches this one. - Ensure that the thread has an :term:`attached thread state` for *interp*, - and thus can safely invoke that interpreter. It is OK to call this - function if the thread already has an attached thread state, as long as - there is a subsequent call to :c:func:`PyThreadState_Release` that matches - this one. + Nested calls to this function will only sometimes create a new + :term:`thread state`. If there is no :term:`attached thread state`, + then this function will check for the most recent attached thread + state used by this thread. If none exists or it doesn't match *ref*, + a new thread state is created. If it does match *ref*, it is reattached. + If there is an :term:`attached thread state`, then a similar check occurs; + if the interpreter matches *ref*, it is attached, and otherwise a new + thread state is created. - The reference to the interpreter *interp* is stolen by this function. - As such, *interp* should have been acquired by - :c:func:`PyInterpreterState_Hold`. 
+ The thread state attached by this function will be reused by + subsequent calls to :c:func:`PyGILState_Ensure` in this thread, but + :c:func:`PyGILState_Ensure` will *not* make the thread daemon again. - Thread states created by this function are non-daemon by default. See - :c:func:`PyThreadState_SetDaemon`. If the calling thread already has an - attached thread state that matches *interp*, then this function - will mark the existing thread state as non-daemon and return. It will - be restored to its prior daemon status upon the next - :c:func:`PyThreadState_Release` call. + The reference to the interpreter *ref* is stolen by this function. + Use :c:func:`PyInterpreterRef_Dup` if the reference is intended to be + kept. Return zero on success, and non-zero with the old attached thread state restored (which may have been ``NULL``). .. c:function:: void PyThreadState_Release() - Release the :term:`attached thread state` set by - :c:func:`PyThreadState_Ensure`. Any thread state that was set prior - to the original call to :c:func:`PyThreadState_Ensure` will be restored. + Release a :c:func:`PyThreadState_Ensure` call. + + The :term:`attached thread state` prior to the corresponding + :c:func:`PyThreadState_Ensure` call is guaranteed to be restored upon + returning. The cached thread state as used by :c:func:`PyThreadState_Ensure` + and :c:func:`PyGILState_Ensure` will also be restored. This function cannot fail, but may hang the thread if the - attached thread state prior to the original :c:func:`!PyThreadState_Ensure` - was daemon and the interpreter was finalized. + restored :term:`attached thread state` was daemon and the interpreter + was finalized. If you're running in a thread where that could be an issue, + call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure` + at your own discretion. 
+ +Changes to :mod:`threading` shutdown +------------------------------------ + +An interpreter currently special-cases non-daemon threads created by +:mod:`threading` and joins them before the interpreter does any other +finalization. + +:mod:`threading` will be changed to use :c:func:`PyThreadState_Ensure`, and +will rely on the interpreter's strong reference to run until completion. +:mod:`threading`-created threads will still be joined to release resources after +this has happened. + +Additionally, setting a :class:`threading.Thread` to :attr:`~threading.Thread.daemon` +should correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise, +:c:func:`PyThreadState_GetDaemon` will have incorrect results in Python +threads. Deprecation of ``PyGILState`` APIs ---------------------------------- This PEP deprecates all of the existing ``PyGILState`` APIs in favor of the -new ``PyThreadState`` APIs for the reasons given in the Motivation_. Namely: +existing and new ``PyThreadState`` APIs. Namely: - :c:func:`PyGILState_Ensure`: use :c:func:`PyThreadState_Ensure` instead. - :c:func:`PyGILState_Release`: use :c:func:`PyThreadState_Release` instead. @@ -548,12 +621,11 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked! my_critical_operation(PyObject *self, PyObject *unused) { assert(PyThreadState_GetUnchecked() != NULL); - PyInterpreterState *interp = PyInterpreterState_Hold(); + PyInterpreterRef ref = PyInterpreterRef_Get(); /* Temporarily make this thread non-daemon to ensure that the lock is released. */ - if (PyThreadState_Ensure(interp) < 0) { - PyErr_SetString(PyExc_PythonFinalizationError, - "interpreter is shutting down"); + if (PyThreadState_Ensure(ref) < 0) { + PyErr_NoMemory(); return NULL; } @@ -561,7 +633,8 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked! acquire_some_lock(); Py_END_ALLOW_THREADS; - /* Do something while holding the lock */ + /* Do something while holding the lock. 
+ The interpreter won't finalize during this period. */ // ... release_some_lock(); @@ -569,10 +642,10 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked! Py_RETURN_NONE; } -Transitioning from old functions -******************************** +Transitioning from the existing functions +***************************************** -The following code uses the old ``PyGILState`` APIs: +The following code uses the ``PyGILState`` APIs: .. code-block:: c @@ -606,16 +679,15 @@ The following code uses the old ``PyGILState`` APIs: Py_RETURN_NONE; } -This is the same code, updated to use the new functions: +This is the same code, rewritten to use the new functions: .. code-block:: c static int thread_func(void *arg) { - PyInterpreterState *interp = (PyInterpreterState *)arg; + PyInterpreterRefinterp = (PyInterpreterRef)arg; if (PyThreadState_Ensure(interp) < 0) { - fputs("Cannot talk to Python", stderr); return -1; } if (PyRun_SimpleString("print(42)") < 0) { @@ -631,9 +703,9 @@ This is the same code, updated to use the new functions: PyThread_handle_t handle; PyThead_indent_t indent; - PyInterpreterState *interp = PyInterpreterState_Hold(); - if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) { - PyInterpreterState_Release(interp); + PyInterpreterRef ref = PyInterpreterRef_Get(); + if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) { + PyInterpreterRef_Close(ref); return NULL; } Py_BEGIN_ALLOW_THREADS @@ -654,9 +726,8 @@ they can still be used with this API: static int thread_func(void *arg) { - PyInterpreterState *interp = (PyInterpreterState *)arg; - if (PyThreadState_Ensure(interp) < 0) { - fputs("Cannot talk to Python", stderr); + PyInterpreterRef ref = (PyInterpreterRef)arg; + if (PyThreadState_Ensure(ref) < 0) { return -1; } (void)PyThreadState_SetDaemon(1); @@ -673,9 +744,9 @@ they can still be used with this API: PyThread_handle_t handle; PyThead_indent_t indent; - 
PyInterpreterState *interp = PyInterpreterState_Hold(); - if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) { - PyInterpreterState_Release(interp); + PyInterpreterRef ref = PyInterpreterRef_Get(); + if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) { + PyInterpreterRef_Close(ref); return NULL; } Py_RETURN_NONE; @@ -684,35 +755,23 @@ they can still be used with this API: Asynchronous callback example ***************************** -As stated in the Motivation_, there are many cases where it's desirable -to call Python in an asynchronous callback. In such cases, it's not safe to -call :c:func:`PyInterpreterState_Hold`, because it's not guaranteed that -:c:func:`PyThreadState_Ensure` will ever be called. -If not, finalization becomes deadlocked. - -This scenario requires using :c:func:`PyInterpreterState_Lookup` instead, -which only prevents finalization once the lookup has been made. - -For example: +In some cases, the thread might not ever start, such as in a callback. +We can't use a strong reference here, because a strong reference would +deadlock the interpreter if it's not released. .. 
code-block:: c - typedef struct { - int64_t interp_id; - } pyrun_t; - static int async_callback(void *arg) { - pyrun_t *data = (pyrun_t *)arg; - PyInterpreterState *interp = PyInterpreterState_Lookup(data->interp_id); - PyMem_RawFree(data); - if (interp == NULL) { - fputs("Python has shut down", stderr); + PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg; + PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); + if (ref == NULL) { + fputs("Python has shut down!", stderr); return -1; } - if (PyThreadState_Ensure(interp) < 0) { - fputs("Cannot talk to Python", stderr); + + if (PyThreadState_Ensure(ref) < 0) { return -1; } if (PyRun_SimpleString("print(42)") < 0) { @@ -725,17 +784,10 @@ For example: static PyObject * setup_callback(PyObject *self, PyObject *unused) { - PyThread_handle_t handle; - PyThead_indent_t indent; - - pyrun_t *data = PyMem_RawMalloc(sizeof(pyrun_t)); - if (data == NULL) { - return PyErr_NoMemory(); - } // Weak reference to the interpreter. It won't wait on the callback // to finalize. - data->interp_id = PyInterpreterState_GetID(PyInterpreterState_Get()); - register_callback(async_callback, data); + PyInterpreterWeakRef *wref = PyInterpreterWeakRef_Get(); + register_callback(async_callback, wref); Py_RETURN_NONE; } From d9de49a3b1bd65b9a1fa4dae7511580f0a394772 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 3 May 2025 11:39:47 -0400 Subject: [PATCH 07/31] Update the rejected ideas. --- peps/pep-0788.rst | 53 ++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 43 insertions(+), 10 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index d68f729bac0..022a667d0a2 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -454,6 +454,10 @@ Weak interpreter references The caller does not need to hold an :term:`attached thread state`, but is not safe to call in a re-entrant signal handler.
+ If the caller *does* hold an :term:`attached thread state`, and that thread + state holds a strong reference to the interpreter, then this function can + never fail. + .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef *wref) Release a weak reference, possibly deallocating it. @@ -796,13 +800,30 @@ Reference Implementation ======================== A reference implementation of this PEP can be found -`here `_. +at `python/cpython#133110 `_. Rejected Ideas ============== -Using an interpreter ID instead of a interpreter state for ``PyThreadState_Ensure`` ------------------------------------------------------------------------------------ +Retrofiting the existing structures with reference counts +--------------------------------------------------------- + +Using interpreter state pointers for reference counting +******************************************************* + +Originally, this PEP specified :c:func:`!PyInterpreterState_Hold` +and :c:func:`!PyInterpreterState_Release` for managing strong references +to an interpreter, alongside :c:func:`!PyInterpreterState_Lookup` which +converted interpreter IDs (weak references) to strong references. + +In the end, this was rejected, primarily because it was needlessly +confusing. Interpreter states hadn't ever had a reference count prior, so +there was a lack of intuition about when and where something was a strong +reference. The ``PyInterpreterRef`` and ``PyInterpreterWeakRef`` seem a lot +clearer. + +Using interpreter IDs for reference counting +******************************************** Some iterations of this API took an ``int64_t interp_id`` parameter instead of ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently @@ -813,10 +834,7 @@ requiring less magic in the implementation, but has several downsides: - Nearly all existing interpreter APIs already return a :c:type:`PyInterpreterState` pointer, not an interpreter ID. 
Functions like :c:func:`PyThreadState_GetInterpreter` would have to be accompanied by - frustrating calls to :c:func:`PyInterpreterState_GetID`. There's also - no existing way to go from an ``int64_t`` back to a - :c:expr:`PyInterpreterState *`, and providing such an API would come - with its own set of design problems. + frustrating calls to :c:func:`PyInterpreterState_GetID`. - Threads typically take a ``void *arg`` parameter, not an ``int64_t arg``. As such, passing an interpreter pointer requires much less boilerplate for the user, because an additional structure definition or heap allocation @@ -829,9 +847,7 @@ requiring less magic in the implementation, but has several downsides: must be tracked elsewhere in the interpreter, likely being *more* complex than :c:func:`PyInterpreterState_Hold`. There's also a lack of intuition that a standalone integer could have such a thing as - a reference count. :c:func:`PyInterpreterState_Lookup` sidesteps this - problem because the reference count is always associated with the returned - interpreter state, not the integer ID. + a reference count. .. _pep-788-activate-deactivate-instead: @@ -893,6 +909,23 @@ In addition, it's unclear whether to remove them at all. A functions if it's determined that a full ``PyGILState`` removal would be too disruptive for the ecosystem. +Should ``PyThreadState_Ensure`` steal a reference? +-------------------------------------------------- + +At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the +interpreter. This is controversial, because it's not necessarily the right +default. + +For now, it's staing because in cases where a reference is supposed +to be multi-use, :c:func:`PyInterpreterRef_Dup` can be used to make up +for the stolen reference. If it didn't still a reference, there's no +opposite helper function to throw away the reference, so it's just more +boilerplate. 
But, this is based on the assumption that there is a general +desire for single-use interpreter references. If this doesn't prove to be +the case, and a multi-use reference is overwhelmingly more common, then it +seems reasonable to let :c:func:`PyThreadState_Ensure` form its own reference +from the one passed to it. + Copyright ========= From c742d933a8b84bdea8992c825dd9d4b05edbda16 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 3 May 2025 11:44:43 -0400 Subject: [PATCH 08/31] Fix some outdated references. --- peps/pep-0788.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 022a667d0a2..6791b3e3aae 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -827,16 +827,16 @@ Using interpreter IDs for reference counting Some iterations of this API took an ``int64_t interp_id`` parameter instead of ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently -deleted and cause use-after-free violations. :c:func:`PyInterpreterState_Hold` -fixes this issue anyway, but an interpreter ID does have the benefit of -requiring less magic in the implementation, but has several downsides: +deleted and cause use-after-free violations. The reference counting APIs in +this PEP sidestep this issue anyway, but an interpreter ID has the advantage +of requiring less magic: - Nearly all existing interpreter APIs already return a :c:type:`PyInterpreterState` pointer, not an interpreter ID. Functions like :c:func:`PyThreadState_GetInterpreter` would have to be accompanied by frustrating calls to :c:func:`PyInterpreterState_GetID`. - Threads typically take a ``void *arg`` parameter, not an ``int64_t arg``. As such, passing an interpreter pointer requires much less boilerplate + As such, passing a reference requires much less boilerplate for the user, because an additional structure definition or heap allocation would be needed to store the interpreter ID. This is especially an issue on 32-bit systems, where
This is especially an issue on 32-bit systems, where ``void *`` is too small for an ``int64_t``. @@ -845,7 +845,7 @@ requiring less magic in the implementation, but has several downsides: the native thread gets a chance to attach. The problem with using an interpreter ID is that the reference count has to be "invisible"; it must be tracked elsewhere in the interpreter, likely being *more* - complex than :c:func:`PyInterpreterState_Hold`. There's also a lack + complex than :c:func:`PyInterpreterRef_Get`. There's also a lack of intuition that a standalone integer could have such a thing as a reference count. From ad1bf7f1acf4694f8003cca653574ed9b598b57b Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:06:20 -0400 Subject: [PATCH 09/31] Fix typo in rejected ideas. --- peps/pep-0788.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 6791b3e3aae..69632511936 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -916,7 +916,7 @@ At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the interpreter. This is controversial, because it's not necessarily the right default. -For now, it's staing because in cases where a reference is supposed +For now, it's staying, because in cases where a reference is supposed to be multi-use, :c:func:`PyInterpreterRef_Dup` can be used to make up for the stolen reference. If it didn't still a reference, there's no opposite helper function to throw away the reference, so it's just more From bca61313e14b0bbe6d999d5037974d00540e8051 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:07:20 -0400 Subject: [PATCH 10/31] Adjust threading section. 
--- peps/pep-0788.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 69632511936..3c7ed8e6003 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -551,8 +551,8 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure` at your own discretion. -Changes to :mod:`threading` shutdown ------------------------------------- +Changes to ``threading`` shutdown and behavior +---------------------------------------------- An interpreter currently special-cases non-daemon threads created by :mod:`threading` and joins them before the interpreter does any other @@ -563,8 +563,8 @@ will rely on the interpreter's strong reference to run until completion. :mod:`threading`-created threads will still be joined to release resources after this has happened. -Additionally, setting a :class:`threading.Thread` to :attr:`~threading.Thread.daemon` -should correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise, +Additionally, setting :attr:`threading.Thread.daemon` should +correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise, :c:func:`PyThreadState_GetDaemon` will have incorrect results in Python threads. From 868cdefc96ac282890a62cde2743fef84e8c8a5a Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:09:13 -0400 Subject: [PATCH 11/31] Specify that PyInterpreterRef is pointer-sized --- peps/pep-0788.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 3c7ed8e6003..b3e48018641 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -387,6 +387,8 @@ Strong interpreter references The interpreter will wait until a strong reference has been released before shutting down. + This type is guaranteed to be pointer-sized. + .. c:function:: PyInterpreterRef PyInterpreterRef_Get(void) Acquire a strong reference to the current interpreter. 
From 6b3a447820adf8f17743e5341bf5fd65bba136c6 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:09:52 -0400 Subject: [PATCH 12/31] Add clarity to reference counting. --- peps/pep-0788.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index b3e48018641..e0bb7e4d8b9 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -362,8 +362,8 @@ Specification Interpreter reference counting ------------------------------ -An interpreter will keep track of the number of non-daemon threads through -a reference count. During finalization, the interpreter will wait until its +An interpreter will keep track of a reference count managed by threads. +During finalization, the interpreter will wait until its reference count reaches zero, and once that happens, threads can no longer acquire a strong reference to the interpreter. Threads can hold as many references as they want, but in most cases, a thread will have one reference From f5e1af804b65697517711acdad8802b79f91b68a Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:13:37 -0400 Subject: [PATCH 13/31] Fix typo in example. --- peps/pep-0788.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index e0bb7e4d8b9..dc378a60fbd 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -692,7 +692,7 @@ This is the same code, rewritten to use the new functions: static int thread_func(void *arg) { - PyInterpreterRefinterp = (PyInterpreterRef)arg; + PyInterpreterRef interp = (PyInterpreterRef)arg; if (PyThreadState_Ensure(interp) < 0) { return -1; } From 98e7fcc9bd42724d55bf2bf93320bee134351a01 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:34:34 -0400 Subject: [PATCH 14/31] Formalize the headings. 
--- peps/pep-0788.rst | 106 +++++++++++++++++++++++----------------------- 1 file changed, 54 insertions(+), 52 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index dc378a60fbd..59900e54405 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -1,5 +1,5 @@ PEP: 788 -Title: Reimagining native threads +Title: Reimagining Native Threads Author: Peter Bierma Sponsor: Victor Stinner Discussions-To: https://discuss.python.org/t/89863 @@ -60,8 +60,8 @@ be acquired and attached in a thread-safe and predictable manner. Motivation ========== -Native threads will always hang during finalization ---------------------------------------------------- +Native Threads Always Hang During Finalization +---------------------------------------------- Many codebases might need to call Python code in highly-asynchronous situations where the desired interpreter @@ -109,8 +109,8 @@ is severely limiting for users who want to do more than just execute Python code in their stream of calls (for example, C++ executing finalizers in *addition* to calling Python). -Using ``Py_IsFinalizing`` is insufficient -***************************************** +``Py_IsFinalizing`` is Insufficient +*********************************** The :ref:`docs ` currently recommend :c:func:`Py_IsFinalizing` to guard against termination of @@ -128,8 +128,8 @@ issues; the interpreter might not be finalizing during the call to afterwards, which would cause the attachment of a thread state (typically via :c:func:`PyGILState_Ensure`) to hang the thread. -Daemon threads can cause finalization deadlocks -*********************************************** +Daemon Threads Can Deadlock Finalization +**************************************** When acquiring locks, it's extremely important to detach the thread state to prevent deadlocks. This is true on both the with-GIL and free-threaded builds. @@ -168,8 +168,8 @@ finalization, because a daemon thread got hung while holding the lock. .. 
_pep-788-hanging-compat: -We can't change finalization behavior for ``PyGILState_Ensure`` -*************************************************************** +Finalization Behavior for ``PyGILState_Ensure`` Cannot Change +************************************************************* There will always have to be a point in a Python program where :c:func:`PyGILState_Ensure` can no longer attach a thread state. @@ -189,15 +189,15 @@ the thread or emit a fatal error, as noted in For this reason, we can't make any real changes to how :c:func:`PyGILState_Ensure` works during finalization, because it would break existing code. -The existing APIs are broken and misleading -------------------------------------------- +The GIL-state APIs are Buggy and Confusing +------------------------------------------ There are currently two public ways for a user to create and attach their own :term:`thread state`; manual use of :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`, and :c:func:`PyGILState_Ensure`. The latter, :c:func:`PyGILState_Ensure`, is `significantly more common `_. -``PyGILState_Ensure`` generally crashes during finalization +``PyGILState_Ensure`` Generally Crashes During Finalization *********************************************************** At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does not @@ -208,7 +208,7 @@ that could be fixed in CPython, but it's definitely worth noting here. Incidentally, acceptance and implementation of this PEP will likely fix the existing crashes caused by :c:func:`PyGILState_Ensure`. -The term "GIL" is tricky for free-threading +The Term "GIL" is Tricky for Free-threading ******************************************* A large issue with the term "GIL" in the C API is that it is semantically @@ -246,8 +246,8 @@ roughly equivalent to the following: .. 
_pep-788-subinterpreters-gilstate: -Subinterpreters don't work with ``PyGILState_Ensure`` ------------------------------------------------------ +``PyGILState_Ensure`` Doesn't Guess the Correct Interpreter +----------------------------------------------------------- As noted in the :ref:`documentation `, ``PyGILState`` APIs aren't officially supported in subinterpreters: @@ -273,8 +273,8 @@ any synchronization between the two GILs, so both the thread (who thinks it's in the subinterpreter) and the main thread could try to increment the reference count at the same time, causing a data race! -Interpreters can concurrently shut down -*************************************** +Concurrent Interpreter Deallocation +*********************************** The other way of creating a native thread that can invoke Python, :c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better @@ -302,8 +302,8 @@ As a summary, there's a few bases we want to cover in a new API: - Backwards-compatibility with the old APIs and ideas, such as "daemonness" (but as opt-in). -Preventing interpreter finalization with references ---------------------------------------------------- +Preventing Interpreter Finalization with Reference Counting +----------------------------------------------------------- This PEP takes an approach where interpreters are given a reference count by non-daemon threads that want to (or do) hold an attached thread state. When @@ -335,8 +335,8 @@ program exits, it's still up to the user on how to join the thread, for example by using an :mod:`atexit` handler can be used to join the thread. Again, this PEP isn't trying to reinvent how to create or join threads! -Replacing the old APIs ----------------------- +Removing the GIL-state APIs +--------------------------- Due to the plethora of issues with ``PyGILState``, this PEP intends to do away with them entirely. 
In today's C API, all ``PyGILState`` functions are @@ -359,15 +359,17 @@ aren't too clear, see :ref:`pep-788-deprecation`. Specification ============= -Interpreter reference counting ------------------------------- +Interpreter Reference Counting to Prevent Shutdown +-------------------------------------------------- An interpreter will keep track of a reference count managed by threads. During finalization, the interpreter will wait until its reference count reaches zero, and once that happens, threads can no longer -acquire a strong reference to the interpreter. Threads can hold as many -references as they want, but in most cases, a thread will have one reference -at a time, typically through the :term:`attached thread state`. +acquire a strong reference to the interpreter. The interpreter +must not hang threads until this reference count has reached zero. +Threads can hold as many references as they want, but in most cases, +a thread will have one reference at a time, typically through the +:term:`attached thread state`. An attached thread state is made non-daemon by holding a strong reference to the interpreter. When a non-daemon thread state is destroyed, it releases @@ -378,7 +380,7 @@ be safely accessed after the interpreter no longer supports strong references, and even after the interpreter has been deleted. But, at that point, the weak reference can no longer be converted to a strong reference. -Strong interpreter references +Strong Interpreter References ***************************** .. c:type:: PyInterpreterRef @@ -417,7 +419,7 @@ Strong interpreter references This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. -Weak interpreter references +Weak Interpreter References *************************** .. c:type:: PyInterpreterWeakRef @@ -467,8 +469,8 @@ Weak interpreter references This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. 
-Daemon and non-daemon threads ------------------------------ +Daemon and Non-daemon Thread States +----------------------------------- A non-daemon thread state is a thread state that holds a strong reference to an interpreter. The reference is released when the thread state is deleted, either @@ -504,7 +506,7 @@ See :ref:`pep-788-hanging-compat`. This function cannot fail, other than with a fatal error if the caller has no :term:`attached thread state`. -Ensuring and releasing thread states +Ensuring and Releasing Thread States ------------------------------------ This proposal includes two new high-level threading APIs that intend to @@ -553,8 +555,8 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure` at your own discretion. -Changes to ``threading`` shutdown and behavior ----------------------------------------------- +``threading`` Shutdown and Behavior +----------------------------------- An interpreter currently special-cases non-daemon threads created by :mod:`threading` and joins them before the interpreter does any other @@ -570,8 +572,8 @@ correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise, :c:func:`PyThreadState_GetDaemon` will have incorrect results in Python threads. -Deprecation of ``PyGILState`` APIs ----------------------------------- +Deprecation of GIL-state APIs +----------------------------- This PEP deprecates all of the existing ``PyGILState`` APIs in favor of the existing and new ``PyThreadState`` APIs. Namely: @@ -612,8 +614,8 @@ Examples These examples are here to help understand the APIs described in this PEP. Ideally, they could be reused in the documentation. -Single-threaded example -*********************** +Example: A Single-threaded Ensure +********************************* This example shows acquiring a lock in a Python method. @@ -648,8 +650,8 @@ held. 
Any future finalizer that wanted to acquire the lock would be deadlocked! Py_RETURN_NONE; } -Transitioning from the existing functions -***************************************** +Example: Transitioning From the Legacy Functions +************************************************ The following code uses the ``PyGILState`` APIs: @@ -721,8 +723,8 @@ This is the same code, rewritten to use the new functions: } -Daemon thread example -********************* +Example: A Daemon Thread +************************ Native daemon threads are still a use-case, and as such, they can still be used with this API: @@ -758,8 +760,8 @@ they can still be used with this API: Py_RETURN_NONE; } -Asynchronous callback example -***************************** +Example: An Asynchronous Callback +********************************* In some cases, the thread might not ever start, such as in a callback. We can't use a strong reference here, because a strong reference would @@ -807,11 +809,11 @@ at `python/cpython#133110 `_. Rejected Ideas ============== -Retrofiting the existing structures with reference counts +Retrofiting the Existing Structures with Reference Counts --------------------------------------------------------- -Using interpreter state pointers for reference counting -******************************************************* +Interpreter-State Pointers for Reference Counting +************************************************* Originally, this PEP specified :c:func:`!PyInterpreterState_Hold` and :c:func:`!PyInterpreterState_Release` for managing strong references @@ -824,8 +826,8 @@ there was a lack of intuition about when and where something was a strong reference. The ``PyInterpreterRef`` and ``PyInterpreterWeakRef`` seem a lot clearer. 
-Using interpreter IDs for reference counting -******************************************** +Interpreter IDs for Reference Counting +************************************** Some iterations of this API took an ``int64_t interp_id`` parameter instead of ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently @@ -874,7 +876,7 @@ This was ultimately rejected for two reasons: for code-generators like Cython to use, as there isn't any additional complexity with tracking :c:type:`PyThreadState` pointers around. -Using ``PyStatus`` for the return value of ``PyThreadState_Ensure`` +Using ``PyStatus`` for the Return Value of ``PyThreadState_Ensure`` ------------------------------------------------------------------- In prior iterations of this API, :c:func:`PyThreadState_Ensure` returned a @@ -897,8 +899,8 @@ Open Issues .. _pep-788-deprecation: -When should the legacy APIs be removed? ---------------------------------------- +When Should the GIL-state APIs be Removed? +------------------------------------------ :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release` have been around for over two decades, and it's expected that the migration will be difficult. @@ -911,7 +913,7 @@ In addition, it's unclear whether to remove them at all. A functions if it's determined that a full ``PyGILState`` removal would be too disruptive for the ecosystem. -Should ``PyThreadState_Ensure`` steal a reference? +Should ``PyThreadState_Ensure`` Steal a Reference? -------------------------------------------------- At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the From 95916a72d76931a981ba341a7b315f7c5e40cc05 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 09:49:28 -0400 Subject: [PATCH 15/31] Add a terminology section. 
--- peps/pep-0788.rst | 50 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 47 insertions(+), 3 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 59900e54405..52183fad231 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -57,6 +57,50 @@ and :c:func:`PyThreadState_Ensure` as replacements for the existing functions, accompanied by some interpreter reference counting APIs that let thread states be acquired and attached in a thread-safe and predictable manner. +Terminology +=========== + +Interpreters +------------ + +In this proposal, "interpreter" refers to a singular, isolated interpreter +(see :pep:`684`), with its own :c:type:`PyInterpreterState` pointer (referred +to as an "interpreter-state"). Interpreter *does not* refer to the entirety +of a Python process. + +The "current interpreter" refers to the interpreter by the interpreter-state +pointer on an :term:`attached thread state`. + +Finalization vs Shutdown +------------------------ + +Throughout this PEP, the terms "finalization" and "shutdown" are used in +reference to what an interpreter does at the end of its lifetime, either +because the program is closing or because :c:func:`Py_EndInterpreter` was +called. There's a subtle difference between the two terms, as used in this +PEP: + +- "Finalization" refers to an interpreter getting ready to "shut down", in + which it runs garbage collections, cleans up threads, and deletes + per-interpreter state. This should not be confused with *runtime* + finalization, where process-wide state is also cleaned up, but be aware + that the main interpreter is finalized alongside the runtime. +- "Shutdown" (or "shut down", as a verb) refers to the interpreter being + finished, after finalization has already happened. For example, shutdown + for a subinterpreter entails the interpreter's state structure being + deallocated. 
+ +Native and Python Threads +------------------------- + +This PEP refers to a thread created using the C API as a "native thread", +also sometimes referred to as a "non-Python created thread", where a "Python +created" is a thread created by the :mod:`threading` module. + +Native threads are typically created by :c:func:`PyGILState_Ensure`, but more +technically, it refers to any thread with a :term:`thread state` created using +the C API. + Motivation ========== @@ -274,7 +318,7 @@ in the subinterpreter) and the main thread could try to increment the reference count at the same time, causing a data race! Concurrent Interpreter Deallocation -*********************************** +----------------------------------- The other way of creating a native thread that can invoke Python, :c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better @@ -302,8 +346,8 @@ As a summary, there's a few bases we want to cover in a new API: - Backwards-compatibility with the old APIs and ideas, such as "daemonness" (but as opt-in). -Preventing Interpreter Finalization with Reference Counting ------------------------------------------------------------ +Preventing Interpreter Shutdown with Reference Counting +------------------------------------------------------- This PEP takes an approach where interpreters are given a reference count by non-daemon threads that want to (or do) hold an attached thread state. 
When From 257a25250a07ac80cc2f73b6457e8266a77f2ff0 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 15:03:14 -0400 Subject: [PATCH 16/31] Add PyInterpreterState_AsStrong() --- peps/pep-0788.rst | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 52183fad231..01809938fa1 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -317,11 +317,11 @@ any synchronization between the two GILs, so both the thread (who thinks it's in the subinterpreter) and the main thread could try to increment the reference count at the same time, causing a data race! -Concurrent Interpreter Deallocation ------------------------------------ +Concurrent Interpreter Deallocation Issues +------------------------------------------ The other way of creating a native thread that can invoke Python, -:c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better +:c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`, is a lot better for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an explicit interpreter, rather than assuming that the main interpreter was requested), but is still limited by the current hanging problems in the C API. @@ -445,6 +445,17 @@ Strong Interpreter References This function cannot fail, other than with a fatal error when the caller doesn't hold an :term:`attached thread state`. +.. c:function:: PyInterpreterRef PyInterpreterState_AsStrong(PyInterpreterState *interp) + + Acquire a strong reference to *interp*. + + Beware: this function can cause crashes if *interp* shuts down in + another thread! Prefer safely acquiring a reference through + :c:func:`PyInterpreterRef_Get` where possible. + + This function will return ``0`` if *interp* has already finished waiting on + non-daemon threads. + .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref) Duplicate a strong reference to an interpreter. 
From 6b9b74e3141fccaa3db3fcc12ce4789d3af42dfd Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 15:13:12 -0400 Subject: [PATCH 17/31] Add an example for PyInterpreterState_AsStrong() --- peps/pep-0788.rst | 44 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 40 insertions(+), 4 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 01809938fa1..c5c32717d60 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -449,12 +449,13 @@ Strong Interpreter References Acquire a strong reference to *interp*. - Beware: this function can cause crashes if *interp* shuts down in - another thread! Prefer safely acquiring a reference through - :c:func:`PyInterpreterRef_Get` where possible. + Unless *interp* is the main interpreter, this function can cause crashes + if *interp* shuts down in another thread! Prefer safely acquiring a + reference through :c:func:`PyInterpreterRef_Get` where possible. This function will return ``0`` if *interp* has already finished waiting on - non-daemon threads. + non-daemon threads. The caller does not need to hold an + :term:`attached thread state`. .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref) @@ -855,6 +856,41 @@ deadlock the interpreter if it's not released. Py_RETURN_NONE; } +Example: Calling Python Without a Closure +***************************************** + +There are a few cases where callback functions don't take a closure +(``void *arg``), so it's impossible to acquire a reference to any specific +interpreter. The solution to this problem is to acquire a reference to the main +interpreter through :c:func:`PyInterpreterState_AsStrong`. + +But wait, won't that break with subinterpreters, per +:ref:`pep-788-subinterpreters-gilstate`? Fortunately, since the callback has +no closure, it's not possible for the caller to pass any objects or +interpreter-specific data, so it's completely safe to choose the main +interpreter here. + +.. 
code-block:: c + + static void + call_python(void) + { + PyInterpreterRef *ref = PyInterpreterState_AsStrong(PyInterpreterState_Main()); + if (ref == 0) { + fputs("Python has shut down!", stderr); + return; + } + + if (PyThreadState_Ensure(ref) < 0) { + return; + } + if (PyRun_SimpleString("print(42)") < 0) { + PyErr_Print(); + } + PyThreadState_Release(); + return; + } + Reference Implementation ======================== From 48624efb3c5d9a9d24c836e36edd24e44636ffed Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 15:48:41 -0400 Subject: [PATCH 18/31] An editorial pass. --- peps/pep-0788.rst | 150 ++++++++++++++++++++++++---------------------- 1 file changed, 78 insertions(+), 72 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index c5c32717d60..09cb7d30ed4 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -29,9 +29,9 @@ an interpreter: - :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap` (significantly less common). -The former, ``PyGILState``, are the most common way to do this and have been -the standard for over twenty years (:pep:`311`), but have a number of issues -that have arisen over time: +The former, :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`, +are the most common way to do this and have been the standard for over twenty +years (:pep:`311`), but have a number of issues that have arisen over time: - Subinterpreters tend to have trouble with them, because in threads that haven't ever had an attached thread state, :c:func:`PyGILState_Ensure` @@ -55,7 +55,7 @@ Python. This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Ensure` as replacements for the existing functions, accompanied by some interpreter reference counting APIs that let thread states -be acquired and attached in a thread-safe and predictable manner. +be acquired and attached in a thread-safe, and predictable manner.
Terminology =========== @@ -65,11 +65,12 @@ Interpreters In this proposal, "interpreter" refers to a singular, isolated interpreter (see :pep:`684`), with its own :c:type:`PyInterpreterState` pointer (referred -to as an "interpreter-state"). Interpreter *does not* refer to the entirety +to as an "interpreter-state"). "Interpreter" *does not* refer to the entirety of a Python process. -The "current interpreter" refers to the interpreter by the interpreter-state -pointer on an :term:`attached thread state`. +The "current interpreter" refers to the interpreter-state +pointer on an :term:`attached thread state`, as returned by +:c:func:`PyThreadState_GetInterpreter`. Finalization vs Shutdown ------------------------ @@ -81,14 +82,16 @@ called. There's a subtle difference between the two terms, as used in this PEP: - "Finalization" refers to an interpreter getting ready to "shut down", in - which it runs garbage collections, cleans up threads, and deletes + which it runs its final garbage collections, cleans up + :term:`thread states `, and deletes per-interpreter state. This should not be confused with *runtime* finalization, where process-wide state is also cleaned up, but be aware that the main interpreter is finalized alongside the runtime. -- "Shutdown" (or "shut down", as a verb) refers to the interpreter being - finished, after finalization has already happened. For example, shutdown - for a subinterpreter entails the interpreter's state structure being - deallocated. +- "Shutdown" (or "shut down", as a verb) refers to the interpreter being in a + "finalized" state, after finalization has already happened. Shutdown + for a subinterpreter entails its interpreter-state structure being + deallocated, and shutdown for the main interpreter includes the entire Python + runtime being finalized. 
Native and Python Threads ------------------------- @@ -98,8 +101,8 @@ also sometimes referred to as a "non-Python created thread", where a "Python created" is a thread created by the :mod:`threading` module. Native threads are typically created by :c:func:`PyGILState_Ensure`, but more -technically, it refers to any thread with a :term:`thread state` created using -the C API. +technically, it refers to any thread with an :term:`attached thread state` +created and/or attached using the C API. Motivation ========== @@ -110,7 +113,7 @@ Native Threads Always Hang During Finalization Many codebases might need to call Python code in highly-asynchronous situations where the desired interpreter (:ref:`typically the main interpreter `) -could be finalizing or deleted, but want to continue running code after the +could be finalizing or deleted, but want to continue running code after invoking the interpreter. This desire has been `brought up by users `_. For example, a callback that wants to call Python code might be invoked when: @@ -139,19 +142,19 @@ Generally, this pattern would look something like this: In the current C API, any "native" thread (one not created via the :mod:`threading` module) is considered to be "daemon", meaning that the interpreter -won't wait on that thread to finalize. Instead, the interpreter will hang the +won't wait on that thread before shutting down. Instead, the interpreter will hang the thread when it goes to :term:`attach ` a :term:`thread state`, -making it unusable past that point. Attaching a thread state can happen at -any point when invoking Python, such as releasing it in-between bytecode -instructions (to yield the GIL), or when a C function exits a +making the thread unusable past that point. Attaching a thread state can happen at +any point when invoking Python, such as in-between bytecode instructions +(to yield the :term:`GIL` to a different thread), or when a C function exits a :c:macro:`Py_BEGIN_ALLOW_THREADS` block. 
(Note that hanging the thread is relatively new behavior; in prior versions, the thread would terminate, but the issue is the same.) -This means that any non-Python thread may be terminated at any point, which +This means that any non-Python/native thread may be terminated at any point, which is severely limiting for users who want to do more than just execute Python -code in their stream of calls (for example, C++ executing finalizers in -*addition* to calling Python). +code in their stream of calls (for example, C++ might want to execute other +finalizers in addition to calling Python). ``Py_IsFinalizing`` is Insufficient *********************************** @@ -169,8 +172,8 @@ the thread: Unfortunately, this isn't correct, because of time-of-call to time-of-use issues; the interpreter might not be finalizing during the call to :c:func:`Py_IsFinalizing`, but it might start finalizing immediately -afterwards, which would cause the attachment of a thread state (typically via -:c:func:`PyGILState_Ensure`) to hang the thread. +afterwards, which would cause the attachment of a thread state to hang the +thread. Daemon Threads Can Deadlock Finalization **************************************** @@ -185,8 +188,9 @@ lock. On free-threaded builds, lock-ordering deadlocks are still possible if thread A acquired the lock for object A and then object B, and then -another thread tried to acquire those locks in a reverse order. Free-threading -protects against this by releasing locks when the thread state is detached. +another thread tried to acquire those locks in the reverse order. Free-threading +currently protects against this by releasing locks when the thread state is +detached, making detachment a necessity to prevent deadlocks. So, all code that needs to work with locks need to detach the thread state. In C, this is almost always done via :c:macro:`Py_BEGIN_ALLOW_THREADS` and @@ -236,10 +240,10 @@ works during finalization, because it would break existing code. 
The GIL-state APIs are Buggy and Confusing ------------------------------------------ -There are currently two public ways for a user to create and attach their own -:term:`thread state`; manual use of :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`, -and :c:func:`PyGILState_Ensure`. The latter, :c:func:`PyGILState_Ensure`, -is `significantly more common `_. +There are currently two public ways for a user to create and attach a +:term:`thread state` for their thread; manual use of :c:func:`PyThreadState_New` +and :c:func:`PyThreadState_Swap`, and :c:func:`PyGILState_Ensure`. The latter, +:c:func:`PyGILState_Ensure`, is `the most common `_. ``PyGILState_Ensure`` Generally Crashes During Finalization *********************************************************** @@ -265,9 +269,8 @@ created by the authors of this PEP: omit ``PyGILState_Ensure`` in fresh threads. Again, :c:func:`PyGILState_Ensure` gets an :term:`attached thread state` -for the thread on both with-GIL and free-threaded builds. Acquisition of the -GIL on with-GIL builds is incidental! :c:func:`PyGILState_Ensure` is very -roughly equivalent to the following: +for the thread on both with-GIL and free-threaded builds. To demonstrate, +:c:func:`PyGILState_Ensure` is very roughly equivalent to the following: .. code-block:: c @@ -288,13 +291,17 @@ roughly equivalent to the following: } } +An attached thread state is always needed to call the C API, so +:c:func:`PyGILState_Ensure` still needs to be called on free-threaded builds, +but with a name like "ensure GIL", it's not immediately clear that that's true. + ..
_pep-788-subinterpreters-gilstate: ``PyGILState_Ensure`` Doesn't Guess the Correct Interpreter ----------------------------------------------------------- As noted in the :ref:`documentation `, -``PyGILState`` APIs aren't officially supported in subinterpreters: +the ``PyGILState`` functions aren't officially supported in subinterpreters: Note that the ``PyGILState_*`` functions assume there is only one global interpreter (created automatically by ``Py_Initialize()``). Python @@ -310,65 +317,61 @@ subinterpreters, because synchronization for the wrong interpreter will be used on objects shared between the threads. For example, if the thread had access to object A, which belongs to a -subinterpreter, but then called :c:func:`PyGILState_Ensure` would have an -attached thread state pointing to the main interpreter, not the subinterpreter. -This means that any GIL assumptions about the object are wrong! There isn't -any synchronization between the two GILs, so both the thread (who thinks it's -in the subinterpreter) and the main thread could try to increment the -reference count at the same time, causing a data race! +subinterpreter, but then called :c:func:`PyGILState_Ensure`, the thread would +have an :term:`attached thread state` pointing to the main interpreter, +not the subinterpreter. This means that any :term:`GIL` assumptions about the +object are wrong! There isn't any synchronization between the two GILs, so both +the thread (who thinks it's in the subinterpreter) and the main thread could try +to increment the reference count at the same time, causing a data race! 
Concurrent Interpreter Deallocation Issues ------------------------------------------ The other way of creating a native thread that can invoke Python, -:c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`, is a lot better +:c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, is a lot better for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an explicit interpreter, rather than assuming that the main interpreter was requested), but is still limited by the current hanging problems in the C API. In addition, subinterpreters typically have a much shorter lifetime than the main interpreter, so there's a much higher chance that an interpreter passed -to a thread will have already finished and have been deallocated. Passing that -interpreter to :c:func:`PyThreadState_New` will most likely crash the program. +to a thread will have already finished and have been deallocated. So, passing +that interpreter to :c:func:`PyThreadState_New` will most likely crash the program +because of a use-after-free on the interpreter-state. Rationale ========= So, how do we address all of this? The best way seems to be starting from -scratch and "reimagining" how to acquire and attach thread states in the C API. +scratch and "reimagining" how to create, acquire and attach +:term:`thread states ` in the C API. As a summary, there's a few bases we want to cover in a new API: - Require the caller to specify which interpreter they want to prevent those pesky problems with interpreter guessing. -- Prevent the thread from being arbitrarily bricked by calling into Python. +- But, we also need to cover cases where a closure isn't available, so the thread + won't have access to an interpreter state (but also won't have access to + any objects). +- Prevent the thread from being arbitrarily hung by calling into Python + during finalization. - Protection against deallocation on interpreters with short lifetimes. 
-- Backwards-compatibility with the old APIs and ideas, such as "daemonness" - (but as opt-in). +- Backwards-compatibility with the old APIs and ideas, such as daemonness. Preventing Interpreter Shutdown with Reference Counting ------------------------------------------------------- This PEP takes an approach where interpreters are given a reference count by -non-daemon threads that want to (or do) hold an attached thread state. When -the interpreter starts finalizing, it will until its reference count -reaches zero before proceeding to a point where threads will be hung. -Note that this *is not* the same as joining the thread; the interpreter will -only wait until the thread state has been released -(via :c:func:`PyThreadState_Release`) for all non-daemon threads. This isn't -the same as waiting for them to detach their thread state--it waits for them -to *destroy* it. Otherwise, this API wouldn't have any finalization benefits -over the existing ``PyThreadState`` functions. +non-daemon threads that want to (or do) hold an attached thread state. So, from a thread's perspective, holding a "strong reference" to the -interpreter will effectively prevent it from finalizing, making it safe to -invoke Python without worrying about the thread being hung. The strong -reference will be held as long as thread state is "alive", even if it's -detached. +interpreter will make it safe to invoke Python without worrying about +the thread being hung. A strong reference held by a thread state will +be held as long as thread state is "alive", even if it's detached. This proposal also comes with weak references to an interpreter that don't -prevent it from finalizing, but can be promoted to a strong reference once -decided that a thread state can attach. Promotion of a weak reference to a +prevent it from shutting down, but can be promoted to a strong reference when +the user decides that they want to call Python. 
Promotion of a weak reference to a strong reference can fail if the interpreter has already finalized, or reached a point during finalization where it can't be guaranteed that the thread won't hang. @@ -406,14 +409,17 @@ Specification Interpreter Reference Counting to Prevent Shutdown -------------------------------------------------- -An interpreter will keep track of a reference count managed by threads. -During finalization, the interpreter will wait until its -reference count reaches zero, and once that happens, threads can no longer -acquire a strong reference to the interpreter. The interpreter -must not hang threads until this reference count has reached zero. -Threads can hold as many references as they want, but in most cases, -a thread will have one reference at a time, typically through the -:term:`attached thread state`. +An interpreter will keep a reference count that's managed by threads. +When the interpreter starts finalizing, it will wait until its reference count +reaches zero before proceeding to a point where threads will be hung. +Note that this *is not* the same as joining the thread; the interpreter will +only wait until the reference count is zero, typically via releasing non-daemon +thread states with :c:func:`PyThreadState_Release`. The interpreter must not hang +threads until this reference count has reached zero. Threads can hold as many +references as they want, but in most cases, a thread will have one reference +at a time, typically through the :term:`attached thread state`. After the reference count +has reached zero, threads can no longer prevent the interpreter from shutting +down. An attached thread state is made non-daemon by holding a strong reference to the interpreter. When a non-daemon thread state is destroyed, it releases @@ -422,7 +428,7 @@ the reference.
A weak reference to the interpreter won't prevent it from finalizing, but can be safely accessed after the interpreter no longer supports strong references, and even after the interpreter has been deleted. But, at that point, the weak -reference can no longer be converted to a strong reference. +reference can no longer be promoted to a strong reference. Strong Interpreter References ***************************** @@ -531,7 +537,7 @@ Daemon and Non-daemon Thread States A non-daemon thread state is a thread state that holds a strong reference to an interpreter. The reference is released when the thread state is deleted, either by :c:func:`PyThreadState_Release` or a different thread state deletion -function. +function (such as :c:func:`PyThreadState_Delete`). For backwards compatibility, all thread states created by existing APIs, including :c:func:`PyGILState_Ensure`, will remain daemon by default. From 31d3f750fa0af1c6ffa90976e48e378b00421511 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sun, 4 May 2025 16:02:00 -0400 Subject: [PATCH 19/31] Fix typo in example. --- peps/pep-0788.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 09cb7d30ed4..f56ce507ed7 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -881,7 +881,7 @@ interpreter here. static void call_python(void) { - PyInterpreterRef *ref = PyInterpreterState_AsStrong(PyInterpreterState_Main()); + PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main()); if (ref == 0) { fputs("Python has shut down!", stderr); return; From 8440057f2a555d80fff99beeb73221ee6bf23cc6 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 5 May 2025 17:32:49 -0400 Subject: [PATCH 20/31] Some clarifications and a new example.
--- peps/pep-0788.rst | 129 +++++++++++++++++++++++++++++++++------------- 1 file changed, 93 insertions(+), 36 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index f56ce507ed7..5212c2a2d4e 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -48,14 +48,13 @@ The other option, :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, do solve those issues, but come with an additional problem with how thread state attachment works in the C API (that ``PyGILState`` also includes): if the thread is not the main thread, then the interpreter will randomly hang the -thread during attachment if it starts finalizing. This can be frustrating, -especially if there was some additional work to be done alongside invoking -Python. +thread during attachment if it starts finalizing. This is a problem for large +applications that want to use their thread in addition to calling Python. This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Ensure` as replacements for the existing functions, accompanied by some interpreter reference counting APIs that let thread states -be acquired and attached in a thread-safe, and predictable manner. +be acquired and attached in a thread-safe and predictable manner. Terminology =========== @@ -110,7 +109,7 @@ Motivation Native Threads Always Hang During Finalization ---------------------------------------------- -Many codebases might need to call Python code in highly-asynchronous +Many large libraries might need to call Python code in highly-asynchronous situations where the desired interpreter (:ref:`typically the main interpreter `) could be finalizing or deleted, but want to continue running code after @@ -147,14 +146,33 @@ thread when it goes to :term:`attach ` a :term:`thread st making the thread unusable past that point. 
Attaching a thread state can happen at any point when invoking Python, such as in-between bytecode instructions (to yield the :term:`GIL` to a different thread), or when a C function exits a -:c:macro:`Py_BEGIN_ALLOW_THREADS` block. (Note that hanging the thread is -relatively new behavior; in prior versions, the thread would terminate, but -the issue is the same.) +:c:macro:`Py_BEGIN_ALLOW_THREADS` block, so simply guarding against whether the +interpreter is finalizing isn't enough to safely call Python code. (Note that hanging +the thread is relatively new behavior; in prior versions, the thread would terminate, +but the issue is the same.) This means that any non-Python/native thread may be terminated at any point, which is severely limiting for users who want to do more than just execute Python -code in their stream of calls (for example, C++ might want to execute other -finalizers in addition to calling Python). +code in their stream of calls. + +Joining the Thread isn't Always Possible +**************************************** + +In general, it's possible to prevent hanging of threads created while Python +is active through :mod:`atexit` functions. A thread could be started by some +C function, and then as long as that thread is joined by :mod:`atexit`, then +the thread won't hang. Reasonable enough, right? + +Unfortunately, :mod:`atexit` isn't always an option, because to call it, you +need to already have an :term:`attached thread state` for the thread. If +there's no guarantee of that, then :func:`atexit.register` cannot be safely +called without the risk of hanging the thread. + +For example, large C++ applications might want to expose an interface that can +call Python code. To do this, a function would take a Python object, and then +call :c:func:`PyGILState_Ensure` to safely interact with it (e.g., by calling +it). If the interpreter is finalizing or has shut down, then the thread is +hung, disrupting the C++ caller. 
``Py_IsFinalizing`` is Insufficient *********************************** @@ -210,7 +228,8 @@ deadlocking. The main thread will continue to run finalizers past that point, though. If any of those finalizers try to acquire the lock, deadlock ensues. This affects CPython itself, and there's not much that can be done -to fix it. For example, `python/cpython#129536 `_ +to fix it with the current API. For example, +`python/cpython#129536 `_ remarks that the :mod:`ssl` module can emit a fatal error when used at finalization, because a daemon thread got hung while holding the lock. @@ -346,41 +365,37 @@ So, how do we address all of this? The best way seems to be starting from scratch and "reimagining" how to create, acquire and attach :term:`thread states ` in the C API. -As a summary, there's a few bases we want to cover in a new API: - -- Require the caller to specify which interpreter they want to prevent those - pesky problems with interpreter guessing. -- But, we also need to cover cases where a closure isn't available, so the thread - won't have access to an interpreter state (but also won't have access to - any objects). -- Prevent the thread from being arbitrarily hung by calling into Python - during finalization. -- Protection against deallocation on interpreters with short lifetimes. -- Backwards-compatibility with the old APIs and ideas, such as daemonness. - Preventing Interpreter Shutdown with Reference Counting ------------------------------------------------------- This PEP takes an approach where interpreters are given a reference count by -non-daemon threads that want to (or do) hold an attached thread state. +non-daemon threads that want to (or do) hold an :term:`attached thread state`. So, from a thread's perspective, holding a "strong reference" to the -interpreter will make it safe to invoke Python without worrying about +interpreter will make it safe to call the C API without worrying about the thread being hung. 
A strong reference held by a thread state will be held as long as thread state is "alive", even if it's detached. +This means that interfacing Python (for example, in a C++ library) will need +a reference to the interpreter in order to safely call the object, which is +definitely more inconvenient than assuming the main interpreter is the right +choice, but there's not really another option. + +Weak References +*************** + This proposal also comes with weak references to an interpreter that don't prevent it from shutting down, but can be promoted to a strong reference when -the user decides that they want to call Python. Promotion of a weak reference to a -strong reference can fail if the interpreter has already finalized, or reached -a point during finalization where it can't be guaranteed that the thread won't -hang. +the user decides that they want to call the C API. Promotion of a weak reference +to a strong reference can fail if the interpreter has already finalized, or +reached a point during finalization where it can't be guaranteed that the +thread won't hang. If there's additional work after destroying the thread state, the thread can continue running as normal. If that work needs to finish before the program exits, it's still up to the user on how to join the thread, for -example by using an :mod:`atexit` handler can be used to join the thread. -Again, this PEP isn't trying to reinvent how to create or join threads! +example by using an :mod:`atexit` handler can be used to join it. +This PEP isn't trying to reinvent how to create or join threads! Removing the GIL-state APIs --------------------------- @@ -406,8 +421,8 @@ aren't too clear, see :ref:`pep-788-deprecation`. 
Specification ============= -Interpreter Reference Counting to Prevent Shutdown --------------------------------------------------- +Interpreter References to Prevent Shutdown +------------------------------------------ An interpreter will keep a reference count that's managed by threads. When the interpreter starts finalizing, it will until its reference count @@ -515,7 +530,7 @@ Weak Interpreter References Return a strong reference to an interpreter from a weak reference. If the interpreter no longer exists or has already finished waiting for - non-daemon threads, then this function returns ``NULL``. + non-daemon threads, then this function returns ``0``. The caller does not need to hold an :term:`attached thread state`, but is not safe to call in a re-entrant signal handler. @@ -676,6 +691,48 @@ Examples These examples are here to help understand the APIs described in this PEP. Ideally, they could be reused in the documentation. +Example: A Library Interface +**************************** + +Imagine that you're developing a C library for logging. +You might want to provide an API that allows users to log to a Python file +object. + +With this PEP, you'd implement it like this: + +.. code-block:: c + + int + LogToPyFile(PyInterpreterWeakRef *wref, + PyObject *file, + const char *text) + { + PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); + if (ref == 0) { + fputs("Python interpreter has shut down.", stderr); + return -1; + } + + if (PyThreadState_Ensure(ref) < 0) { + puts("Out of memory.", stderr); + return -1; + } + + char *to_write = do_some_text_mutation(text); + int res = PyFile_WriteString(to_write, file); + free(to_write); + PyErr_Print(); + PyThreadState_Release(); + return res < 0; + } + +If you were to use :c:func:`PyGILState_Ensure` for this case, then your +thread would hang if the interpreter were to be finalizing at that time! + +Additionally, the API supports subinterpreters. 
If one were to assume that +the main interpreter was active, then your library wouldn't be safe to use +with file objects created by a subinterpreter. + Example: A Single-threaded Ensure ********************************* @@ -837,7 +894,7 @@ deadlock the interpreter if it's not released. PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg; PyInterpreterRef *ref = PyInterpreterWeakRef_AsStrong(wref); if (ref == NULL) { - fputs(stderr, "Python has shut down!"); + fputs("Python has shut down!", stderr); return -1; } @@ -883,7 +940,7 @@ interpreter here. { PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main()); if (ref == 0) { - fputs(stderr, "Python has shut down!"); + fputs("Python has shut down!", stderr); return; } From 9b08bf0d7e078c5e659fc0ea3b24b489ff8b86a4 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 5 May 2025 17:34:56 -0400 Subject: [PATCH 21/31] Fix wording. --- peps/pep-0788.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 5212c2a2d4e..c4ca56bb89d 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -729,8 +729,8 @@ With this PEP, you'd implement it like this: If you were to use :c:func:`PyGILState_Ensure` for this case, then your thread would hang if the interpreter were to be finalizing at that time! -Additionally, the API supports subinterpreters. If one were to assume that -the main interpreter was active, then your library wouldn't be safe to use +Additionally, the API supports subinterpreters. If you were to assume that +the main interpreter created the file object, then your library wouldn't be safe to use with file objects created by a subinterpreter. 
Example: A Single-threaded Ensure From 0e5acc8e0981c62ec99356f8109b40cd22655ebe Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Fri, 9 May 2025 07:21:29 -0400 Subject: [PATCH 22/31] Update peps/pep-0788.rst Co-authored-by: Victor Stinner --- peps/pep-0788.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 5212c2a2d4e..bb0c96cfcf8 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -472,7 +472,7 @@ Strong Interpreter References Unless *interp* is the main interpreter, this function can cause crashes if *interp* shuts down in another thread! Prefer safely acquiring a - reference through :c:func:`PyInterpreterRef_Get` where possible. + reference through :c:func:`PyInterpreterRef_Get` whenever possible. This function will return ``0`` if *interp* has already finished waiting on non-daemon threads. The caller does not need to hold an From 6d9664571c23f2e7fc4602dbbe942bfa2a148e0f Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Fri, 9 May 2025 07:25:05 -0400 Subject: [PATCH 23/31] Update peps/pep-0788.rst Co-authored-by: Victor Stinner --- peps/pep-0788.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index bb0c96cfcf8..c425ca01b1a 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -722,6 +722,7 @@ With this PEP, you'd implement it like this: int res = PyFile_WriteString(to_write, file); free(to_write); PyErr_Print(); + PyThreadState_Release(); return res < 0; } From a229f7b0eed0cbf29db5455b56b9c9c5493c0053 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Fri, 9 May 2025 07:25:35 -0400 Subject: [PATCH 24/31] Apply suggestions from code review Co-authored-by: Victor Stinner --- peps/pep-0788.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index c425ca01b1a..d57cfdbf971 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -709,12 +709,12 @@ With this PEP, you'd implement it like 
this: { PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); if (ref == 0) { - fputs("Python interpreter has shut down.", stderr); + // Python interpreter has shut down return -1; } if (PyThreadState_Ensure(ref) < 0) { - puts("Out of memory.", stderr); + fputs("Out of memory.\n", stderr); return -1; } @@ -895,7 +895,7 @@ deadlock the interpreter if it's not released. PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg; PyInterpreterRef *ref = PyInterpreterWeakRef_AsStrong(wref); if (ref == NULL) { - fputs("Python has shut down!", stderr); + fputs("Python has shut down!\n", stderr); return -1; } From 2332d3e8b946f6d298e9ef7c4aca688cbadb19e4 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Fri, 9 May 2025 07:27:33 -0400 Subject: [PATCH 25/31] Fix typos. --- peps/pep-0788.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index a3da579666e..7810c0b6643 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -52,7 +52,7 @@ thread during attachment if it starts finalizing. This is a problem for large applications that want to use their thread in addition to calling Python. This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure` -and :c:func:`PyThreadState_Ensure` as replacements for the existing functions, +and :c:func:`PyThreadState_Release` as replacements for the existing functions, accompanied by some interpreter reference counting APIs that let thread states be acquired and attached in a thread-safe and predictable manner. @@ -893,8 +893,8 @@ deadlock the interpreter if it's not released.
async_callback(void *arg) { PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg; - PyInterpreterRef *ref = PyInterpreterWeakRef_AsStrong(wref); - if (ref == NULL) { + PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); + if (ref == 0) { fputs("Python has shut down!\n", stderr); return -1; } From d5630aff335e6edcafa5f17fed29e74d93274680 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Sat, 10 May 2025 10:28:39 -0400 Subject: [PATCH 26/31] Use non-pointers for PyInterpreterRef --- peps/pep-0788.rst | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 7810c0b6643..6932726dfd0 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -505,17 +505,20 @@ Weak Interpreter References The interpreter will *not* wait for the reference to be released before shutting down. -.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Get(void) + This type is guaranteed to be pointer-sized. + +.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Get(void) Acquire a weak reference to the current interpreter. This function is generally meant to be used in tandem with :c:func:`PyInterpreterWeakRef_AsStrong`. - This function cannot fail, other than with a fatal error when the caller - doesn't hold an :term:`attached thread state`. + This function returns ``0`` without an exception set on failure. -.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref) + The caller must hold an :term:`attached thread state`. + +.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref) Duplicate a weak reference to *wref*. @@ -525,7 +528,7 @@ Weak Interpreter References This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. -.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef *wref) +.. 
c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref) Return a strong reference to an interpreter from a weak reference. @@ -539,7 +542,7 @@ Weak Interpreter References state holds a strong reference to the interpreter, then this function can never fail. -.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef *wref) +.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref) Release a weak reference, possibly deallocating it. @@ -703,7 +706,7 @@ With this PEP, you'd implement it like this: .. code-block:: c int - LogToPyFile(PyInterpreterWeakRef *wref, + LogToPyFile(PyInterpreterWeakRef wref, PyObject *file, const char *text) { @@ -892,7 +895,7 @@ deadlock the interpreter if it's not released. static int async_callback(void *arg) { - PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg; + PyInterpreterWeakRef wref = (PyInterpreterWeakRef)arg; PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); if (ref == 0) { fputs("Python has shut down!\n", stderr); @@ -914,7 +917,7 @@ deadlock the interpreter if it's not released. { // Weak reference to the interpreter. It won't wait on the callback // to finalize. - PyInterpreterWeakRef *wref = PyInterpreterWeakRef_Get(); + PyInterpreterWeakRef wref = PyInterpreterWeakRef_Get(); register_callback(async_callback, wref); Py_RETURN_NONE; @@ -940,7 +943,7 @@ interpreter here. call_python(void) { PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main()); - if (ref == 0) { + if (ref == NULL) { fputs("Python has shut down!", stderr); return; } @@ -978,8 +981,8 @@ converted interpreter IDs (weak references) to strong references. In the end, this was rejected, primarily because it was needlessly confusing. Interpreter states hadn't ever had a reference count prior, so there was a lack of intuition about when and where something was a strong -reference. The ``PyInterpreterRef`` and ``PyInterpreterWeakRef`` seem a lot -clearer. +reference. 
The :c:type:`PyInterpreterRef` and :c:type:`PyInterpreterWeakRef` +types seem a lot clearer. Interpreter IDs for Reference Counting ************************************** From 86b4b7985138098139c8d96beaacdb0670345bd7 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 12 May 2025 05:53:49 -0400 Subject: [PATCH 27/31] Change the API for PyInterpreterState_AsStrong() and PyInterpreterWeakRef_AsStrong() --- peps/pep-0788.rst | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 6932726dfd0..57926c85da3 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -466,7 +466,7 @@ Strong Interpreter References This function cannot fail, other than with a fatal error when the caller doesn't hold an :term:`attached thread state`. -.. c:function:: PyInterpreterRef PyInterpreterState_AsStrong(PyInterpreterState *interp) +.. c:function:: int PyInterpreterState_AsStrong(PyInterpreterState *interp, PyInterpreterRef *ref_ptr) Acquire a strong reference to *interp*. @@ -474,9 +474,12 @@ Strong Interpreter References if *interp* shuts down in another thread! Prefer safely acquiring a reference through :c:func:`PyInterpreterRef_Get` whenever possible. - This function will return ``0`` if *interp* has already finished waiting on - non-daemon threads. The caller does not need to hold an - :term:`attached thread state`. + On success, this function will return ``0`` and set *ref_ptr* to a strong + reference, and on failure, this function will return ``-1`` and set *ref_ptr* + to ``NULL``. (Failure typically indicates that *interp* has already finished + waiting on non-daemon threads). + + The caller does not need to hold an :term:`attached thread state`. .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref) @@ -512,9 +515,7 @@ Weak Interpreter References Acquire a weak reference to the current interpreter. 
This function is generally meant to be used in tandem with - :c:func:`PyInterpreterWeakRef_AsStrong`. - - This function returns ``0`` without an exception set on failure. + :c:func:`PyInterpreterWeakRef_AsStrong`, and cannot fail. The caller must hold an :term:`attached thread state`. @@ -528,20 +529,20 @@ Weak Interpreter References This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. -.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref) +.. c:function:: int PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref, PyInterpreterRef *ref_ptr) + + Acquire a strong reference to an interpreter through a weak reference. - Return a strong reference to an interpreter from a weak reference. + On success, this function returns ``0`` and sets *ref_ptr* to a strong + reference to the interpreter denoted by *wref*. If the interpreter no longer exists or has already finished waiting for - non-daemon threads, then this function returns ``0``. + non-daemon threads, then this function returns ``-1`` and sets *ref_ptr* + to ``NULL``. The caller does not need to hold an :term:`attached thread state`, but is not safe to call in a re-entrant signal handler. - If the caller *does* hold an :term:`attached thread state`, and that thread - state holds a strong reference to the interpreter, then this function can - never fail. - .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref) Release a weak reference, possibly deallocating it. @@ -710,8 +711,8 @@ With this PEP, you'd implement it like this: PyObject *file, const char *text) { - PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); - if (ref == 0) { + PyInterpreterRef ref; + if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) { // Python interpreter has shut down return -1; } @@ -896,8 +897,8 @@ deadlock the interpreter if it's not released. 
async_callback(void *arg) { PyInterpreterWeakRef wref = (PyInterpreterWeakRef)arg; - PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref); - if (ref == 0) { + PyInterpreterRef ref; + if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) { fputs("Python has shut down!\n", stderr); return -1; } @@ -942,8 +943,8 @@ interpreter here. static void call_python(void) { - PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main()); - if (ref == NULL) { + PyInterpreterRef ref; + if (PyInterpreterState_AsStrong(PyInterpreterState_Main(), &ref) < 0) { fputs("Python has shut down!", stderr); return; } From 3212a611e789d321e681805069e14238ef91fcb0 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 12 May 2025 09:34:14 -0400 Subject: [PATCH 28/31] Don't specify setting `NULL` --- peps/pep-0788.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 57926c85da3..341ad87f35d 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -475,8 +475,8 @@ Strong Interpreter References reference through :c:func:`PyInterpreterRef_Get` whenever possible. On success, this function will return ``0`` and set *ref_ptr* to a strong - reference, and on failure, this function will return ``-1`` and set *ref_ptr* - to ``NULL``. (Failure typically indicates that *interp* has already finished + reference, and on failure, this function will return ``-1``. + (Failure typically indicates that *interp* has already finished waiting on non-daemon threads). The caller does not need to hold an :term:`attached thread state`. @@ -537,11 +537,11 @@ Weak Interpreter References reference to the interpreter denoted by *wref*. If the interpreter no longer exists or has already finished waiting for - non-daemon threads, then this function returns ``-1`` and sets *ref_ptr* - to ``NULL``. + non-daemon threads, then this function returns ``-1``. 
- The caller does not need to hold an :term:`attached thread state`, but is - not safe to call in a re-entrant signal handler. + This function is not safe to call in a re-entrant signal handler. + + The caller does not need to hold an :term:`attached thread state`. .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref) From 6e3550cf708e1c202b32daa25c5eba17a87cf594 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 12 May 2025 20:38:46 -0400 Subject: [PATCH 29/31] infinitely -> unbounded --- peps/pep-0788.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 341ad87f35d..726d79afc7f 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -18,7 +18,7 @@ In Python, threads are able to interact with an interpreter (e.g., invoke the bytecode loop) through an :term:`attached thread state`. On with-GIL builds, only one thread can hold an attached thread state at once, which means that the thread holds the :term:`GIL`. On free-threaded builds, there can be -infinitely many thread states attached, allowing for parallelism (because +an unbounded number of thread states attached, allowing for parallelism (because multiple threads can invoke the interpreter at once). With that in mind, attachment of thread states is a bit problematic in the C API. From 6f45d71ba7e0c61cfea4b0a2ca4a375a83c2d0e8 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 12 May 2025 20:44:38 -0400 Subject: [PATCH 30/31] Reword 'extremely common'. --- peps/pep-0788.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 726d79afc7f..5e0cf644480 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -268,8 +268,8 @@ and :c:func:`PyThreadState_Swap`, and :c:func:`PyGILState_Ensure`. 
The latter, *********************************************************** At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does not -match the documentation. Instead of hanging the thread during finalization -as previously noted, it's extremely common for it to crash with a segmentation +always match the documentation. Instead of hanging the thread during finalization +as previously noted, it's possible for it to crash with a segmentation fault. This is a `known issue `_ that could be fixed in CPython, but it's definitely worth noting here. Incidentally, acceptance and implementation of this PEP will likely fix From 1d41eb6d68c0278abde5eb776b9f1c55ef186c17 Mon Sep 17 00:00:00 2001 From: Peter Bierma Date: Mon, 12 May 2025 20:51:08 -0400 Subject: [PATCH 31/31] Use 'callback parameter' instead of 'closure'. --- peps/pep-0788.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 5e0cf644480..2a4a01c9df2 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -924,17 +924,17 @@ deadlock the interpreter if it's not released. Py_RETURN_NONE; } -Example: Calling Python Without a Closure -***************************************** +Example: Calling Python Without a Callback Parameter +**************************************************** -There are a few cases where callback functions don't take a closure +There are a few cases where callback functions don't take a callback parameter (``void *arg``), so it's impossible to acquire a reference to any specific interpreter. The solution to this problem is to acquire a reference to the main interpreter through :c:func:`PyInterpreterState_AsStrong`. But wait, won't that break with subinterpreters, per :ref:`pep-788-subinterpreters-gilstate`? 
Fortunately, since the callback has -no closure, it's not possible for the caller to pass any objects or +no callback parameter, it's not possible for the caller to pass any objects or interpreter-specific data, so it's completely safe to choose the main interpreter here.
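The return-code convention that patch 27 introduces for ``PyInterpreterWeakRef_AsStrong`` (return ``0`` and fill ``*ref_ptr`` on success, return ``-1`` on failure) can be modeled without CPython. The following is a minimal sketch under stated assumptions, not the PEP's implementation: every ``toy_`` name is hypothetical, the single ``accepts_strong`` flag is an assumed stand-in for "the interpreter has finished waiting for non-daemon threads", and the toy interpreter must outlive its weak references (the proposed API does not have that restriction).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: every toy_ name is hypothetical, not a CPython API. */
typedef struct {
    int strong_refs;     /* strong references currently held */
    int accepts_strong;  /* assumed flag: 0 once promotion is no longer allowed */
} toy_interp;

/* Pointer-sized handles, mirroring the "pointer-sized" wording in the patches. */
typedef uintptr_t toy_ref;
typedef uintptr_t toy_weakref;

/* A weak reference does not touch the strong reference count. */
toy_weakref
toy_weakref_get(toy_interp *interp)
{
    return (toy_weakref)interp;
}

/* Promote a weak reference: return 0 and set *ref_ptr on success,
 * return -1 and leave *ref_ptr untouched on failure. */
int
toy_weakref_as_strong(toy_weakref wref, toy_ref *ref_ptr)
{
    toy_interp *interp = (toy_interp *)wref;
    if (!interp->accepts_strong) {
        return -1;
    }
    interp->strong_refs++;
    *ref_ptr = (toy_ref)interp;
    return 0;
}

/* Release a strong reference obtained from toy_weakref_as_strong(). */
void
toy_ref_close(toy_ref ref)
{
    toy_interp *interp = (toy_interp *)ref;
    assert(interp->strong_refs > 0);
    interp->strong_refs--;
}

/* Same shape as the async_callback example: promote, work, release. */
int
toy_callback(toy_weakref wref)
{
    toy_ref ref;
    if (toy_weakref_as_strong(wref, &ref) < 0) {
        fputs("Interpreter has shut down!\n", stderr);
        return -1;
    }
    /* Work that needs the interpreter would go here. */
    toy_ref_close(ref);
    return 0;
}
```

Reporting failure through the return value keeps the error path separate from the handle itself, so callers never compare a reference against a sentinel such as ``0`` or ``NULL``, which the earlier patches in this series did inconsistently.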