From 64c88b37c9ae519b4775967142fb2ab5e6d98c93 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 1 May 2025 17:01:02 -0400
Subject: [PATCH 01/54] Clarify what 'native thread' means.

---
 peps/pep-0788.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 0f5b58e1804..a58a572b086 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -16,7 +16,8 @@ Abstract
 
 :c:func:`PyGILState_Ensure`, :c:func:`PyGILState_Release`, and other related
 functions in the ``PyGILState`` family are the most common way to create
-native threads that interact with Python. They have been the standard for over
+native threads (as in, created using the C API instead of :mod:`threading`)
+that interact with Python. They have been the standard for over
 twenty years (:pep:`311`). But, over time, these functions have
 become problematic:
 

From bda3db1ee8d788dcd7bd5ff922c829b9b14acbd4 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 1 May 2025 17:10:02 -0400
Subject: [PATCH 02/54] Add a section clarifying finalization and change up
 some wording.

---
 peps/pep-0788.rst | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index a58a572b086..d32233a9c68 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -85,6 +85,25 @@ like this:
         Py_RETURN_NONE;
     }
 
+What does "finalization" really mean?
+-------------------------------------
+
+Throughout this PEP, the phrase "finalization" is used in reference to the
+"finalizing" state of an interpreter. But, there's different stages of how
+Python shuts down, so there's some ambiguity in that term.
+
+There are two ways to "finalize" in the C API:
+
+1. :c:func:`Py_FinalizeEx`, which finalizes the main interpreter (and
+   subsequently the rest of the runtime).
+2. :c:func:`Py_EndInterpreter`, which finalizes a subinterpreter.
+   This does most of the same things that :c:func:`Py_FinalizeEx`
+   does to the main interpreter.
+
+So, "finalization" in this PEP refers to finalization of a specific
+interpreter, *not* the entire runtime. (But, keep in mind that finalization
+of the main interpreter and runtime are similiar states.)
+
 Motivation
 ==========
 
@@ -92,9 +111,10 @@ Native threads will always hang during finalization
 ---------------------------------------------------
 
 Many codebases might need to call Python code in highly-asynchronous
-situations where the interpreter is already finalizing, or might finalize, and
-want to continue running code after the Python call. This desire has been
-`brought up by users <https://discuss.python.org/t/78850/>`_.
+situations where the desired interpreter
+(:ref:`typically the main interpreter <pep-788-subinterpreters-gilstate>`)
+could be finalizing or deleted, but want to continue running code after the
+invoking the interpreter. This desire has been `brought up by users <https://discuss.python.org/t/78850/>`_.
 For example, a callback that wants to call Python code might be invoked when:
 
 - A kernel has finished running on a GPU.
@@ -102,7 +122,7 @@ For example, a callback that wants to call Python code might be invoked when:
 - A thread has quit, and a native library is executing static finalizers of
   thread local storage.
 
-In the current C API, any non-Python thread (one not created via the
+In the current C API, any "native" thread (one not created via the
 :mod:`threading` module) is considered to be "daemon", meaning that the interpreter
 won't wait on that thread to finalize. Instead, the interpreter will hang the
 thread when it goes to :term:`attach <attached thread state>` a :term:`thread state`,
@@ -233,6 +253,8 @@ ackwards-compatible by simply removing that limitation: threads still need a
 thread state (and thus need to call :c:func:`PyGILState_Ensure`), but they
 don't need to wait on one another to do so.
 
+.. _pep-788-subinterpreters-gilstate:
+
 Subinterpreters don't work with ``PyGILState_Ensure``
 -----------------------------------------------------
 

From a57686ccadc07598fd6a3ed8d2813cb176bb4c50 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 3 May 2025 08:38:11 -0400
Subject: [PATCH 03/54] Rewrite the abstract.

---
 peps/pep-0788.rst | 131 +++++++++++++++-------------------------------
 1 file changed, 42 insertions(+), 89 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index d32233a9c68..17c45da6a71 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -14,95 +14,48 @@ Post-History: `10-Mar-2025 <https://discuss.python.org/t/83959>`__,
 Abstract
 ========
 
-:c:func:`PyGILState_Ensure`, :c:func:`PyGILState_Release`, and other related
-functions in the ``PyGILState`` family are the most common way to create
-native threads (as in, created using the C API instead of :mod:`threading`)
-that interact with Python. They have been the standard for over
-twenty years (:pep:`311`). But, over time, these functions have
-become problematic:
-
-- They aren't safe for finalization, either causing the calling thread to hang or
-  crashing it with a segmentation fault, preventing further execution.
-- When they're called before finalization, they force the thread to be
-  "daemon", meaning that an interpreter won't wait for it to reach any point
-  of execution. This is mostly frustrating for developers, but can lead to
-  deadlocks!
-- Subinterpreters don't play nicely with them, because they all assume that
-  the main interpreter is the only one that exists. A fresh thread (that is,
-  has never had a thread state) that calls :c:func:`PyGILState_Ensure` will
-  always be for the main interpreter.
-- The term "GIL" in the name is quite confusing for users of free-threaded
-  Python. There isn't a GIL, why do they still have to call it?
-
-This PEP intends to fix all of these issues by providing two new functions,
-:c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Release`, as a more
-correct and safer replacement for :c:func:`PyGILState_Ensure` and
-:c:func:`PyGILState_Release`. For example:
-
-.. code-block:: c
-
-    if (PyThreadState_Ensure(interp) < 0) {
-        fputs("Python is shutting down", stderr);
-        return;
-    }
-
-    /* Interact with Python, without worrying about finalization. */
-    // ...
-
-    PyThreadState_Release();
-
-This is achieved by introducing two concepts into the C API:
-
--  "Daemon" and "non-daemon" threads, similar to how it works in the
-   :mod:`threading` module.
--  Interpreter reference counts which prevent an interpreter from finalizing.
-
-In :c:func:`PyThreadState_Ensure`, both of these ideas are applied. The
-calling thread is to store a reference to an interpreter via
-:c:func:`PyInterpreterState_Hold`. :c:func:`PyInterpreterState_Hold`
-increases the reference count of an interpreter, requiring the thread
-to finish (by eventually calling :c:func:`PyThreadState_Release`) before
-beginning finalization.
-
-For example, creating a native thread with this API would look something
-like this:
-
-.. code-block:: c
-
-    static PyObject *
-    my_method(PyObject *self, PyObject *unused)
-    {
-        PyThread_handle_t handle;
-        PyThead_indent_t indent;
-
-        PyInterpreterState *interp = PyInterpreterState_Hold();
-        if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) {
-            PyInterpreterState_Release(interp);
-            return NULL;
-        }
-        /* The thread will always attach and finish, because we increased
-           the reference count of the interpreter. */
-        Py_RETURN_NONE;
-    }
-
-What does "finalization" really mean?
--------------------------------------
-
-Throughout this PEP, the phrase "finalization" is used in reference to the
-"finalizing" state of an interpreter. But, there's different stages of how
-Python shuts down, so there's some ambiguity in that term.
-
-There are two ways to "finalize" in the C API:
-
-1. :c:func:`Py_FinalizeEx`, which finalizes the main interpreter (and
-   subsequently the rest of the runtime).
-2. :c:func:`Py_EndInterpreter`, which finalizes a subinterpreter.
-   This does most of the same things that :c:func:`Py_FinalizeEx`
-   does to the main interpreter.
-
-So, "finalization" in this PEP refers to finalization of a specific
-interpreter, *not* the entire runtime. (But, keep in mind that finalization
-of the main interpreter and runtime are similiar states.)
+In Python, threads are able to interact with an interpreter (e.g., invoke the
+bytecode loop) through an :term:`attached thread state`. On with-GIL builds,
+only one thread can hold an attached thread state at once, which means that
+the thread holds the :term:`GIL`. On free-threaded builds, there can be
+any number of attached thread states, allowing for parallelism (because
+multiple threads can invoke the interpreter at once).
+
+With that in mind, attachment of thread states is a bit problematic in the C API.
+The C API currently provides two ways to acquire and attach a thread state for
+an interpreter:
+
+- :c:func:`PyGILState_Ensure` & :c:func:`PyGILState_Release`.
+- :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap` (significantly
+  less common).
+
+The former, the ``PyGILState`` functions, are the most common way to do this
+and have been the standard for over twenty years (:pep:`311`), but a number of
+issues with them have arisen over time:
+
+- Subinterpreters tend to have trouble with them, because in threads that
+  haven't ever had an attached thread state, :c:func:`PyGILState_Ensure`
+  will assume that the main interpreter was requested. This makes it
+  impossible for the thread to interact with the subinterpreter!
+- The phrase "GIL" is confusing for developers of free-threaded
+  extensions, because there's no GIL there, right? Even on free-threaded
+  builds, threads still need a thread state to interact with the interpreter;
+  it's just that they don't have to wait on one another to do so. These days,
+  the important thing that :c:func:`PyGILState_Ensure` does is attach a
+  thread state, and acquiring the GIL is somewhat incidental.
+
+The other option, :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`,
+does solve those issues, but shares a problem with ``PyGILState`` that comes
+from how thread state attachment works in the C API: if the thread is not the
+main thread, then the interpreter will hang it during attachment once the
+interpreter has started finalizing. This can be frustrating,
+especially if there was some additional work to be done alongside invoking
+Python.
+
+This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure`
+and :c:func:`PyThreadState_Release` as replacements for the existing functions,
+accompanied by some interpreter reference counting APIs that let thread states
+be acquired and attached in a thread-safe and predictable manner.
 
 Motivation
 ==========

From 3387f814bfaed7f8ed254329f8fd5d0dd999e78b Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 3 May 2025 09:32:29 -0400
Subject: [PATCH 04/54] A bunch of changes to the motivation and rationale.

---
 peps/pep-0788.rst | 211 +++++++++++++++++++++++++++-------------------
 1 file changed, 125 insertions(+), 86 deletions(-)
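
Reviewer note (ignored by git am, not part of the commit): the time-of-call
to time-of-use problem and the strong-reference idea in this patch can be
modeled without CPython at all. The sketch below is a toy under stated
assumptions: toy_interp, toy_try_hold, toy_release, and toy_try_finalize are
hypothetical names, not proposed API, and a real interpreter would block
during finalization rather than return false. It only illustrates why the
"is it finalizing?" check and the reference acquisition need to be a single
atomic step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy model; this is not CPython's interpreter layout. */
#define TOY_CLOSED (-1L)

typedef struct {
    /* >= 0: number of strong references; TOY_CLOSED: finalization committed. */
    atomic_long strong;
} toy_interp;

/* Take a strong reference unless finalization has already committed.
   The check and the increment are one compare-and-swap, so there is no
   window between checking the state and acquiring the reference. */
static bool
toy_try_hold(toy_interp *interp)
{
    long seen = atomic_load(&interp->strong);
    while (seen != TOY_CLOSED) {
        /* On failure, `seen` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&interp->strong, &seen, seen + 1)) {
            return true;
        }
    }
    return false;
}

/* Drop a strong reference previously taken with toy_try_hold(). */
static void
toy_release(toy_interp *interp)
{
    long previous = atomic_fetch_sub(&interp->strong, 1);
    assert(previous > 0);
    (void)previous;  /* Avoid an unused-variable warning under NDEBUG. */
}

/* Commit to finalization only if nobody holds a strong reference. */
static bool
toy_try_finalize(toy_interp *interp)
{
    long expected = 0;
    return atomic_compare_exchange_strong(&interp->strong, &expected,
                                          TOY_CLOSED);
}
```

In this model, a separate "is finalizing" flag checked before the increment
would reintroduce the race described for Py_IsFinalizing; folding the closed
state into the counter is one way to avoid it, not necessarily the way the
reference implementation does.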

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 17c45da6a71..8489d44c0e1 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -67,7 +67,8 @@ Many codebases might need to call Python code in highly-asynchronous
 situations where the desired interpreter
 (:ref:`typically the main interpreter <pep-788-subinterpreters-gilstate>`)
 could be finalizing or deleted, but want to continue running code after the
-invoking the interpreter. This desire has been `brought up by users <https://discuss.python.org/t/78850/>`_.
+interpreter is invoked. This desire has been
+`brought up by users <https://discuss.python.org/t/78850/>`_.
 For example, a callback that wants to call Python code might be invoked when:
 
 - A kernel has finished running on a GPU.
@@ -75,15 +76,33 @@ For example, a callback that wants to call Python code might be invoked when:
 - A thread has quit, and a native library is executing static finalizers of
   thread local storage.
 
+Generally, this pattern would look something like this:
+
+.. code-block:: c
+
+    static void
+    some_callback(void *closure)
+    {
+        /* Do some work */
+        /* ... */
+
+        PyGILState_STATE gstate = PyGILState_Ensure();
+        /* Invoke the C API to do some computation */
+        PyGILState_Release(gstate);
+
+        /* ... */
+    }
+
 In the current C API, any "native" thread (one not created via the
 :mod:`threading` module) is considered to be "daemon", meaning that the interpreter
 won't wait on that thread to finalize. Instead, the interpreter will hang the
 thread when it goes to :term:`attach <attached thread state>` a :term:`thread state`,
 making it unusable past that point. Attaching a thread state can happen at
-any point when invoking Python, such as releasing the GIL in-between bytecode
-instructions, or when a C function exits a :c:macro:`Py_BEGIN_ALLOW_THREADS`
-block. (Note that hanging the thread is relatively new behavior; in prior
-versions, the thread would terminate, but the issue is the same.)
+any point when invoking Python, such as in between bytecode instructions (after
+the thread yields the GIL and reattaches), or when a C function exits a
+:c:macro:`Py_BEGIN_ALLOW_THREADS` block. (Note that hanging the thread is
+relatively new behavior; in prior versions, the thread would terminate, but
+the issue is the same.)
 
 This means that any non-Python thread may be terminated at any point, which
 is severely limiting for users who want to do more than just execute Python
@@ -105,8 +124,8 @@ the thread:
 
 Unfortunately, this isn't correct, because of time-of-call to time-of-use
 issues; the interpreter might not be finalizing during the call to
-:c:func:`Py_IsFinalizing`, but it might start finalizing immediately afterwards, which
-would cause the attachment of a thread state (typically via
+:c:func:`Py_IsFinalizing`, but it might start finalizing immediately
+afterwards, which would cause the attachment of a thread state (typically via
 :c:func:`PyGILState_Ensure`) to hang the thread.
 
 Daemon threads can cause finalization deadlocks
@@ -114,9 +133,16 @@ Daemon threads can cause finalization deadlocks
 
 When acquiring locks, it's extremely important to detach the thread state to
 prevent deadlocks. This is true on both the with-GIL and free-threaded builds.
+
 When the GIL is enabled, a deadlock can occur pretty easily when acquiring a
-lock if the GIL wasn't released, and lock-ordering deadlocks can still occur
-free-threaded builds if the thread state wasn't detached.
+lock if the GIL wasn't released; thread A grabs a lock, and starts waiting on
+its thread state to attach, while thread B holds the GIL and is waiting on the
+lock.
+
+On free-threaded builds, lock-ordering deadlocks are still possible
+if thread A acquired the lock for object A and then object B, and then
+another thread tried to acquire those locks in the reverse order. Free-threading
+protects against this by releasing locks when the thread state is detached.
 
 So, all code that needs to work with locks need to detach the thread state.
 In C, this is almost always done via :c:macro:`Py_BEGIN_ALLOW_THREADS` and
@@ -138,9 +164,7 @@ though. If any of those finalizers try to acquire the lock, deadlock ensues.
 This affects CPython itself, and there's not much that can be done
 to fix it. For example, `python/cpython#129536 <https://github.com/python/cpython/issues/129536>`_
 remarks that the :mod:`ssl` module can emit a fatal error when used at
-finalization, because a daemon thread got hung while holding the lock. There
-are workarounds for this for pure-Python code, but native threads don't have
-such an option.
+finalization, because a daemon thread got hung while holding the lock.
 
 .. _pep-788-hanging-compat:
 
@@ -148,12 +172,12 @@ We can't change finalization behavior for ``PyGILState_Ensure``
 ***************************************************************
 
 There will always have to be a point in a Python program where
-:c:func:`PyGILState_Ensure` can no longer acquire the GIL (or more correctly,
-attach a thread state). If the interpreter is long dead, then Python
-obviously can't give a thread a way to invoke it.
-:c:func:`PyGILState_Ensure` doesn't have any meaningful way to return a
-failure, so it has no choice but to terminate the thread or emit a fatal
-error, as noted in `python/cpython#124622 <https://github.com/python/cpython/issues/124622>`_:
+:c:func:`PyGILState_Ensure` can no longer attach a thread state.
+If the interpreter is long dead, then Python obviously can't give a
+thread a way to invoke it. :c:func:`PyGILState_Ensure` doesn't have any
+meaningful way to return a failure, so it has no choice but to terminate
+the thread or emit a fatal error, as noted in
+`python/cpython#124622 <https://github.com/python/cpython/issues/124622>`_:
 
     I think a new GIL acquisition and release C API would be needed. The way
     the existing ones get used in existing C code is not amenible to suddenly
@@ -163,9 +187,7 @@ error, as noted in `python/cpython#124622 <https://github.com/python/cpython/iss
     the GIL" without any other option.
 
 For this reason, we can't make any real changes to how :c:func:`PyGILState_Ensure`
-works for finalization, because it would break existing code. Similarly, threads
-created with the existing C API will have to remain daemon, because extensions
-that implement native threads aren't guaranteed to work during finalization.
+works during finalization, because it would break existing code.
 
 The existing APIs are broken and misleading
 -------------------------------------------
@@ -182,7 +204,7 @@ At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does
 match the documentation. Instead of hanging the thread during finalization
 as previously noted, it's extremely common for it to crash with a segmentation
 fault. This is a `known issue <https://github.com/python/cpython/issues/124619>`_
-that could, in theory, be fixed in CPython, but it's definitely worth noting
+that could be fixed in CPython, but it's definitely worth noting
 here. Incidentally, acceptance and implementation of this PEP will likely fix
 the existing crashes caused by :c:func:`PyGILState_Ensure`.
 
@@ -198,13 +220,29 @@ created by the authors of this PEP:
     erroneously call the C API inside ``Py_BEGIN_ALLOW_THREADS`` blocks or
     omit ``PyGILState_Ensure`` in fresh threads.
 
-Since Python 3.12, it is an :term:`attached thread state` that lets a thread
-invoke the C API. On with-GIL builds, holding an attached thread state
-implies holding the GIL, so only one thread can have one at a time. Free-threaded
-builds achieve the effect of multi-core parallism while remaining
-ackwards-compatible by simply removing that limitation: threads still need a
-thread state (and thus need to call :c:func:`PyGILState_Ensure`), but they
-don't need to wait on one another to do so.
+Again, :c:func:`PyGILState_Ensure` gets an :term:`attached thread state`
+for the thread on both with-GIL and free-threaded builds. Acquisition of the
+GIL on with-GIL builds is incidental! :c:func:`PyGILState_Ensure` is very
+roughly equivalent to the following:
+
+.. code-block:: c
+
+    PyGILState_STATE
+    PyGILState_Ensure(void)
+    {
+        PyThreadState *existing = PyThreadState_GetUnchecked();
+        if (existing == NULL) {
+            // Chooses the interpreter of the last attached thread state
+            // for this thread. If Python has never run in this thread, the
+            // main interpreter is used.
+            PyInterpreterState *interp = guess_interpreter();
+            PyThreadState *tstate = PyThreadState_New(interp);
+            PyThreadState_Swap(tstate);
+            return opaque_tstate_handle(tstate);
+        } else {
+            return opaque_tstate_handle(existing);
+        }
+    }
 
 .. _pep-788-subinterpreters-gilstate:
 
@@ -220,13 +258,20 @@ As noted in the :ref:`documentation <python:gilstate>`,
     ``Py_NewInterpreter()``), but mixing multiple interpreters and the
     ``PyGILState_*`` API is unsupported.
 
-More technically, this is because ``PyGILState_Ensure`` doesn't have any way
+This is because :c:func:`PyGILState_Ensure` doesn't have any way
 to know which interpreter created the thread, and as such, it has to assume
 that it was the main interpreter. There isn't any way to detect this at
 runtime, so spurious races are bound to come up in threads created by
 subinterpreters, because synchronization for the wrong interpreter will be
 used on objects shared between the threads.
 
+For example, if the thread had access to object A, which belongs to a
+subinterpreter, but then called :c:func:`PyGILState_Ensure`, it would have an
+attached thread state pointing to the main interpreter, not the subinterpreter.
+This means that any GIL assumptions about the object are wrong! There isn't
+any synchronization between the two GILs, so both the thread (which thinks it's
+in the subinterpreter) and the main thread could try to increment the
+reference count at the same time, causing a data race!
 
 Interpreters can concurrently shut down
 ***************************************
@@ -234,22 +279,61 @@ Interpreters can concurrently shut down
 The other way of creating a native thread that can invoke Python,
 :c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better
 for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an
-explicit interpreter, rather than assuming that the main interpreter was intended),
-but is still limited by the current API.
+explicit interpreter, rather than assuming that the main interpreter was
+requested), but is still limited by the current hanging problems in the C API.
 
-In particular, subinterpreters typically have a much shorter lifetime than the
-main interpreter, and as such, there's not necessarily a guarantee that a
-:c:type:`PyInterpreterState` (acquired by :c:func:`PyInterpreterState_Get`)
-passed to a fresh thread will still be alive. Similarly, a
-:c:type:`PyInterpreterState` pointer could have been replaced with a *new*
-interpreter, causing all sorts of unknown issues. They are also subject to
-all the finalization related hanging mentioned previously.
+In addition, subinterpreters typically have a much shorter lifetime than the
+main interpreter, so there's a much higher chance that an interpreter passed
+to a thread will have already finished and have been deallocated. Passing that
+interpreter to :c:func:`PyThreadState_New` will most likely crash the program.
 
 Rationale
 =========
 
-This PEP includes several new APIs that intend to fix all of the issues stated
-above.
+So, how do we address all of this? The best way seems to be starting from
+scratch and "reimagining" how to acquire and attach thread states in the C API.
+
+In summary, there are a few bases we want to cover in a new API:
+
+- Require the caller to specify which interpreter they want, preventing those
+  pesky problems with interpreter guessing.
+- Prevent the thread from being arbitrarily bricked by calling into Python.
+- Protect against deallocation of interpreters with short lifetimes.
+- Stay backwards-compatible with the old APIs and ideas, such as "daemonness"
+  (but as opt-in).
+
+Preventing interpreter finalization with references
+---------------------------------------------------
+
+This PEP takes an approach where interpreters are given a reference count by
+non-daemon threads that want to (or do) hold an attached thread state. When
+the interpreter starts finalizing, it will wait until its reference count
+reaches zero before proceeding to a point where threads will be hung.
+Note that this *is not* the same as joining the thread; the interpreter will
+only wait until the thread state has been released
+(via :c:func:`PyThreadState_Release`) for all non-daemon threads. This isn't
+the same as waiting for them to detach their thread state--it waits for them
+to *destroy* it. Otherwise, this API wouldn't have any finalization benefits
+over the existing ``PyThreadState`` functions.
+
+So, from a thread's perspective, holding a "strong reference" to the
+interpreter will effectively prevent it from finalizing, making it safe to
+invoke Python without worrying about the thread being hung. The strong
+reference will be held as long as the thread state is "alive", even if it's
+detached.
+
+This proposal also comes with weak references to an interpreter that don't
+prevent it from finalizing, but can be promoted to a strong reference once
+it's decided that a thread state can attach. Promotion of a weak reference to a
+strong reference can fail if the interpreter has already finalized, or reached
+a point during finalization where it can't be guaranteed that the thread won't
+hang.
+
+If there's additional work after destroying the thread state, the thread
+can continue running as normal. If that work needs to finish before the
+program exits, it's still up to the user to decide how to join the thread, for
+example by using an :mod:`atexit` handler.
+Again, this PEP isn't trying to reinvent how to create or join threads!
 
 Replacing the old APIs
 ----------------------
@@ -272,51 +356,6 @@ seamless, due to the new requirement of storing an interpreter state. The
 exact details of this deprecation are currently unclear, see
 :ref:`pep-788-deprecation`.
 
-A light layer of magic
-----------------------
-
-The APIs proposed by this PEP intentionally have a layer of abstraction that is
-hidden from the user and offloads complexity onto CPython. This is done
-primarily to help ease the transition from ``PyGILState`` for existing
-codebases, and for ease-of-use to those who provide wrappers the C API, such
-as Cython or PyO3.
-
-In particular, the API hides details about the lifetime of the thread state
-and most of the details with interpreter references.
-
-See also :ref:`pep-788-activate-deactivate-instead`.
-
-Bikeshedding and the ``PyThreadState`` namespace
-------------------------------------------------
-
-To solve the issue with "GIL" terminology, the new functions described by this
-PEP intended as replacements for ``PyGILState`` will go under the existing
-``PyThreadState`` namespace. In Python 3.14, the documentation has been
-updated to switch over to terms like
-:term:`"attached thread state" <attached thread state>` instead of
-:term:`"global interpreter lock" <global interpreter lock>`, so this namespace
-seems to fit well for this PEP.
-
-Preventing interpreter finalization with references
----------------------------------------------------
-
-Several iterations of this API have taken an approach where
-:c:func:`PyThreadState_Ensure` can return a failure based on the state of
-the interpreter. Instead, this PEP takes an approach where an interpreter
-keeps track of the number of non-daemon threads, which inherently prevents
-it from beginning finalization.
-
-The main upside with this approach is that there's more consistency with
-attaching threads. Using an interpreter reference from the calling thread
-keeps the interpreter from finalizing before the thread starts, ensuring
-that it always works. An approach that were to return a failure based on
-the start-time of the thread could cause spurious issues.
-
-In the case where it is useful to let the interpreter finalize, such as in
-an asynchronous callback where there's no guarantee that the thread will start,
-strong references to an interpreter can be acquired through
-:c:func:`PyInterpreterState_Lookup`.
-
 Specification
 =============
 

From ceeefeaf5f9f4bd5a726e27c18f1afa94801cb95 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 3 May 2025 09:40:48 -0400
Subject: [PATCH 05/54] Add PyThreadState_GetDaemon() and reword the
 deprecation rationale.

---
 peps/pep-0788.rst | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 8489d44c0e1..b244fd1e887 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -338,12 +338,10 @@ Again, this PEP isn't trying to reinvent how to create or join threads!
 Replacing the old APIs
 ----------------------
 
-As made clear in Motivation_, ``PyGILState`` is already pretty buggy, and
-even if it was magically fixed, the current behavior of hanging the thread is
-beyond repair. In turn, this PEP intends to completely deprecate the existing
-``PyGILState`` APIs and provide better alternatives. However, even if this PEP
-is rejected, all of the APIs can be replaced with more correct ``PyThreadState``
-functions in the current C API:
+Due to the plethora of issues with ``PyGILState``, this PEP intends to do away
+with it entirely. In today's C API, all ``PyGILState`` functions are
+replaceable with ``PyThreadState`` counterparts that are compatible with
+subinterpreters:
 
 - :c:func:`PyGILState_Ensure`: :c:func:`PyThreadState_Swap` & :c:func:`PyThreadState_New`
 - :c:func:`PyGILState_Release`: :c:func:`PyThreadState_Clear` & :c:func:`PyThreadState_Delete`
@@ -351,10 +349,12 @@ functions in the current C API:
 - :c:func:`PyGILState_Check`: ``PyThreadState_GetUnchecked() != NULL``
 
 This PEP specifies a ten-year deprecation for these functions (while remaining
-in the stable ABI), primarily because it's expected that the migration won't be
-seamless, due to the new requirement of storing an interpreter state. The
-exact details of this deprecation are currently unclear, see
-:ref:`pep-788-deprecation`.
+in the stable ABI), mainly because the migration is expected to be a
+little painful: :c:func:`PyThreadState_Ensure` and
+:c:func:`PyThreadState_Release` aren't drop-in replacements for
+:c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`, due to the
+requirement of a specific interpreter. The exact details of this deprecation
+aren't too clear, see :ref:`pep-788-deprecation`.
 
 Specification
 =============
@@ -391,6 +391,15 @@ See :ref:`pep-788-hanging-compat`.
 
     Return zero on success, non-zero *without* an exception set on failure.
 
+.. c:function:: int PyThreadState_GetDaemon(void)
+
+    Returns non-zero if the :term:`attached thread state` is daemon,
+    and zero otherwise. See also :c:func:`PyThreadState_SetDaemon`
+    and :attr:`threading.Thread.daemon`.
+
+    This function cannot fail, other than with a fatal error if the caller
+    has no :term:`attached thread state`.
+
 Interpreter reference counting
 ------------------------------
 

From 3cbfb261c385c4cf0f87deabf1eeb34afb66ed69 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 3 May 2025 11:21:30 -0400
Subject: [PATCH 06/54] Rewrite the entire damn specification.

---
 peps/pep-0788.rst | 322 +++++++++++++++++++++++++++-------------------
 1 file changed, 187 insertions(+), 135 deletions(-)
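
Reviewer note (ignored by git am, not part of the commit): the strong and
weak reference rules in this patch resemble a shared/weak pointer pair. The
sketch below is a single-threaded toy that makes the lifetimes concrete; every
toy_* name is hypothetical and none of it is the proposed API or its
implementation. It assumes a small control block that outlives the
interpreter, which is one plausible way for a weak reference to stay safe to
touch after the interpreter is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model; real code would need atomics or a lock. */
typedef struct {
    long strong;     /* strong references; finalization requires zero */
    long weak;       /* weak references, plus one owned by the interpreter */
    bool finalized;  /* set once the toy interpreter has shut down */
} toy_control;

typedef struct { toy_control *ctl; } toy_ref;      /* strong reference */
typedef struct { toy_control *ctl; } toy_weakref;  /* weak reference */

/* Create a toy interpreter. Returns NULL on allocation failure. */
static toy_control *
toy_interp_new(void)
{
    toy_control *ctl = calloc(1, sizeof(*ctl));
    if (ctl != NULL) {
        ctl->weak = 1;  /* the interpreter's own hold on the block */
    }
    return ctl;
}

/* Free the control block once nothing, weak or otherwise, refers to it. */
static void
toy_weak_decref(toy_control *ctl)
{
    assert(ctl->weak > 0);
    if (--ctl->weak == 0) {
        free(ctl);
    }
}

static toy_weakref
toy_weakref_get(toy_control *ctl)
{
    ctl->weak++;
    return (toy_weakref){ ctl };
}

static void
toy_weakref_close(toy_weakref weak)
{
    toy_weak_decref(weak.ctl);
}

/* Promote a weak reference. Returns 0 on success, -1 once finalized. */
static int
toy_weakref_as_strong(toy_weakref weak, toy_ref *out)
{
    if (weak.ctl->finalized) {
        return -1;
    }
    weak.ctl->strong++;
    out->ctl = weak.ctl;
    return 0;
}

static void
toy_ref_close(toy_ref ref)
{
    assert(ref.ctl->strong > 0);
    ref.ctl->strong--;
}

/* Returns 0 on success, -1 while strong references remain. A real
   interpreter would wait here instead of failing. */
static int
toy_interp_finalize(toy_control *ctl)
{
    if (ctl->strong != 0) {
        return -1;
    }
    ctl->finalized = true;
    toy_weak_decref(ctl);  /* drop the interpreter's own hold */
    return 0;
}
```

A thread that was handed a toy_weakref can therefore call
toy_weakref_as_strong at any time: it either gets a reference that blocks
finalization or a clean failure, which mirrors the behavior the specification
text describes for weak interpreter references.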

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index b244fd1e887..d68f729bac0 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -359,19 +359,115 @@ aren't too clear, see :ref:`pep-788-deprecation`.
 Specification
 =============
 
+Interpreter reference counting
+------------------------------
+
+An interpreter will keep track of the number of non-daemon threads through
+a reference count. During finalization, the interpreter will wait until its
+reference count reaches zero, and once that happens, threads can no longer
+acquire a strong reference to the interpreter. Threads can hold as many
+references as they want, but in most cases, a thread will have one reference
+at a time, typically through the :term:`attached thread state`.
+
+An attached thread state is made non-daemon by holding a strong reference
+to the interpreter. When a non-daemon thread state is destroyed, it releases
+the reference.
+
+A weak reference to the interpreter won't prevent it from finalizing, but can
+be safely accessed after the interpreter no longer supports strong references,
+and even after the interpreter has been deleted. But, at that point, the weak
+reference can no longer be converted to a strong reference.
+
+Strong interpreter references
+*****************************
+
+.. c:type:: PyInterpreterRef
+
+   An opaque, strong reference to an interpreter.
+   The interpreter will wait until a strong reference has been released
+   before shutting down.
+
+.. c:function:: PyInterpreterRef PyInterpreterRef_Get(void)
+
+    Acquire a strong reference to the current interpreter.
+
+    This function is generally meant to be used in tandem with
+    :c:func:`PyThreadState_Ensure`.
+
+    This function cannot fail, other than with a fatal error when the caller
+    doesn't hold an :term:`attached thread state`.
+
+.. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref)
+
+    Duplicate a strong reference to an interpreter.
+
+    This function is generally meant to be used in tandem with
+    :c:func:`PyThreadState_Ensure`.
+
+    This function cannot fail, and the caller doesn't need to hold an
+    :term:`attached thread state`.
+
+.. c:function:: void PyInterpreterRef_Close(PyInterpreterRef ref)
+
+    Release a strong reference to an interpreter, allowing it to shut down
+    if there are no references left.
+
+    This function cannot fail, and the caller doesn't need to hold an
+    :term:`attached thread state`.
+
+Weak interpreter references
+***************************
+
+.. c:type:: PyInterpreterWeakRef
+
+    An opaque, weak reference to an interpreter.
+    The interpreter will *not* wait for the reference to be
+    released before shutting down.
+
+.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Get(void)
+
+    Acquire a weak reference to the current interpreter.
+
+    This function is generally meant to be used in tandem with
+    :c:func:`PyInterpreterWeakRef_AsStrong`.
+
+    This function cannot fail, other than with a fatal error when the caller
+    doesn't hold an :term:`attached thread state`.
+
+.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Dup(PyInterpreterWeakRef *wref)
+
+    Duplicate the weak reference *wref*.
+
+    This function is generally meant to be used in tandem with
+    :c:func:`PyInterpreterWeakRef_AsStrong`.
+
+    This function cannot fail, and the caller doesn't need to hold an
+    :term:`attached thread state`.
+
+.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef *wref)
+
+    Return a strong reference to an interpreter from a weak reference.
+
+    If the interpreter no longer exists or has already finished waiting for
+    non-daemon threads, then this function returns ``NULL``.
+
+    The caller does not need to hold an :term:`attached thread state`, but is
+    not safe to call in a re-entrant signal handler.
+
+.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef *wref)
+
+    Release a weak reference, possibly deallocating it.
+
+    This function cannot fail, and the caller doesn't need to hold an
+    :term:`attached thread state`.
+
 Daemon and non-daemon threads
 -----------------------------
 
-This PEP introduces the concept of non-daemon thread states. By default, all
-threads created without the :mod:`threading` module will hang when trying to
-attach a thread state for a finalizing interpreter (in fact, daemon threads
-that *are* created with the :mod:`threading` module will hang in the same
-way). This generally happens when a thread calls :c:func:`PyEval_RestoreThread`
-or in between bytecode instructions, based on :func:`sys.setswitchinterval`.
-
-A new, internal field will be added to the ``PyThreadState`` structure that
-determines if the thread is daemon. Before finalization, an interpreter
-will wait until all non-daemon threads call :c:func:`PyThreadState_Delete`.
+A non-daemon thread state is a thread state that holds a strong reference to an
+interpreter. The reference is released when the thread state is deleted, either
+by :c:func:`PyThreadState_Release` or a different thread state deletion
+function.
 
 For backwards compatibility, all thread states created by existing APIs,
 including :c:func:`PyGILState_Ensure`, will remain daemon by default.
@@ -386,10 +482,12 @@ See :ref:`pep-788-hanging-compat`.
     :c:func:`PyThreadState_Ensure` are daemon by default.
 
     If the thread state is non-daemon, then the current interpreter will wait
-    for this thread to finish before shutting down. See also
+    for this thread to finish before shutting down by holding a strong
+    reference to the interpreter (see :c:func:`PyInterpreterRef_Get`). See also
     :attr:`threading.Thread.daemon`.
 
     Return zero on success, non-zero *without* an exception set on failure.
+    This function can only fail when setting the thread state to non-daemon.
 
 .. c:function:: int PyThreadState_GetDaemon(int is_daemon)
 
@@ -400,102 +498,77 @@ See :ref:`pep-788-hanging-compat`.
     This function cannot fail, other than with a fatal error if the caller
     has no :term:`attached thread state`.
 
-Interpreter reference counting
-------------------------------
-
-Internally, an interpreter will have to keep track of the number of
-non-daemon native threads, which will determine when an interpreter can
-finalize. This is done to prevent use-after-free crashes in
-:c:func:`PyThreadState_Ensure` for interpreters with short lifetimes, and
-to remove needless layers of synchronization between the calling thread and
-the started thread.
-
-An interpreter state returned by :c:func:`Py_NewInterpreter` (or really,
-:c:func:`PyInterpreterState_New`) will start with a native thread countdown.
-For simplicity's sake, this will be referred to as a reference count.
-A non-zero reference count prevents the interpreter from finalizing.
-
-.. c:function:: PyInterpreterState *PyInterpreterState_Hold(void)
-
-    Similar to :c:func:`PyInterpreterState_Get`, but returns a strong
-    reference to the interpreter (meaning, it has its reference count
-    incremented by one, allowing the returned interpreter state to be safely
-    accessed by another thread, because it will be prevented from finalizing).
-
-    This function is generally meant to be used in tandem with
-    :c:func:`PyThreadState_Ensure`.
-
-    The caller must have an :term:`attached thread state`. This function
-    cannot return ``NULL``. Failures are always a fatal error.
-
-.. c:function:: PyInterpreterState *PyInterpreterState_Lookup(int64_t interp_id)
-
-    Similar to :c:func:`PyInterpreterState_Hold`, but looks up an interpreter
-    based on an ID (see :c:func:`PyInterpreterState_GetID`). This has the
-    benefit of allowing the interpreter to finalize in cases where the thread
-    might not start, such as inside of an asynchronous callback.
-
-    This function will return ``NULL`` without an exception set on failure.
-    If the return value is non-``NULL``, then the returned interpreter will be
-    prevented from finalizing until the reference is released by
-    :c:func:`PyThreadState_Release` or :c:func:`PyInterpreterState_Release`.
-
-    Returning ``NULL`` typically means that the interpreter is at a point
-    where threads cannot start, or no longer exists.
-
-    The caller does not need to have an :term:`attached thread state`.
-
-.. c:function:: void PyInterpreterState_Release(PyInterpreterState *interp)
-
-    Decrement the reference count of the interpreter, as was incremented by
-    :c:func:`PyInterpreterState_Hold` or :c:func:`PyInterpreterState_Lookup`.
-
-    This function cannot fail, other than with a fatal error. The caller does
-    not need to have an :term:`attached thread state` for *interp*.
-
 Ensuring and releasing thread states
 ------------------------------------
 
 This proposal includes two new high-level threading APIs that intend to
 replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
 
-.. c:function:: int PyThreadState_Ensure(PyInterpreterState *interp)
+.. c:function:: int PyThreadState_Ensure(PyInterpreterRef ref)
+
+    Ensure that the thread has an :term:`attached thread state` for the
+    interpreter denoted by *ref*, and thus can safely invoke that
+    interpreter. It is OK to call this function if the thread already has an
+    attached thread state, as long as there is a subsequent call to
+    :c:func:`PyThreadState_Release` that matches this one.
 
-    Ensure that the thread has an :term:`attached thread state` for *interp*,
-    and thus can safely invoke that interpreter. It is OK to call this
-    function if the thread already has an attached thread state, as long as
-    there is a subsequent call to :c:func:`PyThreadState_Release` that matches
-    this one.
+    Nested calls to this function will only sometimes create a new
+    :term:`thread state`. If there is no :term:`attached thread state`,
+    then this function will check for the most recent attached thread
+    state used by this thread. If none exists or it doesn't match *ref*,
+    a new thread state is created. If it does match *ref*, it is reattached.
+    If there is an :term:`attached thread state`, then a similar check occurs;
+    if its interpreter matches *ref*, it is reused, and otherwise a new
+    thread state is created.
 
-    The reference to the interpreter *interp* is stolen by this function.
-    As such, *interp* should have been acquired by
-    :c:func:`PyInterpreterState_Hold`.
+    The thread state attached by this function will be reused by
+    subsequent calls to :c:func:`PyGILState_Ensure` in this thread, but
+    :c:func:`PyGILState_Ensure` will *not* make the thread daemon again.
 
-    Thread states created by this function are non-daemon by default. See
-    :c:func:`PyThreadState_SetDaemon`. If the calling thread already has an
-    attached thread state that matches *interp*, then this function
-    will mark the existing thread state as non-daemon and return. It will
-    be restored to its prior daemon status upon the next
-    :c:func:`PyThreadState_Release` call.
+    The reference to the interpreter *ref* is stolen by this function.
+    Use :c:func:`PyInterpreterRef_Dup` if the reference is intended to be
+    kept.
 
     Return zero on success, and non-zero with the old attached thread state
     restored (which may have been ``NULL``).
 
 .. c:function:: void PyThreadState_Release()
 
-    Release the :term:`attached thread state` set by
-    :c:func:`PyThreadState_Ensure`. Any thread state that was set prior
-    to the original call to :c:func:`PyThreadState_Ensure` will be restored.
+    Release a :c:func:`PyThreadState_Ensure` call.
+
+    The :term:`attached thread state` prior to the corresponding
+    :c:func:`PyThreadState_Ensure` call is guaranteed to be restored upon
+    returning. The cached thread state as used by :c:func:`PyThreadState_Ensure`
+    and :c:func:`PyGILState_Ensure` will also be restored.
 
     This function cannot fail, but may hang the thread if the
-    attached thread state prior to the original :c:func:`!PyThreadState_Ensure`
-    was daemon and the interpreter was finalized.
+    restored :term:`attached thread state` was daemon and the interpreter
+    was finalized. If you're running in a thread where that could be an issue,
+    call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure`
+    at your own discretion.
+
+Changes to :mod:`threading` shutdown
+------------------------------------
+
+An interpreter currently special-cases non-daemon threads created by
+:mod:`threading` and joins them before the interpreter does any other
+finalization.
+
+:mod:`threading` will be changed to use :c:func:`PyThreadState_Ensure`, and
+will rely on the interpreter's strong reference to run until completion.
+:mod:`threading`-created threads will still be joined to release resources after
+this has happened.
+
+Additionally, setting a :class:`threading.Thread` to :attr:`~threading.Thread.daemon`
+should correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise,
+:c:func:`PyThreadState_GetDaemon` will have incorrect results in Python
+threads.
 
 Deprecation of ``PyGILState`` APIs
 ----------------------------------
 
 This PEP deprecates all of the existing ``PyGILState`` APIs in favor of the
-new ``PyThreadState`` APIs for the reasons given in the Motivation_. Namely:
+existing and new ``PyThreadState`` APIs. Namely:
 
 - :c:func:`PyGILState_Ensure`: use :c:func:`PyThreadState_Ensure` instead.
 - :c:func:`PyGILState_Release`: use :c:func:`PyThreadState_Release` instead.
@@ -548,12 +621,11 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
     my_critical_operation(PyObject *self, PyObject *unused)
     {
         assert(PyThreadState_GetUnchecked() != NULL);
-        PyInterpreterState *interp = PyInterpreterState_Hold();
+        PyInterpreterRef ref = PyInterpreterRef_Get();
         /* Temporarily make this thread non-daemon to ensure that the
            lock is released. */
-        if (PyThreadState_Ensure(interp) < 0) {
-            PyErr_SetString(PyExc_PythonFinalizationError,
-                            "interpreter is shutting down");
+        if (PyThreadState_Ensure(ref) < 0) {
+            PyErr_NoMemory();
             return NULL;
         }
 
@@ -561,7 +633,8 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
         acquire_some_lock();
         Py_END_ALLOW_THREADS;
 
-        /* Do something while holding the lock */
+        /* Do something while holding the lock.
+           The interpreter won't finalize during this period. */
         // ...
 
         release_some_lock();
@@ -569,10 +642,10 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
         Py_RETURN_NONE;
     }
 
-Transitioning from old functions
-********************************
+Transitioning from the existing functions
+*****************************************
 
-The following code uses the old ``PyGILState`` APIs:
+The following code uses the ``PyGILState`` APIs:
 
 .. code-block:: c
 
@@ -606,16 +679,15 @@ The following code uses the old ``PyGILState`` APIs:
         Py_RETURN_NONE;
     }
 
-This is the same code, updated to use the new functions:
+This is the same code, rewritten to use the new functions:
 
 .. code-block:: c
 
     static int
     thread_func(void *arg)
     {
-        PyInterpreterState *interp = (PyInterpreterState *)arg;
+        PyInterpreterRefinterp = (PyInterpreterRef)arg;
         if (PyThreadState_Ensure(interp) < 0) {
-            fputs("Cannot talk to Python", stderr);
             return -1;
         }
         if (PyRun_SimpleString("print(42)") < 0) {
@@ -631,9 +703,9 @@ This is the same code, updated to use the new functions:
         PyThread_handle_t handle;
         PyThead_indent_t indent;
 
-        PyInterpreterState *interp = PyInterpreterState_Hold();
-        if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) {
-            PyInterpreterState_Release(interp);
+        PyInterpreterRef ref = PyInterpreterRef_Get();
+        if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) {
+            PyInterpreterRef_Close(ref);
             return NULL;
         }
         Py_BEGIN_ALLOW_THREADS
@@ -654,9 +726,8 @@ they can still be used with this API:
     static int
     thread_func(void *arg)
     {
-        PyInterpreterState *interp = (PyInterpreterState *)arg;
-        if (PyThreadState_Ensure(interp) < 0) {
-            fputs("Cannot talk to Python", stderr);
+        PyInterpreterRef ref = (PyInterpreterRef)arg;
+        if (PyThreadState_Ensure(ref) < 0) {
             return -1;
         }
         (void)PyThreadState_SetDaemon(1);
@@ -673,9 +744,9 @@ they can still be used with this API:
         PyThread_handle_t handle;
         PyThead_indent_t indent;
 
-        PyInterpreterState *interp = PyInterpreterState_Hold();
-        if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) {
-            PyInterpreterState_Release(interp);
+        PyInterpreterRef ref = PyInterpreterRef_Get();
+        if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) {
+            PyInterpreterRef_Close(ref);
             return NULL;
         }
         Py_RETURN_NONE;
@@ -684,35 +755,23 @@ they can still be used with this API:
 Asynchronous callback example
 *****************************
 
-As stated in the Motivation_, there are many cases where it's desirable
-to call Python in an asynchronous callback. In such cases, it's not safe to
-call :c:func:`PyInterpreterState_Hold`, because it's not guaranteed that
-:c:func:`PyThreadState_Ensure` will ever be called.
-If not, finalization becomes deadlocked.
-
-This scenario requires using :c:func:`PyInterpreterState_Lookup` instead,
-which only prevents finalization once the lookup has been made.
-
-For example:
+In some cases, the thread might not ever start, such as in a callback.
+We can't use a strong reference here, because a strong reference would
+deadlock the interpreter if it's not released.
 
 .. code-block:: c
 
-    typedef struct {
-        int64_t interp_id;
-    } pyrun_t;
-
     static int
     async_callback(void *arg)
     {
-        pyrun_t *data = (pyrun_t *)arg;
-        PyInterpreterState *interp = PyInterpreterState_Lookup(data->interp_id);
-        PyMem_RawFree(data);
-        if (interp == NULL) {
-            fputs("Python has shut down", stderr);
+        PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg;
+        PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
+        if (ref == NULL) {
+            fputs("Python has shut down!", stderr);
             return -1;
         }
-        if (PyThreadState_Ensure(interp) < 0) {
-            fputs("Cannot talk to Python", stderr);
+
+        if (PyThreadState_Ensure(ref) < 0) {
             return -1;
         }
         if (PyRun_SimpleString("print(42)") < 0) {
@@ -725,17 +784,10 @@ For example:
     static PyObject *
     setup_callback(PyObject *self, PyObject *unused)
     {
-        PyThread_handle_t handle;
-        PyThead_indent_t indent;
-
-        pyrun_t *data = PyMem_RawMalloc(sizeof(pyrun_t));
-        if (data == NULL) {
-            return PyErr_NoMemory();
-        }
         // Weak reference to the interpreter. It won't wait on the callback
         // to finalize.
-        data->interp_id = PyInterpreterState_GetID(PyInterpreterState_Get());
-        register_callback(async_callback, data);
+        PyInterpreterWeakRef *wref = PyInterpreterWeakRef_Get();
+        register_callback(async_callback, wref);
 
         Py_RETURN_NONE;
     }

From d9de49a3b1bd65b9a1fa4dae7511580f0a394772 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 3 May 2025 11:39:47 -0400
Subject: [PATCH 07/54] Update the rejected ideas.

---
 peps/pep-0788.rst | 53 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 10 deletions(-)
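
Note on the open question this patch adds about stolen references: the
sketch below is a toy model of the accounting when an "ensure" call takes
ownership of the reference passed to it. All names (toy_interp, toy_ref,
toy_tstate, toy_ref_dup, toy_ref_close, toy_ensure, toy_release) are
hypothetical and only illustrate the bookkeeping. It assumes a
single-threaded caller and does not model thread state attachment at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter with a plain (non-atomic) reference count. */
typedef struct {
    long strong_refs;
} toy_interp;

/* Hypothetical pointer-sized strong reference handle. */
typedef toy_interp *toy_ref;

/* Hypothetical thread state that owns at most one strong reference. */
typedef struct {
    toy_ref held;
} toy_tstate;

/* Create a second, independently owned reference. */
static toy_ref
toy_ref_dup(toy_ref ref)
{
    ref->strong_refs++;
    return ref;
}

/* Give up one reference. */
static void
toy_ref_close(toy_ref ref)
{
    assert(ref->strong_refs > 0);
    ref->strong_refs--;
}

/* Model of an "ensure" that steals: the thread state becomes the owner
   of ref, so the caller must not close it afterwards. */
static void
toy_ensure(toy_tstate *ts, toy_ref ref)
{
    ts->held = ref;
}

/* Model of the matching "release": drops the stolen reference. */
static void
toy_release(toy_tstate *ts)
{
    toy_ref_close(ts->held);
    ts->held = NULL;
}
```

A caller that wants to keep its own reference passes a duplicate, for
example toy_ensure(&ts, toy_ref_dup(owner)), and later closes owner itself.
If "ensure" did not steal, every single-use caller would need an extra
close call after it instead.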

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index d68f729bac0..022a667d0a2 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -454,6 +454,10 @@ Weak interpreter references
     The caller does not need to hold an :term:`attached thread state`, but is
     not safe to call in a re-entrant signal handler.
 
+    If the caller *does* hold an :term:`attached thread state`, and that thread
+    state holds a strong reference to the interpreter, then this function can
+    never fail.
+
 .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef *wref)
 
     Release a weak reference, possibly deallocating it.
@@ -796,13 +800,30 @@ Reference Implementation
 ========================
 
 A reference implementation of this PEP can be found
-`here <https://github.com/ZeroIntensity/cpython/tree/pep-788-impl>`_.
+at `python/cpython#133110 <https://github.com/python/cpython/pull/133110>`_.
 
 Rejected Ideas
 ==============
 
-Using an interpreter ID instead of a interpreter state for ``PyThreadState_Ensure``
------------------------------------------------------------------------------------
+Retrofitting the existing structures with reference counts
+----------------------------------------------------------
+
+Using interpreter state pointers for reference counting
+*******************************************************
+
+Originally, this PEP specified :c:func:`!PyInterpreterState_Hold`
+and :c:func:`!PyInterpreterState_Release` for managing strong references
+to an interpreter, alongside :c:func:`!PyInterpreterState_Lookup` which
+converted interpreter IDs (weak references) to strong references.
+
+In the end, this was rejected, primarily because it was needlessly
+confusing. Interpreter states hadn't ever had a reference count prior, so
+there was a lack of intuition about when and where something was a strong
+reference. The ``PyInterpreterRef`` and ``PyInterpreterWeakRef`` types seem a lot
+clearer.
+
+Using interpreter IDs for reference counting
+********************************************
 
 Some iterations of this API took an ``int64_t interp_id`` parameter instead of
 ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently
@@ -813,10 +834,7 @@ requiring less magic in the implementation, but has several downsides:
 -  Nearly all existing interpreter APIs already return a :c:type:`PyInterpreterState`
    pointer, not an interpreter ID. Functions like
    :c:func:`PyThreadState_GetInterpreter` would have to be accompanied by
-   frustrating calls to :c:func:`PyInterpreterState_GetID`. There's also
-   no existing way to go from an ``int64_t`` back to a
-   :c:expr:`PyInterpreterState *`, and providing such an API would come
-   with its own set of design problems.
+   frustrating calls to :c:func:`PyInterpreterState_GetID`.
 -  Threads typically take a ``void *arg`` parameter, not an ``int64_t arg``.
    As such, passing an interpreter pointer requires much less boilerplate
    for the user, because an additional structure definition or heap allocation
@@ -829,9 +847,7 @@ requiring less magic in the implementation, but has several downsides:
    must be tracked elsewhere in the interpreter, likely being *more*
    complex than :c:func:`PyInterpreterState_Hold`. There's also a lack
    of intuition that a standalone integer could have such a thing as
-   a reference count. :c:func:`PyInterpreterState_Lookup` sidesteps this
-   problem because the reference count is always associated with the returned
-   interpreter state, not the integer ID.
+   a reference count.
 
 .. _pep-788-activate-deactivate-instead:
 
@@ -893,6 +909,23 @@ In addition, it's unclear whether to remove them at all. A
 functions if it's determined that a full ``PyGILState`` removal would
 be too disruptive for the ecosystem.
 
+Should ``PyThreadState_Ensure`` steal a reference?
+--------------------------------------------------
+
+At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the
+interpreter. This is controversial, because it's not necessarily the right
+default.
+
+For now, it's staing because in cases where a reference is supposed
+to be multi-use, :c:func:`PyInterpreterRef_Dup` can be used to make up
+for the stolen reference. If it didn't still a reference, there's no
+opposite helper function to throw away the reference, so it's just more
+boilerplate. But, this is based on the assumption that there is a general
+desire for single-use interpreter references. If this doesn't prove to be
+the case, and a multi-use reference is overwhelmingly more common, then it
+seems reasonable to let :c:func:`PyThreadState_Ensure` form its own reference
+from the one passed to it.
+
 Copyright
 =========
 

From c742d933a8b84bdea8992c825dd9d4b05edbda16 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 3 May 2025 11:44:43 -0400
Subject: [PATCH 08/54] Fix some outdated references.

---
 peps/pep-0788.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 022a667d0a2..6791b3e3aae 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -827,16 +827,16 @@ Using interpreter IDs for reference counting
 
 Some iterations of this API took an ``int64_t interp_id`` parameter instead of
 ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently
-deleted and cause use-after-free violations. :c:func:`PyInterpreterState_Hold`
-fixes this issue anyway, but an interpreter ID does have the benefit of
-requiring less magic in the implementation, but has several downsides:
+deleted and cause use-after-free violations. The reference counting APIs in
+this PEP sidestep this issue anyway. An interpreter ID has the advantage of
+requiring less magic, but it comes with several downsides:
 
 -  Nearly all existing interpreter APIs already return a :c:type:`PyInterpreterState`
    pointer, not an interpreter ID. Functions like
    :c:func:`PyThreadState_GetInterpreter` would have to be accompanied by
    frustrating calls to :c:func:`PyInterpreterState_GetID`.
 -  Threads typically take a ``void *arg`` parameter, not an ``int64_t arg``.
-   As such, passing an interpreter pointer requires much less boilerplate
+   As such, passing a reference requires much less boilerplate
    for the user, because an additional structure definition or heap allocation
    would be needed to store the interpreter ID. This is especially an issue
    on 32-bit systems, where ``void *`` is too small for an ``int64_t``.
@@ -845,7 +845,7 @@ requiring less magic in the implementation, but has several downsides:
    the native thread gets a chance to attach. The problem with using an
    interpreter ID is that the reference count has to be "invisible"; it
    must be tracked elsewhere in the interpreter, likely being *more*
-   complex than :c:func:`PyInterpreterState_Hold`. There's also a lack
+   complex than :c:func:`PyInterpreterRef_Get`. There's also a lack
    of intuition that a standalone integer could have such a thing as
    a reference count.
 

From ad1bf7f1acf4694f8003cca653574ed9b598b57b Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:06:20 -0400
Subject: [PATCH 09/54] Fix typo in rejected ideas.

---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 6791b3e3aae..69632511936 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -916,7 +916,7 @@ At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the
 interpreter. This is controversial, because it's not necessarily the right
 default.
 
-For now, it's staing because in cases where a reference is supposed
+For now, it's staying, because in cases where a reference is supposed
 to be multi-use, :c:func:`PyInterpreterRef_Dup` can be used to make up
 for the stolen reference. If it didn't still a reference, there's no
 opposite helper function to throw away the reference, so it's just more

From bca61313e14b0bbe6d999d5037974d00540e8051 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:07:20 -0400
Subject: [PATCH 10/54] Adjust threading section.

---
 peps/pep-0788.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 69632511936..3c7ed8e6003 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -551,8 +551,8 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure`
     at your own discretion.
 
-Changes to :mod:`threading` shutdown
-------------------------------------
+Changes to ``threading`` shutdown and behavior
+----------------------------------------------
 
 An interpreter currently special-cases non-daemon threads created by
 :mod:`threading` and joins them before the interpreter does any other
@@ -563,8 +563,8 @@ will rely on the interpreter's strong reference to run until completion.
 :mod:`threading`-created threads will still be joined to release resources after
 this has happened.
 
-Additionally, setting a :class:`threading.Thread` to :attr:`~threading.Thread.daemon`
-should correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise,
+Additionally, setting :attr:`threading.Thread.daemon` should
+correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise,
 :c:func:`PyThreadState_GetDaemon` will have incorrect results in Python
 threads.
 

From 868cdefc96ac282890a62cde2743fef84e8c8a5a Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:09:13 -0400
Subject: [PATCH 11/54] Specify that PyInterpreterRef is pointer-sized

---
 peps/pep-0788.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 3c7ed8e6003..b3e48018641 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -387,6 +387,8 @@ Strong interpreter references
    The interpreter will wait until a strong reference has been released
    before shutting down.
 
+   This type is guaranteed to be pointer-sized.
+
 .. c:function:: PyInterpreterRef PyInterpreterRef_Get(void)
 
     Acquire a strong reference to the current interpreter.

From 6b3a447820adf8f17743e5341bf5fd65bba136c6 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:09:52 -0400
Subject: [PATCH 12/54] Add clarity to reference counting.

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index b3e48018641..e0bb7e4d8b9 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -362,8 +362,8 @@ Specification
 Interpreter reference counting
 ------------------------------
 
-An interpreter will keep track of the number of non-daemon threads through
-a reference count. During finalization, the interpreter will wait until its
+An interpreter will keep track of a reference count managed by threads.
+During finalization, the interpreter will wait until its
 reference count reaches zero, and once that happens, threads can no longer
 acquire a strong reference to the interpreter. Threads can hold as many
 references as they want, but in most cases, a thread will have one reference

From f5e1af804b65697517711acdad8802b79f91b68a Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:13:37 -0400
Subject: [PATCH 13/54] Fix typo in example.

---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index e0bb7e4d8b9..dc378a60fbd 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -692,7 +692,7 @@ This is the same code, rewritten to use the new functions:
     static int
     thread_func(void *arg)
     {
-        PyInterpreterRefinterp = (PyInterpreterRef)arg;
+        PyInterpreterRef interp = (PyInterpreterRef)arg;
         if (PyThreadState_Ensure(interp) < 0) {
             return -1;
         }

From 98e7fcc9bd42724d55bf2bf93320bee134351a01 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:34:34 -0400
Subject: [PATCH 14/54] Formalize the headings.

---
 peps/pep-0788.rst | 106 +++++++++++++++++++++++-----------------------
 1 file changed, 54 insertions(+), 52 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index dc378a60fbd..59900e54405 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -1,5 +1,5 @@
 PEP: 788
-Title: Reimagining native threads
+Title: Reimagining Native Threads
 Author: Peter Bierma <zintensitydev@gmail.com>
 Sponsor: Victor Stinner <vstinner@python.org>
 Discussions-To: https://discuss.python.org/t/89863
@@ -60,8 +60,8 @@ be acquired and attached in a thread-safe and predictable manner.
 Motivation
 ==========
 
-Native threads will always hang during finalization
----------------------------------------------------
+Native Threads Always Hang During Finalization
+----------------------------------------------
 
 Many codebases might need to call Python code in highly-asynchronous
 situations where the desired interpreter
@@ -109,8 +109,8 @@ is severely limiting for users who want to do more than just execute Python
 code in their stream of calls (for example, C++ executing finalizers in
 *addition* to calling Python).
 
-Using ``Py_IsFinalizing`` is insufficient
-*****************************************
+``Py_IsFinalizing`` is Insufficient
+***********************************
 
 The :ref:`docs <python:gilstate>`
 currently recommend :c:func:`Py_IsFinalizing` to guard against termination of
@@ -128,8 +128,8 @@ issues; the interpreter might not be finalizing during the call to
 afterwards, which would cause the attachment of a thread state (typically via
 :c:func:`PyGILState_Ensure`) to hang the thread.
 
-Daemon threads can cause finalization deadlocks
-***********************************************
+Daemon Threads Can Deadlock Finalization
+****************************************
 
 When acquiring locks, it's extremely important to detach the thread state to
 prevent deadlocks. This is true on both the with-GIL and free-threaded builds.
@@ -168,8 +168,8 @@ finalization, because a daemon thread got hung while holding the lock.
 
 .. _pep-788-hanging-compat:
 
-We can't change finalization behavior for ``PyGILState_Ensure``
-***************************************************************
+Finalization Behavior for ``PyGILState_Ensure`` Cannot Change
+*************************************************************
 
 There will always have to be a point in a Python program where
 :c:func:`PyGILState_Ensure` can no longer attach a thread state.
@@ -189,15 +189,15 @@ the thread or emit a fatal error, as noted in
 For this reason, we can't make any real changes to how :c:func:`PyGILState_Ensure`
 works during finalization, because it would break existing code.
 
-The existing APIs are broken and misleading
--------------------------------------------
+The GIL-state APIs are Buggy and Confusing
+------------------------------------------
 
 There are currently two public ways for a user to create and attach their own
 :term:`thread state`; manual use of :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`,
 and :c:func:`PyGILState_Ensure`. The latter, :c:func:`PyGILState_Ensure`,
 is `significantly more common <https://grep.app/search?q=pygilstate_ensure>`_.
 
-``PyGILState_Ensure`` generally crashes during finalization
+``PyGILState_Ensure`` Generally Crashes During Finalization
 ***********************************************************
 
 At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does not
@@ -208,7 +208,7 @@ that could be fixed in CPython, but it's definitely worth noting
 here. Incidentally, acceptance and implementation of this PEP will likely fix
 the existing crashes caused by :c:func:`PyGILState_Ensure`.
 
-The term "GIL" is tricky for free-threading
+The Term "GIL" is Tricky for Free-threading
 *******************************************
 
 A large issue with the term "GIL" in the C API is that it is semantically
@@ -246,8 +246,8 @@ roughly equivalent to the following:
 
 .. _pep-788-subinterpreters-gilstate:
 
-Subinterpreters don't work with ``PyGILState_Ensure``
------------------------------------------------------
+``PyGILState_Ensure`` Doesn't Guess the Correct Interpreter
+-----------------------------------------------------------
 
 As noted in the :ref:`documentation <python:gilstate>`,
 ``PyGILState`` APIs aren't officially supported in subinterpreters:
@@ -273,8 +273,8 @@ any synchronization between the two GILs, so both the thread (who thinks it's
 in the subinterpreter) and the main thread could try to increment the
 reference count at the same time, causing a data race!
 
-Interpreters can concurrently shut down
-***************************************
+Concurrent Interpreter Deallocation
+***********************************
 
 The other way of creating a native thread that can invoke Python,
 :c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better
@@ -302,8 +302,8 @@ As a summary, there's a few bases we want to cover in a new API:
 - Backwards-compatibility with the old APIs and ideas, such as "daemonness"
   (but as opt-in).
 
-Preventing interpreter finalization with references
----------------------------------------------------
+Preventing Interpreter Finalization with Reference Counting
+-----------------------------------------------------------
 
 This PEP takes an approach where interpreters are given a reference count by
 non-daemon threads that want to (or do) hold an attached thread state. When
@@ -335,8 +335,8 @@ program exits, it's still up to the user on how to join the thread, for
 example by using an :mod:`atexit` handler can be used to join the thread.
 Again, this PEP isn't trying to reinvent how to create or join threads!
 
-Replacing the old APIs
-----------------------
+Removing the GIL-state APIs
+---------------------------
 
 Due to the plethora of issues with ``PyGILState``, this PEP intends to do away
 with them entirely. In today's C API, all ``PyGILState`` functions are
@@ -359,15 +359,17 @@ aren't too clear, see :ref:`pep-788-deprecation`.
 Specification
 =============
 
-Interpreter reference counting
-------------------------------
+Interpreter Reference Counting to Prevent Shutdown
+--------------------------------------------------
 
 An interpreter will keep track of a reference count managed by threads.
 During finalization, the interpreter will wait until its
 reference count reaches zero, and once that happens, threads can no longer
-acquire a strong reference to the interpreter. Threads can hold as many
-references as they want, but in most cases, a thread will have one reference
-at a time, typically through the :term:`attached thread state`.
+acquire a strong reference to the interpreter. The interpreter
+must not hang threads until this reference count has reached zero.
+Threads can hold as many references as they want, but in most cases,
+a thread will have one reference at a time, typically through the
+:term:`attached thread state`.
 
 An attached thread state is made non-daemon by holding a strong reference
 to the interpreter. When a non-daemon thread state is destroyed, it releases
@@ -378,7 +380,7 @@ be safely accessed after the interpreter no longer supports strong references,
 and even after the interpreter has been deleted. But, at that point, the weak
 reference can no longer be converted to a strong reference.
 
-Strong interpreter references
+Strong Interpreter References
 *****************************
 
 .. c:type:: PyInterpreterRef
@@ -417,7 +419,7 @@ Strong interpreter references
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
-Weak interpreter references
+Weak Interpreter References
 ***************************
 
 .. c:type:: PyInterpreterWeakRef
@@ -467,8 +469,8 @@ Weak interpreter references
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
-Daemon and non-daemon threads
------------------------------
+Daemon and Non-daemon Thread States
+-----------------------------------
 
 A non-daemon thread state is a thread state that holds a strong reference to an
 interpreter. The reference is released when the thread state is deleted, either
@@ -504,7 +506,7 @@ See :ref:`pep-788-hanging-compat`.
     This function cannot fail, other than with a fatal error if the caller
     has no :term:`attached thread state`.
 
-Ensuring and releasing thread states
+Ensuring and Releasing Thread States
 ------------------------------------
 
 This proposal includes two new high-level threading APIs that intend to
@@ -553,8 +555,8 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure`
     at your own discretion.
 
-Changes to ``threading`` shutdown and behavior
-----------------------------------------------
+``threading`` Shutdown and Behavior
+-----------------------------------
 
 An interpreter currently special-cases non-daemon threads created by
 :mod:`threading` and joins them before the interpreter does any other
@@ -570,8 +572,8 @@ correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise,
 :c:func:`PyThreadState_GetDaemon` will have incorrect results in Python
 threads.
 
-Deprecation of ``PyGILState`` APIs
-----------------------------------
+Deprecation of GIL-state APIs
+-----------------------------
 
 This PEP deprecates all of the existing ``PyGILState`` APIs in favor of the
 existing and new ``PyThreadState`` APIs. Namely:
@@ -612,8 +614,8 @@ Examples
 These examples are here to help understand the APIs described in this PEP.
 Ideally, they could be reused in the documentation.
 
-Single-threaded example
-***********************
+Example: A Single-threaded Ensure
+*********************************
 
 This example shows acquiring a lock in a Python method.
 
@@ -648,8 +650,8 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
         Py_RETURN_NONE;
     }
 
-Transitioning from the existing functions
-*****************************************
+Example: Transitioning From the Legacy Functions
+************************************************
 
 The following code uses the ``PyGILState`` APIs:
 
@@ -721,8 +723,8 @@ This is the same code, rewritten to use the new functions:
     }
 
 
-Daemon thread example
-*********************
+Example: A Daemon Thread
+************************
 
 Native daemon threads are still a use-case, and as such,
 they can still be used with this API:
@@ -758,8 +760,8 @@ they can still be used with this API:
         Py_RETURN_NONE;
     }
 
-Asynchronous callback example
-*****************************
+Example: An Asynchronous Callback
+*********************************
 
 In some cases, the thread might not ever start, such as in a callback.
 We can't use a strong reference here, because a strong reference would
@@ -807,11 +809,11 @@ at `python/cpython#133110 <https://github.com/python/cpython/pull/133110>`_.
 Rejected Ideas
 ==============
 
-Retrofiting the existing structures with reference counts
+Retrofiting the Existing Structures with Reference Counts
 ---------------------------------------------------------
 
-Using interpreter state pointers for reference counting
-*******************************************************
+Interpreter-State Pointers for Reference Counting
+*************************************************
 
 Originally, this PEP specified :c:func:`!PyInterpreterState_Hold`
 and :c:func:`!PyInterpreterState_Release` for managing strong references
@@ -824,8 +826,8 @@ there was a lack of intuition about when and where something was a strong
 reference. The ``PyInterpreterRef`` and ``PyInterpreterWeakRef`` seem a lot
 clearer.
 
-Using interpreter IDs for reference counting
-********************************************
+Interpreter IDs for Reference Counting
+**************************************
 
 Some iterations of this API took an ``int64_t interp_id`` parameter instead of
 ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently
@@ -874,7 +876,7 @@ This was ultimately rejected for two reasons:
    for code-generators like Cython to use, as there isn't any additional
    complexity with tracking :c:type:`PyThreadState` pointers around.
 
-Using ``PyStatus`` for the return value of ``PyThreadState_Ensure``
+Using ``PyStatus`` for the Return Value of ``PyThreadState_Ensure``
 -------------------------------------------------------------------
 
 In prior iterations of this API, :c:func:`PyThreadState_Ensure` returned a
@@ -897,8 +899,8 @@ Open Issues
 
 .. _pep-788-deprecation:
 
-When should the legacy APIs be removed?
----------------------------------------
+When Should the GIL-state APIs be Removed?
+------------------------------------------
 
 :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release` have been around
 for over two decades, and it's expected that the migration will be difficult.
@@ -911,7 +913,7 @@ In addition, it's unclear whether to remove them at all. A
 functions if it's determined that a full ``PyGILState`` removal would
 be too disruptive for the ecosystem.
 
-Should ``PyThreadState_Ensure`` steal a reference?
+Should ``PyThreadState_Ensure`` Steal a Reference?
 --------------------------------------------------
 
 At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the

From 95916a72d76931a981ba341a7b315f7c5e40cc05 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 09:49:28 -0400
Subject: [PATCH 15/54] Add a terminology section.

---
 peps/pep-0788.rst | 50 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 59900e54405..52183fad231 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -57,6 +57,50 @@ and :c:func:`PyThreadState_Ensure` as replacements for the existing functions,
 accompanied by some interpreter reference counting APIs that let thread states
 be acquired and attached in a thread-safe and predictable manner.
 
+Terminology
+===========
+
+Interpreters
+------------
+
+In this proposal, "interpreter" refers to a singular, isolated interpreter
+(see :pep:`684`), with its own :c:type:`PyInterpreterState` pointer (referred
+to as an "interpreter-state"). Interpreter *does not* refer to the entirety
+of a Python process.
+
+The "current interpreter" refers to the interpreter by the interpreter-state
+pointer on an :term:`attached thread state`.
+
+Finalization vs Shutdown
+------------------------
+
+Throughout this PEP, the terms "finalization" and "shutdown" are used in
+reference to what an interpreter does at the end of its lifetime, either
+because the program is closing or because :c:func:`Py_EndInterpreter` was
+called. There's a subtle difference between the two terms, as used in this
+PEP:
+
+- "Finalization" refers to an interpreter getting ready to "shut down", in
+  which it runs garbage collections, cleans up threads, and deletes
+  per-interpreter state. This should not be confused with *runtime*
+  finalization, where process-wide state is also cleaned up, but be aware
+  that the main interpreter is finalized alongside the runtime.
+- "Shutdown" (or "shut down", as a verb) refers to the interpreter being
+  finished, after finalization has already happened. For example, shutdown
+  for a subinterpreter entails the interpreter's state structure being
+  deallocated.
+
+Native and Python Threads
+-------------------------
+
+This PEP refers to a thread created using the C API as a "native thread",
+also sometimes referred to as a "non-Python created thread", where a "Python
+created" is a thread created by the :mod:`threading` module.
+
+Native threads are typically created by :c:func:`PyGILState_Ensure`, but more
+technically, it refers to any thread with a :term:`thread state` created using
+the C API.
+
 Motivation
 ==========
 
@@ -274,7 +318,7 @@ in the subinterpreter) and the main thread could try to increment the
 reference count at the same time, causing a data race!
 
 Concurrent Interpreter Deallocation
-***********************************
+-----------------------------------
 
 The other way of creating a native thread that can invoke Python,
 :c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better
@@ -302,8 +346,8 @@ As a summary, there's a few bases we want to cover in a new API:
 - Backwards-compatibility with the old APIs and ideas, such as "daemonness"
   (but as opt-in).
 
-Preventing Interpreter Finalization with Reference Counting
------------------------------------------------------------
+Preventing Interpreter Shutdown with Reference Counting
+-------------------------------------------------------
 
 This PEP takes an approach where interpreters are given a reference count by
 non-daemon threads that want to (or do) hold an attached thread state. When

From 257a25250a07ac80cc2f73b6457e8266a77f2ff0 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 15:03:14 -0400
Subject: [PATCH 16/54] Add PyInterpreterState_AsStrong()

---
 peps/pep-0788.rst | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 52183fad231..01809938fa1 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -317,11 +317,11 @@ any synchronization between the two GILs, so both the thread (who thinks it's
 in the subinterpreter) and the main thread could try to increment the
 reference count at the same time, causing a data race!
 
-Concurrent Interpreter Deallocation
------------------------------------
+Concurrent Interpreter Deallocation Issues
+------------------------------------------
 
 The other way of creating a native thread that can invoke Python,
-:c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better
+:c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`, is a lot better
 for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an
 explicit interpreter, rather than assuming that the main interpreter was
 requested), but is still limited by the current hanging problems in the C API.
@@ -445,6 +445,17 @@ Strong Interpreter References
     This function cannot fail, other than with a fatal error when the caller
     doesn't hold an :term:`attached thread state`.
 
+.. c:function:: PyInterpreterRef PyInterpreterState_AsStrong(PyInterpreterState *interp)
+
+    Acquire a strong reference to *interp*.
+
+    Beware: this function can cause crashes if *interp* shuts down in
+    another thread! Prefer safely acquiring a reference through
+    :c:func:`PyInterpreterRef_Get` where possible.
+
+    This function will return ``0`` if *interp* has already finished waiting on
+    non-daemon threads.
+
 .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref)
 
     Duplicate a strong reference to an interpreter.

From 6b9b74e3141fccaa3db3fcc12ce4789d3af42dfd Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 15:13:12 -0400
Subject: [PATCH 17/54] Add an example for PyInterpreterState_AsStrong()

---
 peps/pep-0788.rst | 44 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 01809938fa1..c5c32717d60 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -449,12 +449,13 @@ Strong Interpreter References
 
     Acquire a strong reference to *interp*.
 
-    Beware: this function can cause crashes if *interp* shuts down in
-    another thread! Prefer safely acquiring a reference through
-    :c:func:`PyInterpreterRef_Get` where possible.
+    Unless *interp* is the main interpreter, this function can cause crashes
+    if *interp* shuts down in another thread! Prefer safely acquiring a
+    reference through :c:func:`PyInterpreterRef_Get` where possible.
 
     This function will return ``0`` if *interp* has already finished waiting on
-    non-daemon threads.
+    non-daemon threads. The caller does not need to hold an
+    :term:`attached thread state`.
 
 .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref)
 
@@ -855,6 +856,41 @@ deadlock the interpreter if it's not released.
         Py_RETURN_NONE;
     }
 
+Example: Calling Python Without a Closure
+*****************************************
+
+There are a few cases where callback functions don't take a closure
+(``void *arg``), so it's impossible to acquire a reference to any specific
+interpreter. The solution to this problem is to acquire a reference to the main
+interpreter through :c:func:`PyInterpreterState_AsStrong`.
+
+But wait, won't that break with subinterpreters, per
+:ref:`pep-788-subinterpreters-gilstate`? Fortunately, since the callback has
+no closure, it's not possible for the caller to pass any objects or
+interpreter-specific data, so it's completely safe to choose the main
+interpreter here.
+
+.. code-block:: c
+
+    static void
+    call_python(void)
+    {
+        PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main());
+        if (ref == 0) {
+            fputs("Python has shut down!", stderr);
+            return;
+        }
+
+        if (PyThreadState_Ensure(ref) < 0) {
+            return;
+        }
+        if (PyRun_SimpleString("print(42)") < 0) {
+            PyErr_Print();
+        }
+        PyThreadState_Release();
+        return;
+    }
+
 Reference Implementation
 ========================
 

From 48624efb3c5d9a9d24c836e36edd24e44636ffed Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 15:48:41 -0400
Subject: [PATCH 18/54] An editorial pass.

---
 peps/pep-0788.rst | 150 ++++++++++++++++++++++++----------------------
 1 file changed, 78 insertions(+), 72 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index c5c32717d60..09cb7d30ed4 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -29,9 +29,9 @@ an interpreter:
 - :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap` (significantly
   less common).
 
-The former, ``PyGILState``, are the most common way to do this and have been
-the standard for over twenty years (:pep:`311`), but have a number of issues
-that have arisen over time:
+The former, :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`,
+are the most common way to do this and have been the standard for over twenty
+years (:pep:`311`), but have a number of issues that have arisen over time:
 
 - Subinterpreters tend to have trouble with them, because in threads that
   haven't ever had an attached thread state, :c:func:`PyGILState_Ensure`
@@ -55,7 +55,7 @@ Python.
 This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure`
 and :c:func:`PyThreadState_Ensure` as replacements for the existing functions,
 accompanied by some interpreter reference counting APIs that let thread states
-be acquired and attached in a thread-safe and predictable manner.
+be acquired and attached in a thread-safe and predictable manner.
 
 Terminology
 ===========
@@ -65,11 +65,12 @@ Interpreters
 
 In this proposal, "interpreter" refers to a singular, isolated interpreter
 (see :pep:`684`), with its own :c:type:`PyInterpreterState` pointer (referred
-to as an "interpreter-state"). Interpreter *does not* refer to the entirety
+to as an "interpreter-state"). "Interpreter" *does not* refer to the entirety
 of a Python process.
 
-The "current interpreter" refers to the interpreter by the interpreter-state
-pointer on an :term:`attached thread state`.
+The "current interpreter" refers to the interpreter-state
+pointer on an :term:`attached thread state`, as returned by
+:c:func:`PyThreadState_GetInterpreter`.
 
 Finalization vs Shutdown
 ------------------------
@@ -81,14 +82,16 @@ called. There's a subtle difference between the two terms, as used in this
 PEP:
 
 - "Finalization" refers to an interpreter getting ready to "shut down", in
-  which it runs garbage collections, cleans up threads, and deletes
+  which it runs its final garbage collections, cleans up
+  :term:`thread states <thread state>`, and deletes
   per-interpreter state. This should not be confused with *runtime*
   finalization, where process-wide state is also cleaned up, but be aware
   that the main interpreter is finalized alongside the runtime.
-- "Shutdown" (or "shut down", as a verb) refers to the interpreter being
-  finished, after finalization has already happened. For example, shutdown
-  for a subinterpreter entails the interpreter's state structure being
-  deallocated.
+- "Shutdown" (or "shut down", as a verb) refers to the interpreter being in a
+  "finalized" state, after finalization has already happened. Shutdown
+  for a subinterpreter entails its interpreter-state structure being
+  deallocated, and shutdown for the main interpreter includes the entire Python
+  runtime being finalized.
 
 Native and Python Threads
 -------------------------
@@ -98,8 +101,8 @@ also sometimes referred to as a "non-Python created thread", where a "Python
 created" is a thread created by the :mod:`threading` module.
 
 Native threads are typically created by :c:func:`PyGILState_Ensure`, but more
-technically, it refers to any thread with a :term:`thread state` created using
-the C API.
+technically, the term refers to any thread with an :term:`attached thread state`
+created and/or attached using the C API.
 
 Motivation
 ==========
@@ -110,7 +113,7 @@ Native Threads Always Hang During Finalization
 Many codebases might need to call Python code in highly-asynchronous
 situations where the desired interpreter
 (:ref:`typically the main interpreter <pep-788-subinterpreters-gilstate>`)
-could be finalizing or deleted, but want to continue running code after the
+could be finalizing or deleted, but want to continue running code after
 invoking the interpreter. This desire has been
 `brought up by users <https://discuss.python.org/t/78850/>`_.
 For example, a callback that wants to call Python code might be invoked when:
@@ -139,19 +142,19 @@ Generally, this pattern would look something like this:
 
 In the current C API, any "native" thread (one not created via the
 :mod:`threading` module) is considered to be "daemon", meaning that the interpreter
-won't wait on that thread to finalize. Instead, the interpreter will hang the
+won't wait on that thread before shutting down. Instead, the interpreter will hang the
 thread when it goes to :term:`attach <attached thread state>` a :term:`thread state`,
-making it unusable past that point. Attaching a thread state can happen at
-any point when invoking Python, such as releasing it in-between bytecode
-instructions (to yield the GIL), or when a C function exits a
+making the thread unusable past that point. Attaching a thread state can happen at
+any point when invoking Python, such as in-between bytecode instructions
+(to yield the :term:`GIL` to a different thread), or when a C function exits a
 :c:macro:`Py_BEGIN_ALLOW_THREADS` block. (Note that hanging the thread is
 relatively new behavior; in prior versions, the thread would terminate, but
 the issue is the same.)
 
-This means that any non-Python thread may be terminated at any point, which
+This means that any non-Python/native thread may be terminated at any point, which
 is severely limiting for users who want to do more than just execute Python
-code in their stream of calls (for example, C++ executing finalizers in
-*addition* to calling Python).
+code in their stream of calls (for example, C++ might want to execute other
+finalizers in addition to calling Python).
 
 ``Py_IsFinalizing`` is Insufficient
 ***********************************
@@ -169,8 +172,8 @@ the thread:
 Unfortunately, this isn't correct, because of time-of-call to time-of-use
 issues; the interpreter might not be finalizing during the call to
 :c:func:`Py_IsFinalizing`, but it might start finalizing immediately
-afterwards, which would cause the attachment of a thread state (typically via
-:c:func:`PyGILState_Ensure`) to hang the thread.
+afterwards, which would cause the attachment of a thread state to hang the
+thread.
 
 Daemon Threads Can Deadlock Finalization
 ****************************************
@@ -185,8 +188,9 @@ lock.
 
 On free-threaded builds, lock-ordering deadlocks are still possible
 if thread A acquired the lock for object A and then object B, and then
-another thread tried to acquire those locks in a reverse order. Free-threading
-protects against this by releasing locks when the thread state is detached.
+another thread tried to acquire those locks in the reverse order. Free-threading
+currently protects against this by releasing locks when the thread state is
+detached, making detachment a necessity to prevent deadlocks.
 
 So, all code that needs to work with locks need to detach the thread state.
 In C, this is almost always done via :c:macro:`Py_BEGIN_ALLOW_THREADS` and
@@ -236,10 +240,10 @@ works during finalization, because it would break existing code.
 The GIL-state APIs are Buggy and Confusing
 ------------------------------------------
 
-There are currently two public ways for a user to create and attach their own
-:term:`thread state`; manual use of :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`,
-and :c:func:`PyGILState_Ensure`. The latter, :c:func:`PyGILState_Ensure`,
-is `significantly more common <https://grep.app/search?q=pygilstate_ensure>`_.
+There are currently two public ways for a user to create and attach a
+:term:`thread state` for their thread: manual use of :c:func:`PyThreadState_New`
+and :c:func:`PyThreadState_Swap`, and :c:func:`PyGILState_Ensure`. The latter,
+:c:func:`PyGILState_Ensure`, is `the most common <https://grep.app/search?q=pygilstate_ensure>`_.
 
 ``PyGILState_Ensure`` Generally Crashes During Finalization
 ***********************************************************
@@ -265,9 +269,8 @@ created by the authors of this PEP:
     omit ``PyGILState_Ensure`` in fresh threads.
 
 Again, :c:func:`PyGILState_Ensure` gets an :term:`attached thread state`
-for the thread on both with-GIL and free-threaded builds. Acquisition of the
-GIL on with-GIL builds is incidental! :c:func:`PyGILState_Ensure` is very
-roughly equivalent to the following:
+for the thread on both with-GIL and free-threaded builds. To demonstrate,
+:c:func:`PyGILState_Ensure` is very roughly equivalent to the following:
 
 .. code-block:: c
 
@@ -288,13 +291,17 @@ roughly equivalent to the following:
         }
     }
 
+An attached thread state is always needed to call the C API, so
+:c:func:`PyGILState_Ensure` still needs to be called on free-threaded builds,
+but with a name like "ensure GIL", it's not immediately clear that that's true.
+
 .. _pep-788-subinterpreters-gilstate:
 
 ``PyGILState_Ensure`` Doesn't Guess the Correct Interpreter
 -----------------------------------------------------------
 
 As noted in the :ref:`documentation <python:gilstate>`,
-``PyGILState`` APIs aren't officially supported in subinterpreters:
+the ``PyGILState`` functions aren't officially supported in subinterpreters:
 
     Note that the ``PyGILState_*`` functions assume there is only one global
     interpreter (created automatically by ``Py_Initialize()``). Python
@@ -310,65 +317,61 @@ subinterpreters, because synchronization for the wrong interpreter will be
 used on objects shared between the threads.
 
 For example, if the thread had access to object A, which belongs to a
-subinterpreter, but then called :c:func:`PyGILState_Ensure` would have an
-attached thread state pointing to the main interpreter, not the subinterpreter.
-This means that any GIL assumptions about the object are wrong! There isn't
-any synchronization between the two GILs, so both the thread (who thinks it's
-in the subinterpreter) and the main thread could try to increment the
-reference count at the same time, causing a data race!
+subinterpreter, but then called :c:func:`PyGILState_Ensure`, the thread would
+have an :term:`attached thread state` pointing to the main interpreter,
+not the subinterpreter. This means that any :term:`GIL` assumptions about the
+object are wrong! There isn't any synchronization between the two GILs, so both
+the thread (which thinks it's in the subinterpreter) and the main thread could try
+to increment the reference count at the same time, causing a data race!
 
 Concurrent Interpreter Deallocation Issues
 ------------------------------------------
 
 The other way of creating a native thread that can invoke Python,
-:c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap`, is a lot better
+:c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, is a lot better
 for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an
 explicit interpreter, rather than assuming that the main interpreter was
 requested), but is still limited by the current hanging problems in the C API.
 
 In addition, subinterpreters typically have a much shorter lifetime than the
 main interpreter, so there's a much higher chance that an interpreter passed
-to a thread will have already finished and have been deallocated. Passing that
-interpreter to :c:func:`PyThreadState_New` will most likely crash the program.
+to a thread will have already finished and have been deallocated. So, passing
+that interpreter to :c:func:`PyThreadState_New` will most likely crash the program
+because of a use-after-free on the interpreter state.
 
 Rationale
 =========
 
 So, how do we address all of this? The best way seems to be starting from
-scratch and "reimagining" how to acquire and attach thread states in the C API.
+scratch and "reimagining" how to create, acquire and attach
+:term:`thread states <thread state>` in the C API.
 
 As a summary, there's a few bases we want to cover in a new API:
 
 - Require the caller to specify which interpreter they want to prevent those
   pesky problems with interpreter guessing.
-- Prevent the thread from being arbitrarily bricked by calling into Python.
+- But, we also need to cover cases where a closure isn't available, so the thread
+  won't have access to an interpreter state (but also won't have access to
+  any objects).
+- Prevent the thread from being arbitrarily hung by calling into Python
+  during finalization.
 - Protection against deallocation on interpreters with short lifetimes.
-- Backwards-compatibility with the old APIs and ideas, such as "daemonness"
-  (but as opt-in).
+- Backwards-compatibility with the old APIs and ideas, such as daemonness.
 
 Preventing Interpreter Shutdown with Reference Counting
 -------------------------------------------------------
 
 This PEP takes an approach where interpreters are given a reference count by
-non-daemon threads that want to (or do) hold an attached thread state. When
-the interpreter starts finalizing, it will until its reference count
-reaches zero before proceeding to a point where threads will be hung.
-Note that this *is not* the same as joining the thread; the interpreter will
-only wait until the thread state has been released
-(via :c:func:`PyThreadState_Release`) for all non-daemon threads. This isn't
-the same as waiting for them to detach their thread state--it waits for them
-to *destroy* it. Otherwise, this API wouldn't have any finalization benefits
-over the existing ``PyThreadState`` functions.
+non-daemon threads that want to (or do) hold an attached thread state.
 
 So, from a thread's perspective, holding a "strong reference" to the
-interpreter will effectively prevent it from finalizing, making it safe to
-invoke Python without worrying about the thread being hung. The strong
-reference will be held as long as thread state is "alive", even if it's
-detached.
+interpreter will make it safe to invoke Python without worrying about
+the thread being hung. A strong reference held by a thread state will
+be held as long as thread state is "alive", even if it's detached.
 
 This proposal also comes with weak references to an interpreter that don't
-prevent it from finalizing, but can be promoted to a strong reference once
-decided that a thread state can attach. Promotion of a weak reference to a
+prevent it from shutting down, but can be promoted to a strong reference when
+the user decides that they want to call Python. Promotion of a weak reference to a
 strong reference can fail if the interpreter has already finalized, or reached
 a point during finalization where it can't be guaranteed that the thread won't
 hang.
@@ -406,14 +409,17 @@ Specification
 Interpreter Reference Counting to Prevent Shutdown
 --------------------------------------------------
 
-An interpreter will keep track of a reference count managed by threads.
-During finalization, the interpreter will wait until its
-reference count reaches zero, and once that happens, threads can no longer
-acquire a strong reference to the interpreter. The interpreter
-must not hang threads until this reference count has reached zero.
-Threads can hold as many references as they want, but in most cases,
-a thread will have one reference at a time, typically through the
-:term:`attached thread state`.
+An interpreter will keep a reference count that's managed by threads.
+When the interpreter starts finalizing, it will wait until its reference count
+reaches zero before proceeding to a point where threads will be hung.
+Note that this *is not* the same as joining the thread; the interpreter will
+only wait until the reference count is zero, typically via releasing non-daemon
+thread states with :c:func:`PyThreadState_Release`.  The interpreter must not hang
+threads until this reference count has reached zero. Threads can hold as many
+references as they want, but in most cases, a thread will have one reference
+at a time, typically through the :term:`attached thread state`. After the reference count
+has reached zero, threads can no longer prevent the interpreter from shutting
+down.
 
 An attached thread state is made non-daemon by holding a strong reference
 to the interpreter. When a non-daemon thread state is destroyed, it releases
@@ -422,7 +428,7 @@ the reference.
 A weak reference to the interpreter won't prevent it from finalizing, but can
 be safely accessed after the interpreter no longer supports strong references,
 and even after the interpreter has been deleted. But, at that point, the weak
-reference can no longer be converted to a strong reference.
+reference can no longer be promoted to a strong reference.
 
 Strong Interpreter References
 *****************************
@@ -531,7 +537,7 @@ Daemon and Non-daemon Thread States
 A non-daemon thread state is a thread state that holds a strong reference to an
 interpreter. The reference is released when the thread state is deleted, either
 by :c:func:`PyThreadState_Release` or a different thread state deletion
-function.
+function (such as :c:func:`PyThreadState_Delete`).
 
 For backwards compatibility, all thread states created by existing APIs,
 including :c:func:`PyGILState_Ensure`, will remain daemon by default.

From 31d3f750fa0af1c6ffa90976e48e378b00421511 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 4 May 2025 16:02:00 -0400
Subject: [PATCH 19/54] Fix typo in example.

---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 09cb7d30ed4..f56ce507ed7 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -881,7 +881,7 @@ interpreter here.
     static void
     call_python(void)
     {
-        PyInterpreterRef *ref = PyInterpreterState_AsStrong(PyInterpreterState_Main());
+        PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main());
         if (ref == 0) {
             fputs(stderr, "Python has shut down!");
             return;

From 8440057f2a555d80fff99beeb73221ee6bf23cc6 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 5 May 2025 17:32:49 -0400
Subject: [PATCH 20/54] Some clarifications and a new example.

---
 peps/pep-0788.rst | 129 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 93 insertions(+), 36 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index f56ce507ed7..5212c2a2d4e 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -48,14 +48,13 @@ The other option, :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`,
 do solve those issues, but come with an additional problem with how thread state
 attachment works in the C API (that ``PyGILState`` also includes): if the
 thread is not the main thread, then the interpreter will randomly hang the
-thread during attachment if it starts finalizing. This can be frustrating,
-especially if there was some additional work to be done alongside invoking
-Python.
+thread during attachment if it starts finalizing. This is a problem for large
+applications that want to use their thread in addition to calling Python.
 
 This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure`
 and :c:func:`PyThreadState_Ensure` as replacements for the existing functions,
 accompanied by some interpreter reference counting APIs that let thread states
-be acquired and attached in a thread-safe, and predictable manner.
+be acquired and attached in a thread-safe and predictable manner.
 
 Terminology
 ===========
@@ -110,7 +109,7 @@ Motivation
 Native Threads Always Hang During Finalization
 ----------------------------------------------
 
-Many codebases might need to call Python code in highly-asynchronous
+Many large libraries might need to call Python code in highly-asynchronous
 situations where the desired interpreter
 (:ref:`typically the main interpreter <pep-788-subinterpreters-gilstate>`)
 could be finalizing or deleted, but want to continue running code after
@@ -147,14 +146,33 @@ thread when it goes to :term:`attach <attached thread state>` a :term:`thread st
 making the thread unusable past that point. Attaching a thread state can happen at
 any point when invoking Python, such as in-between bytecode instructions
 (to yield the :term:`GIL` to a different thread), or when a C function exits a
-:c:macro:`Py_BEGIN_ALLOW_THREADS` block. (Note that hanging the thread is
-relatively new behavior; in prior versions, the thread would terminate, but
-the issue is the same.)
+:c:macro:`Py_BEGIN_ALLOW_THREADS` block, so simply guarding against whether the
+interpreter is finalizing isn't enough to safely call Python code. (Note that hanging
+the thread is relatively new behavior; in prior versions, the thread would terminate,
+but the issue is the same.)
 
 This means that any non-Python/native thread may be terminated at any point, which
 is severely limiting for users who want to do more than just execute Python
-code in their stream of calls (for example, C++ might want to execute other
-finalizers in addition to calling Python).
+code in their stream of calls.
+
+Joining the Thread isn't Always Possible
+****************************************
+
+In general, it's possible to prevent hanging of threads created while Python
+is active through :mod:`atexit` functions. A thread could be started by some
+C function, and then as long as that thread is joined by :mod:`atexit`, then
+the thread won't hang. Reasonable enough, right?
+
+Unfortunately, :mod:`atexit` isn't always an option, because to call it, you
+need to already have an :term:`attached thread state` for the thread. If
+there's no guarantee of that, then :func:`atexit.register` cannot be safely
+called without the risk of hanging the thread.
+
+For example, large C++ applications might want to expose an interface that can
+call Python code. To do this, a function would take a Python object, and then
+call :c:func:`PyGILState_Ensure` to safely interact with it (e.g., by calling
+it). If the interpreter is finalizing or has shut down, then the thread is
+hung, disrupting the C++ caller.
 
 ``Py_IsFinalizing`` is Insufficient
 ***********************************
@@ -210,7 +228,8 @@ deadlocking. The main thread will continue to run finalizers past that point,
 though. If any of those finalizers try to acquire the lock, deadlock ensues.
 
 This affects CPython itself, and there's not much that can be done
-to fix it. For example, `python/cpython#129536 <https://github.com/python/cpython/issues/129536>`_
+to fix it with the current API. For example,
+`python/cpython#129536 <https://github.com/python/cpython/issues/129536>`_
 remarks that the :mod:`ssl` module can emit a fatal error when used at
 finalization, because a daemon thread got hung while holding the lock.
 
@@ -346,41 +365,37 @@ So, how do we address all of this? The best way seems to be starting from
 scratch and "reimagining" how to create, acquire and attach
 :term:`thread states <thread state>` in the C API.
 
-As a summary, there's a few bases we want to cover in a new API:
-
-- Require the caller to specify which interpreter they want to prevent those
-  pesky problems with interpreter guessing.
-- But, we also need to cover cases where a closure isn't available, so the thread
-  won't have access to an interpreter state (but also won't have access to
-  any objects).
-- Prevent the thread from being arbitrarily hung by calling into Python
-  during finalization.
-- Protection against deallocation on interpreters with short lifetimes.
-- Backwards-compatibility with the old APIs and ideas, such as daemonness.
-
 Preventing Interpreter Shutdown with Reference Counting
 -------------------------------------------------------
 
 This PEP takes an approach where interpreters are given a reference count by
-non-daemon threads that want to (or do) hold an attached thread state.
+non-daemon threads that want to (or do) hold an :term:`attached thread state`.
 
 So, from a thread's perspective, holding a "strong reference" to the
-interpreter will make it safe to invoke Python without worrying about
+interpreter will make it safe to call the C API without worrying about
 the thread being hung. A strong reference held by a thread state will
 be held as long as thread state is "alive", even if it's detached.
 
+This means that code interfacing with Python (for example, a C++ library) will
+need a reference to the interpreter in order to safely call an object, which is
+definitely more inconvenient than assuming the main interpreter is the right
+choice, but there's not really another option.
+
+Weak References
+***************
+
 This proposal also comes with weak references to an interpreter that don't
 prevent it from shutting down, but can be promoted to a strong reference when
-the user decides that they want to call Python. Promotion of a weak reference to a
-strong reference can fail if the interpreter has already finalized, or reached
-a point during finalization where it can't be guaranteed that the thread won't
-hang.
+the user decides that they want to call the C API. Promotion of a weak reference
+to a strong reference can fail if the interpreter has already finalized, or
+reached a point during finalization where it can't be guaranteed that the
+thread won't hang.
 
 If there's additional work after destroying the thread state, the thread
 can continue running as normal. If that work needs to finish before the
 program exits, it's still up to the user on how to join the thread, for
-example by using an :mod:`atexit` handler can be used to join the thread.
-Again, this PEP isn't trying to reinvent how to create or join threads!
+example by using an :mod:`atexit` handler to join it.
+This PEP isn't trying to reinvent how to create or join threads!
 
 Removing the GIL-state APIs
 ---------------------------
@@ -406,8 +421,8 @@ aren't too clear, see :ref:`pep-788-deprecation`.
 Specification
 =============
 
-Interpreter Reference Counting to Prevent Shutdown
---------------------------------------------------
+Interpreter References to Prevent Shutdown
+------------------------------------------
 
 An interpreter will keep a reference count that's managed by threads.
 When the interpreter starts finalizing, it will wait until its reference count
@@ -515,7 +530,7 @@ Weak Interpreter References
     Return a strong reference to an interpreter from a weak reference.
 
     If the interpreter no longer exists or has already finished waiting for
-    non-daemon threads, then this function returns ``NULL``.
+    non-daemon threads, then this function returns ``0``.
 
     The caller does not need to hold an :term:`attached thread state`, but is
     not safe to call in a re-entrant signal handler.
@@ -676,6 +691,48 @@ Examples
 These examples are here to help understand the APIs described in this PEP.
 Ideally, they could be reused in the documentation.
 
+Example: A Library Interface
+****************************
+
+Imagine that you're developing a C library for logging.
+You might want to provide an API that allows users to log to a Python file
+object.
+
+With this PEP, you'd implement it like this:
+
+.. code-block:: c
+
+    int
+    LogToPyFile(PyInterpreterWeakRef *wref,
+                PyObject *file,
+                const char *text)
+    {
+        PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
+        if (ref == 0) {
+            fputs("Python interpreter has shut down.", stderr);
+            return -1;
+        }
+
+        if (PyThreadState_Ensure(ref) < 0) {
+            puts("Out of memory.", stderr);
+            return -1;
+        }
+
+        char *to_write = do_some_text_mutation(text);
+        int res = PyFile_WriteString(to_write, file);
+        free(to_write);
+        PyErr_Print();
+        PyThreadState_Release();
+        return res < 0;
+    }
+
+If you were to use :c:func:`PyGILState_Ensure` for this case, then your
+thread would hang if the interpreter were to be finalizing at that time!
+
+Additionally, the API supports subinterpreters. If one were to assume that
+the main interpreter was active, then your library wouldn't be safe to use
+with file objects created by a subinterpreter.
+
 Example: A Single-threaded Ensure
 *********************************
 
@@ -837,7 +894,7 @@ deadlock the interpreter if it's not released.
         PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg;
         PyInterpreterRef *ref = PyInterpreterWeakRef_AsStrong(wref);
         if (ref == NULL) {
-            fputs(stderr, "Python has shut down!");
+            fputs("Python has shut down!", stderr);
             return -1;
         }
 
@@ -883,7 +940,7 @@ interpreter here.
     {
         PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main());
         if (ref == 0) {
-            fputs(stderr, "Python has shut down!");
+            fputs("Python has shut down!", stderr);
             return;
         }
 

From 9b08bf0d7e078c5e659fc0ea3b24b489ff8b86a4 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 5 May 2025 17:34:56 -0400
Subject: [PATCH 21/54] Fix wording.

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 5212c2a2d4e..c4ca56bb89d 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -729,8 +729,8 @@ With this PEP, you'd implement it like this:
 If you were to use :c:func:`PyGILState_Ensure` for this case, then your
 thread would hang if the interpreter were to be finalizing at that time!
 
-Additionally, the API supports subinterpreters. If one were to assume that
-the main interpreter was active, then your library wouldn't be safe to use
+Additionally, the API supports subinterpreters. If you were to assume that
+the main interpreter created the file object, then your library wouldn't be safe to use
 with file objects created by a subinterpreter.
 
 Example: A Single-threaded Ensure

From 0e5acc8e0981c62ec99356f8109b40cd22655ebe Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 9 May 2025 07:21:29 -0400
Subject: [PATCH 22/54] Update peps/pep-0788.rst

Co-authored-by: Victor Stinner <vstinner@python.org>
---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 5212c2a2d4e..bb0c96cfcf8 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -472,7 +472,7 @@ Strong Interpreter References
 
     Unless *interp* is the main interpreter, this function can cause crashes
     if *interp* shuts down in another thread! Prefer safely acquiring a
-    reference through :c:func:`PyInterpreterRef_Get` where possible.
+    reference through :c:func:`PyInterpreterRef_Get` whenever possible.
 
     This function will return ``0`` if *interp* has already finished waiting on
     non-daemon threads. The caller does not need to hold an

From 6d9664571c23f2e7fc4602dbbe942bfa2a148e0f Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 9 May 2025 07:25:05 -0400
Subject: [PATCH 23/54] Update peps/pep-0788.rst

Co-authored-by: Victor Stinner <vstinner@python.org>
---
 peps/pep-0788.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index bb0c96cfcf8..c425ca01b1a 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -722,6 +722,7 @@ With this PEP, you'd implement it like this:
         int res = PyFile_WriteString(to_write, file);
         free(to_write);
         PyErr_Print();
+
         PyThreadState_Release();
         return res < 0;
     }

From a229f7b0eed0cbf29db5455b56b9c9c5493c0053 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 9 May 2025 07:25:35 -0400
Subject: [PATCH 24/54] Apply suggestions from code review

Co-authored-by: Victor Stinner <vstinner@python.org>
---
 peps/pep-0788.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index c425ca01b1a..d57cfdbf971 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -709,12 +709,12 @@ With this PEP, you'd implement it like this:
     {
         PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
         if (ref == 0) {
-            fputs("Python interpreter has shut down.", stderr);
+            // Python interpreter has shut down
             return -1;
         }
 
         if (PyThreadState_Ensure(ref) < 0) {
-            puts("Out of memory.", stderr);
+            fputs("Out of memory.\n", stderr);
             return -1;
         }
 
@@ -895,7 +895,7 @@ deadlock the interpreter if it's not released.
         PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg;
         PyInterpreterRef *ref = PyInterpreterWeakRef_AsStrong(wref);
         if (ref == NULL) {
-            fputs("Python has shut down!", stderr);
+            fputs("Python has shut down!\n", stderr);
             return -1;
         }
 

From 2332d3e8b946f6d298e9ef7c4aca688cbadb19e4 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 9 May 2025 07:27:33 -0400
Subject: [PATCH 25/54] Fix typos.

---
 peps/pep-0788.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index a3da579666e..7810c0b6643 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -52,7 +52,7 @@ thread during attachment if it starts finalizing. This is a problem for large
 applications that want to use their thread in addition to calling Python.
 
 This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure`
-and :c:func:`PyThreadState_Ensure` as replacements for the existing functions,
+and :c:func:`PyThreadState_Release` as replacements for the existing functions,
 accompanied by some interpreter reference counting APIs that let thread states
 be acquired and attached in a thread-safe and predictable manner.
 
@@ -893,8 +893,8 @@ deadlock the interpreter if it's not released.
     async_callback(void *arg)
     {
         PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg;
-        PyInterpreterRef *ref = PyInterpreterWeakRef_AsStrong(wref);
-        if (ref == NULL) {
+        PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
+        if (ref == 0) {
             fputs("Python has shut down!\n", stderr);
             return -1;
         }

From d5630aff335e6edcafa5f17fed29e74d93274680 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sat, 10 May 2025 10:28:39 -0400
Subject: [PATCH 26/54] Use non-pointers for PyInterpreterRef

---
 peps/pep-0788.rst | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 7810c0b6643..6932726dfd0 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -505,17 +505,20 @@ Weak Interpreter References
     The interpreter will *not* wait for the reference to be
     released before shutting down.
 
-.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Get(void)
+    This type is guaranteed to be pointer-sized.
+
+.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Get(void)
 
     Acquire a weak reference to the current interpreter.
 
     This function is generally meant to be used in tandem with
     :c:func:`PyInterpreterWeakRef_AsStrong`.
 
-    This function cannot fail, other than with a fatal error when the caller
-    doesn't hold an :term:`attached thread state`.
+    This function returns ``0`` without an exception set on failure.
 
-.. c:function:: PyInterpreterWeakRef *PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref)
+    The caller must hold an :term:`attached thread state`.
+
+.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref)
 
     Duplicate a weak reference to *wref*.
 
@@ -525,7 +528,7 @@ Weak Interpreter References
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
-.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef *wref)
+.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref)
 
     Return a strong reference to an interpreter from a weak reference.
 
@@ -539,7 +542,7 @@ Weak Interpreter References
     state holds a strong reference to the interpreter, then this function can
     never fail.
 
-.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef *wref)
+.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref)
 
     Release a weak reference, possibly deallocating it.
 
@@ -703,7 +706,7 @@ With this PEP, you'd implement it like this:
 .. code-block:: c
 
     int
-    LogToPyFile(PyInterpreterWeakRef *wref,
+    LogToPyFile(PyInterpreterWeakRef wref,
                 PyObject *file,
                 const char *text)
     {
@@ -892,7 +895,7 @@ deadlock the interpreter if it's not released.
     static int
     async_callback(void *arg)
     {
-        PyInterpreterWeakRef *wref = (PyInterpreterWeakRef *)arg;
+        PyInterpreterWeakRef wref = (PyInterpreterWeakRef)arg;
         PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
         if (ref == 0) {
             fputs("Python has shut down!\n", stderr);
@@ -914,7 +917,7 @@ deadlock the interpreter if it's not released.
     {
         // Weak reference to the interpreter. It won't wait on the callback
         // to finalize.
-        PyInterpreterWeakRef *wref = PyInterpreterWeakRef_Get();
+        PyInterpreterWeakRef wref = PyInterpreterWeakRef_Get();
         register_callback(async_callback, wref);
 
         Py_RETURN_NONE;
@@ -940,7 +943,7 @@ interpreter here.
     call_python(void)
     {
         PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main());
-        if (ref == 0) {
+        if (ref == NULL) {
             fputs("Python has shut down!", stderr);
             return;
         }
@@ -978,8 +981,8 @@ converted interpreter IDs (weak references) to strong references.
 In the end, this was rejected, primarily because it was needlessly
 confusing. Interpreter states hadn't ever had a reference count prior, so
 there was a lack of intuition about when and where something was a strong
-reference. The ``PyInterpreterRef`` and ``PyInterpreterWeakRef`` seem a lot
-clearer.
+reference. The :c:type:`PyInterpreterRef` and :c:type:`PyInterpreterWeakRef`
+types seem a lot clearer.
 
 Interpreter IDs for Reference Counting
 **************************************

From 86b4b7985138098139c8d96beaacdb0670345bd7 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 12 May 2025 05:53:49 -0400
Subject: [PATCH 27/54] Change the API for PyInterpreterState_AsStrong() and
 PyInterpreterWeakRef_AsStrong()

---
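Note (not part of the commit message): the hunks below switch both
``AsStrong`` functions from returning the reference directly to returning an
``int`` status and writing the reference through an out-parameter. The
following is only a rough sketch of that calling convention, assuming
stand-in names: ``demo_ref``, ``demo_as_strong``, ``demo_call`` and the
``alive`` flag are hypothetical and are not the API proposed by the PEP.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the PEP's pointer-sized PyInterpreterRef. */
typedef uintptr_t demo_ref;

/* Stub with the new shape: returns 0 and sets *ref_ptr on success,
   returns -1 on failure. `alive` simulates whether the interpreter
   still hands out strong references. */
static int
demo_as_strong(int alive, demo_ref *ref_ptr)
{
    assert(ref_ptr != NULL);
    if (!alive) {
        return -1;
    }
    *ref_ptr = (demo_ref)1;
    return 0;
}

/* Caller pattern that mirrors the updated examples in this patch. */
static int
demo_call(int alive)
{
    demo_ref ref;
    if (demo_as_strong(alive, &ref) < 0) {
        fputs("Python has shut down!\n", stderr);
        return -1;
    }
    /* A real caller would use the reference here and then release it. */
    (void)ref;
    return 0;
}
```

With this shape, the status code carries success or failure, so callers no
longer compare the reference itself against ``0`` or ``NULL``.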
 peps/pep-0788.rst | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 6932726dfd0..57926c85da3 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -466,7 +466,7 @@ Strong Interpreter References
     This function cannot fail, other than with a fatal error when the caller
     doesn't hold an :term:`attached thread state`.
 
-.. c:function:: PyInterpreterRef PyInterpreterState_AsStrong(PyInterpreterState *interp)
+.. c:function:: int PyInterpreterState_AsStrong(PyInterpreterState *interp, PyInterpreterRef *ref_ptr)
 
     Acquire a strong reference to *interp*.
 
@@ -474,9 +474,12 @@ Strong Interpreter References
     if *interp* shuts down in another thread! Prefer safely acquiring a
     reference through :c:func:`PyInterpreterRef_Get` whenever possible.
 
-    This function will return ``0`` if *interp* has already finished waiting on
-    non-daemon threads. The caller does not need to hold an
-    :term:`attached thread state`.
+    On success, this function will return ``0`` and set *ref_ptr* to a strong
+    reference, and on failure, this function will return ``-1`` and set *ref_ptr*
+    to ``NULL``. (Failure typically indicates that *interp* has already finished
+    waiting on non-daemon threads).
+
+    The caller does not need to hold an :term:`attached thread state`.
 
 .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref)
 
@@ -512,9 +515,7 @@ Weak Interpreter References
     Acquire a weak reference to the current interpreter.
 
     This function is generally meant to be used in tandem with
-    :c:func:`PyInterpreterWeakRef_AsStrong`.
-
-    This function returns ``0`` without an exception set on failure.
+    :c:func:`PyInterpreterWeakRef_AsStrong`, and cannot fail.
 
     The caller must hold an :term:`attached thread state`.
 
@@ -528,20 +529,20 @@ Weak Interpreter References
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
-.. c:function:: PyInterpreterRef PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref)
+.. c:function:: int PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref, PyInterpreterRef *ref_ptr)
+
+    Acquire a strong reference to an interpreter through a weak reference.
 
-    Return a strong reference to an interpreter from a weak reference.
+    On success, this function returns ``0`` and sets *ref_ptr* to a strong
+    reference to the interpreter denoted by *wref*.
 
     If the interpreter no longer exists or has already finished waiting for
-    non-daemon threads, then this function returns ``0``.
+    non-daemon threads, then this function returns ``-1`` and sets *ref_ptr*
+    to ``NULL``.
 
     The caller does not need to hold an :term:`attached thread state`, but is
     not safe to call in a re-entrant signal handler.
 
-    If the caller *does* hold an :term:`attached thread state`, and that thread
-    state holds a strong reference to the interpreter, then this function can
-    never fail.
-
 .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref)
 
     Release a weak reference, possibly deallocating it.
@@ -710,8 +711,8 @@ With this PEP, you'd implement it like this:
                 PyObject *file,
                 const char *text)
     {
-        PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
-        if (ref == 0) {
+        PyInterpreterRef ref;
+        if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) {
             // Python interpreter has shut down
             return -1;
         }
@@ -896,8 +897,8 @@ deadlock the interpreter if it's not released.
     async_callback(void *arg)
     {
         PyInterpreterWeakRef wref = (PyInterpreterWeakRef)arg;
-        PyInterpreterRef ref = PyInterpreterWeakRef_AsStrong(wref);
-        if (ref == 0) {
+        PyInterpreterRef ref;
+        if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) {
             fputs("Python has shut down!\n", stderr);
             return -1;
         }
@@ -942,8 +943,8 @@ interpreter here.
     static void
     call_python(void)
     {
-        PyInterpreterRef ref = PyInterpreterState_AsStrong(PyInterpreterState_Main());
-        if (ref == NULL) {
+        PyInterpreterRef ref;
+        if (PyInterpreterState_AsStrong(PyInterpreterState_Main(), &ref) < 0) {
             fputs("Python has shut down!", stderr);
             return;
         }

From 3212a611e789d321e681805069e14238ef91fcb0 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 12 May 2025 09:34:14 -0400
Subject: [PATCH 28/54] Don't specify setting `NULL`

---
 peps/pep-0788.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
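
Illustrative note, not part of the patch: once the text stops promising that
*ref_ptr* is set to ``NULL`` on failure, callers should branch on the return
value alone and never read the output on the error path. The sketch below
models that contract with stand-ins. ``DemoRef``, ``demo_as_strong`` and
``demo_use_interpreter`` are hypothetical names, not the PEP's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PEP's opaque PyInterpreterRef. */
typedef struct { int interp_id; } DemoRef;

/* Stand-in for an *_AsStrong() call: returns 0 and writes *ref_ptr on
   success; returns -1 and leaves *ref_ptr untouched on failure. */
static int
demo_as_strong(int interp_alive, DemoRef *ref_ptr)
{
    assert(ref_ptr != NULL);
    if (!interp_alive) {
        return -1;              /* *ref_ptr is deliberately not written */
    }
    ref_ptr->interp_id = 1;
    return 0;
}

/* Caller pattern: decide on the return value only. */
static int
demo_use_interpreter(int interp_alive)
{
    DemoRef ref;                /* unspecified until the call succeeds */
    if (demo_as_strong(interp_alive, &ref) < 0) {
        return -1;              /* never read `ref` on this path */
    }
    return ref.interp_id;
}
```

Comparing the output against ``NULL`` or ``0`` after a failed call would
depend on behavior the specification no longer guarantees.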

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 57926c85da3..341ad87f35d 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -475,8 +475,8 @@ Strong Interpreter References
     reference through :c:func:`PyInterpreterRef_Get` whenever possible.
 
     On success, this function will return ``0`` and set *ref_ptr* to a strong
-    reference, and on failure, this function will return ``-1`` and set *ref_ptr*
-    to ``NULL``. (Failure typically indicates that *interp* has already finished
+    reference, and on failure, this function will return ``-1``.
+    (Failure typically indicates that *interp* has already finished
     waiting on non-daemon threads).
 
     The caller does not need to hold an :term:`attached thread state`.
@@ -537,11 +537,11 @@ Weak Interpreter References
     reference to the interpreter denoted by *wref*.
 
     If the interpreter no longer exists or has already finished waiting for
-    non-daemon threads, then this function returns ``-1`` and sets *ref_ptr*
-    to ``NULL``.
+    non-daemon threads, then this function returns ``-1``.
 
-    The caller does not need to hold an :term:`attached thread state`, but is
-    not safe to call in a re-entrant signal handler.
+    This function is not safe to call in a re-entrant signal handler.
+
+    The caller does not need to hold an :term:`attached thread state`.
 
 .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref)
 

From 6e3550cf708e1c202b32daa25c5eba17a87cf594 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 12 May 2025 20:38:46 -0400
Subject: [PATCH 29/54] infinitely -> unbounded

---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 341ad87f35d..726d79afc7f 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -18,7 +18,7 @@ In Python, threads are able to interact with an interpreter (e.g., invoke the
 bytecode loop) through an :term:`attached thread state`. On with-GIL builds,
 only one thread can hold an attached thread state at once, which means that
 the thread holds the :term:`GIL`. On free-threaded builds, there can be
-infinitely many thread states attached, allowing for parallelism (because
+an unbounded number of thread states attached, allowing for parallelism (because
 multiple threads can invoke the interpreter at once).
 
 With that in mind, attachment of thread states is a bit problematic in the C API.

From 6f45d71ba7e0c61cfea4b0a2ca4a375a83c2d0e8 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 12 May 2025 20:44:38 -0400
Subject: [PATCH 30/54] Reword 'extremely common'.

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 726d79afc7f..5e0cf644480 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -268,8 +268,8 @@ and :c:func:`PyThreadState_Swap`, and :c:func:`PyGILState_Ensure`. The latter,
 ***********************************************************
 
 At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does not
-match the documentation. Instead of hanging the thread during finalization
-as previously noted, it's extremely common for it to crash with a segmentation
+always match the documentation. Instead of hanging the thread during finalization
+as previously noted, it's possible for it to crash with a segmentation
 fault. This is a `known issue <https://github.com/python/cpython/issues/124619>`_
 that could be fixed in CPython, but it's definitely worth noting
 here. Incidentally, acceptance and implementation of this PEP will likely fix

From 1d41eb6d68c0278abde5eb776b9f1c55ef186c17 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 12 May 2025 20:51:08 -0400
Subject: [PATCH 31/54] Use 'callback parameter' instead of 'closure'.

---
 peps/pep-0788.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 5e0cf644480..2a4a01c9df2 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -924,17 +924,17 @@ deadlock the interpreter if it's not released.
         Py_RETURN_NONE;
     }
 
-Example: Calling Python Without a Closure
-*****************************************
+Example: Calling Python Without a Callback Parameter
+****************************************************
 
-There are a few cases where callback functions don't take a closure
+There are a few cases where callback functions don't take a callback parameter
 (``void *arg``), so it's impossible to acquire a reference to any specific
 interpreter. The solution to this problem is to acquire a reference to the main
 interpreter through :c:func:`PyInterpreterState_AsStrong`.
 
 But wait, won't that break with subinterpreters, per
 :ref:`pep-788-subinterpreters-gilstate`? Fortunately, since the callback has
-no closure, it's not possible for the caller to pass any objects or
+no callback parameter, it's not possible for the caller to pass any objects or
 interpreter-specific data, so it's completely safe to choose the main
 interpreter here.
 

From 2a75bfd91d18d3bc33676b52b7fe54cd260c4510 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 18 May 2025 15:22:40 -0400
Subject: [PATCH 32/54] Don't steal a reference in PyThreadState_Ensure().

---
 peps/pep-0788.rst | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)
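
Illustrative note, not part of the patch: with a non-stealing
``PyThreadState_Ensure()``, the caller owns the strong reference on every
path and has to close it itself, on failure as well as on success. The toy
model below only counts outstanding references to show the bookkeeping.
All ``demo_*`` names are hypothetical stand-ins, not the PEP's API.

```c
#include <assert.h>

/* Hypothetical model: number of strong references currently held. */
static int demo_strong_refs = 0;

typedef struct { int valid; } DemoRef;

/* Stand-in for PyInterpreterRef_Get(). */
static DemoRef
demo_ref_get(void)
{
    demo_strong_refs++;
    return (DemoRef){ 1 };
}

/* Stand-in for PyInterpreterRef_Close(). */
static void
demo_ref_close(DemoRef ref)
{
    assert(ref.valid);
    demo_strong_refs--;
}

/* Stand-in for PyThreadState_Ensure(): borrows `ref`, never closes it. */
static int
demo_ensure(DemoRef ref, int simulate_failure)
{
    if (!ref.valid || simulate_failure) {
        return -1;
    }
    return 0;
}

/* Stand-in for PyThreadState_Release(). */
static void
demo_release(void)
{
}

static int
demo_callback(int simulate_failure)
{
    DemoRef ref = demo_ref_get();
    if (demo_ensure(ref, simulate_failure) < 0) {
        demo_ref_close(ref);    /* caller still owns `ref` on failure */
        return -1;
    }
    /* ... call into Python here ... */
    demo_release();
    demo_ref_close(ref);        /* and must close it on success too */
    return 0;
}
```

In this model the counter returns to zero after either outcome, which is the
invariant the added ``PyInterpreterRef_Close()`` calls in the examples are
meant to preserve.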

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 2a4a01c9df2..057df4e2330 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -718,6 +718,7 @@ With this PEP, you'd implement it like this:
         }
 
         if (PyThreadState_Ensure(ref) < 0) {
+            PyInterpreterRef_Close(ref);
             puts("Out of memory.\n", stderr);
             return -1;
         }
@@ -728,6 +729,7 @@ With this PEP, you'd implement it like this:
         PyErr_Print();
 
         PyThreadState_Release();
+        PyInterpreterRef_Close(ref);
         return res < 0;
     }
 
@@ -758,6 +760,7 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
            lock is released. */
         if (PyThreadState_Ensure(ref) < 0) {
             PyErr_NoMemory();
+            PyInterpreterRef_Close(ref);
             return NULL;
         }
 
@@ -771,6 +774,7 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
 
         release_some_lock();
         PyThreadState_Release();
+        PyInterpreterRef_Close(ref);
         Py_RETURN_NONE;
     }
 
@@ -820,12 +824,14 @@ This is the same code, rewritten to use the new functions:
     {
         PyInterpreterRef interp = (PyInterpreterRef)arg;
         if (PyThreadState_Ensure(interp) < 0) {
+            PyInterpreterRef_Close(interp);
             return -1;
         }
         if (PyRun_SimpleString("print(42)") < 0) {
             PyErr_Print();
         }
         PyThreadState_Release();
+        PyInterpreterRef_Close(interp);
         return 0;
     }
 
@@ -860,6 +866,7 @@ they can still be used with this API:
     {
         PyInterpreterRef ref = (PyInterpreterRef)arg;
         if (PyThreadState_Ensure(ref) < 0) {
+            PyInterpreterRef_Close(ref);
             return -1;
         }
         (void)PyThreadState_SetDaemon(1);
@@ -867,6 +874,7 @@ they can still be used with this API:
             PyErr_Print();
         }
         PyThreadState_Release();
+        PyInterpreterRef_Close(ref);
         return 0;
     }
 
@@ -904,12 +912,14 @@ deadlock the interpreter if it's not released.
         }
 
         if (PyThreadState_Ensure(ref) < 0) {
+            PyInterpreterRef_Close(ref);
             return -1;
         }
         if (PyRun_SimpleString("print(42)") < 0) {
             PyErr_Print();
         }
         PyThreadState_Release();
+        PyInterpreterRef_Close(ref);
         return 0;
     }
 
@@ -950,12 +960,14 @@ interpreter here.
         }
 
         if (PyThreadState_Ensure(ref) < 0) {
+            PyInterpreterRef_Close(ref);
             return -1;
         }
         if (PyRun_SimpleString("print(42)") < 0) {
             PyErr_Print();
         }
         PyThreadState_Release();
+        PyInterpreterRef_Close(ref);
         return 0;
     }
 
@@ -1072,23 +1084,6 @@ In addition, it's unclear whether to remove them at all. A
 functions if it's determined that a full ``PyGILState`` removal would
 be too disruptive for the ecosystem.
 
-Should ``PyThreadState_Ensure`` Steal a Reference?
---------------------------------------------------
-
-At the moment, :c:func:`PyThreadState_Ensure` steals a reference to the
-interpreter. This is controversial, because it's not necessarily the right
-default.
-
-For now, it's staying, because in cases where a reference is supposed
-to be multi-use, :c:func:`PyInterpreterRef_Dup` can be used to make up
-for the stolen reference. If it didn't still a reference, there's no
-opposite helper function to throw away the reference, so it's just more
-boilerplate. But, this is based on the assumption that there is a general
-desire for single-use interpreter references. If this doesn't prove to be
-the case, and a multi-use reference is overwhelmingly more common, then it
-seems reasonable to let :c:func:`PyThreadState_Ensure` form its own reference
-from the one passed to it.
-
 Copyright
 =========
 

From 1e6285fd81ce2165dcc7d75e32c17e91bf272482 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 18 May 2025 15:24:07 -0400
Subject: [PATCH 33/54] Remove the rest of reference theft.

---
 peps/pep-0788.rst | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 057df4e2330..6422b381edb 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -460,9 +460,6 @@ Strong Interpreter References
 
     Acquire a strong reference to the current interpreter.
 
-    This function is generally meant to be used in tandem with
-    :c:func:`PyThreadState_Ensure`.
-
     This function cannot fail, other than with a fatal error when the caller
     doesn't hold an :term:`attached thread state`.
 
@@ -485,9 +482,6 @@ Strong Interpreter References
 
     Duplicate a strong reference to an interpreter.
 
-    This function is generally meant to be used in tandem with
-    :c:func:`PyThreadState_Ensure`.
-
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
@@ -614,10 +608,6 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     subsequent calls to :c:func:`PyGILState_Ensure` in this thread, but
     :c:func:`PyGILState_Ensure` will *not* make the thread daemon again.
 
-    The reference to the interpreter *ref* is stolen by this function.
-    Use :c:func:`PyInterpreterRef_Dup` if the reference is intended to be
-    kept.
-
     Return zero on success, and non-zero with the old attached thread state
     restored (which may have been ``NULL``).
 
@@ -632,9 +622,7 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
 
     This function cannot fail, but may hang the thread if the
     restored :term:`attached thread state` was daemon and the interpreter
-    was finalized. If you're running in a thread where that could be an issue,
-    call :c:func:`PyThreadState_SetDaemon` before :c:func:`PyThreadState_Ensure`
-    at your own discretion.
+    was finalized.
 
 ``threading`` Shutdown and Behavior
 -----------------------------------

From bcc1c731355aef1aadaccb971d78e1b32e41848e Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Sun, 18 May 2025 15:32:02 -0400
Subject: [PATCH 34/54] Remove 'daemon'-ness as a property of threads.

---
 peps/pep-0788.rst | 112 ++++++++--------------------------------------
 1 file changed, 18 insertions(+), 94 deletions(-)
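
Illustrative note, not part of the patch: after this change, shutdown is
gated purely by the interpreter's reference count rather than by a per-thread
daemon flag. The single-threaded toy model below sketches that rule under
stated assumptions: finalization cannot proceed while strong references are
outstanding, and new strong references are refused once the wait is over.
``DemoInterp`` and the ``demo_*`` functions are hypothetical, and a real
implementation would need atomics or a lock.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    long strong_refs;   /* outstanding strong references */
    bool past_wait;     /* finished waiting; new strong refs are refused */
} DemoInterp;

/* Models acquiring a strong reference (for example via *_AsStrong()). */
static int
demo_acquire(DemoInterp *interp)
{
    if (interp->past_wait) {
        return -1;      /* too late: the count already drained to zero */
    }
    interp->strong_refs++;
    return 0;
}

/* Models PyInterpreterRef_Close(). */
static void
demo_close(DemoInterp *interp)
{
    assert(interp->strong_refs > 0);
    interp->strong_refs--;
}

/* One finalization attempt: returns true once shutdown may proceed. */
static bool
demo_try_finalize(DemoInterp *interp)
{
    if (interp->strong_refs > 0) {
        return false;   /* must keep waiting; no thread may be hung yet */
    }
    interp->past_wait = true;
    return true;
}
```

Note that this models waiting on a count, not joining threads: whoever holds
the last reference decides when the interpreter may continue shutting down.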

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 6422b381edb..34542cc9f09 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -343,8 +343,8 @@ object are wrong! There isn't any synchronization between the two GILs, so both
 the thread (who thinks it's in the subinterpreter) and the main thread could try
 to increment the reference count at the same time, causing a data race!
 
-Concurrent Interpreter Deallocation Issues
-------------------------------------------
+Concurrent Interpreter Deallocation is Frustrating
+--------------------------------------------------
 
 The other way of creating a native thread that can invoke Python,
 :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, is a lot better
@@ -368,13 +368,11 @@ scratch and "reimagining" how to create, acquire and attach
 Preventing Interpreter Shutdown with Reference Counting
 -------------------------------------------------------
 
-This PEP takes an approach where interpreters are given a reference count by
-non-daemon threads that want to (or do) hold an :term:`attached thread state`.
+This PEP takes an approach where an interpreter is given a reference count
+that prevents it from shutting down.
 
-So, from a thread's perspective, holding a "strong reference" to the
-interpreter will make it safe to call the C API without worrying about
-the thread being hung. A strong reference held by a thread state will
-be held as long as thread state is "alive", even if it's detached.
+So, holding a "strong reference" to the interpreter will make it safe to
+call the C API without worrying about the thread being hung.
 
 This means that interfacing Python (for example, in a C++ library) will need
 a reference to the interpreter in order to safely call the object, which is
@@ -391,12 +389,6 @@ to a strong reference can fail if the interpreter has already finalized, or
 reached a point during finalization where it can't be guaranteed that the
 thread won't hang.
 
-If there's additional work after destroying the thread state, the thread
-can continue running as normal. If that work needs to finish before the
-program exits, it's still up to the user on how to join the thread, for
-example by using an :mod:`atexit` handler can be used to join it.
-This PEP isn't trying to reinvent how to create or join threads!
-
 Removing the GIL-state APIs
 ---------------------------
 
@@ -424,21 +416,14 @@ Specification
 Interpreter References to Prevent Shutdown
 ------------------------------------------
 
-An interpreter will keep a reference count that's managed by threads.
-When the interpreter starts finalizing, it will until its reference count
+An interpreter will keep a reference count that's managed by users of the
+C API. When the interpreter starts finalizing, it will wait until its reference count
 reaches zero before proceeding to a point where threads will be hung.
 Note that this *is not* the same as joining the thread; the interpreter will
-only wait until the reference count is zero, typically via releasing non-daemon
-thread states with :c:func:`PyThreadState_Release`.  The interpreter must not hang
-threads until this reference count has reached zero. Threads can hold as many
-references as they want, but in most cases, a thread will have one reference
-at a time, typically through the :term:`attached thread state`. After the reference count
-has reached zero, threads can no longer prevent the interpreter from shutting
-down.
-
-An attached thread state is made non-daemon by holding a strong reference
-to the interpreter. When a non-daemon thread state is destroyed, it releases
-the reference.
+only wait until the reference count is zero, and then proceed. The interpreter
+must not hang threads until this reference count has reached zero. 
+After the reference count has reached zero, threads can no longer prevent the
+interpreter from shutting down.
 
 A weak reference to the interpreter won't prevent it from finalizing, but can
 be safely accessed after the interpreter no longer supports strong references,
@@ -474,7 +459,7 @@ Strong Interpreter References
     On success, this function will return ``0`` and set *ref_ptr* to a strong
     reference, and on failure, this function will return ``-1``.
     (Failure typically indicates that *interp* has already finished
-    waiting on non-daemon threads).
+    waiting on its reference count.)
 
     The caller does not need to hold an :term:`attached thread state`.
 
@@ -530,8 +515,8 @@ Weak Interpreter References
     On success, this function returns ``0`` and sets *ref_ptr* to a strong
     reference to the interpreter denoted by *wref*.
 
-    If the interpreter no longer exists or has already finished waiting for
-    non-daemon threads, then this function returns ``-1``.
+    If the interpreter no longer exists or has already finished waiting
+    for its reference count to reach zero, then this function returns ``-1``.
 
     This function is not safe to call in a re-entrant signal handler.
 
@@ -544,43 +529,6 @@ Weak Interpreter References
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
-Daemon and Non-daemon Thread States
------------------------------------
-
-A non-daemon thread state is a thread state that holds a strong reference to an
-interpreter. The reference is released when the thread state is deleted, either
-by :c:func:`PyThreadState_Release` or a different thread state deletion
-function (such as :c:func:`PyThreadState_Delete`).
-
-For backwards compatibility, all thread states created by existing APIs,
-including :c:func:`PyGILState_Ensure`, will remain daemon by default.
-See :ref:`pep-788-hanging-compat`.
-
-.. c:function:: int PyThreadState_SetDaemon(int is_daemon)
-
-    Set the :term:`attached thread state` as non-daemon or daemon.
-
-    The attached thread state must not be the main thread for the
-    interpreter. All thread states created without
-    :c:func:`PyThreadState_Ensure` are daemon by default.
-
-    If the thread state is non-daemon, then the current interpreter will wait
-    for this thread to finish before shutting down by holding a strong
-    reference to the interpreter (see :c:func:`PyInterpreterRef_Get`). See also
-    :attr:`threading.Thread.daemon`.
-
-    Return zero on success, non-zero *without* an exception set on failure.
-    This function can only fail when setting the thread state to non-daemon.
-
-.. c:function:: int PyThreadState_GetDaemon(int is_daemon)
-
-    Returns non-zero if the :term:`attached thread state` is daemon,
-    and zero otherwise. See also and :c:func:`PyThreadState_SetDaemon`
-    and :attr:`threading.Thread.daemon`.
-
-    This function cannot fail, other than with a fatal error if the caller
-    has no :term:`attached thread state`.
-
 Ensuring and Releasing Thread States
 ------------------------------------
 
@@ -604,10 +552,6 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     if the interpreter matches *ref*, it is attached, and otherwise a new
     thread state is created.
 
-    The thread state attached by this function will be reused by
-    subsequent calls to :c:func:`PyGILState_Ensure` in this thread, but
-    :c:func:`PyGILState_Ensure` will *not* make the thread daemon again.
-
     Return zero on success, and non-zero with the old attached thread state
     restored (which may have been ``NULL``).
 
@@ -620,26 +564,7 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     returning. The cached thread state as used by :c:func:`PyThreadState_Ensure`
     and :c:func:`PyGILState_Ensure` will also be restored.
 
-    This function cannot fail, but may hang the thread if the
-    restored :term:`attached thread state` was daemon and the interpreter
-    was finalized.
-
-``threading`` Shutdown and Behavior
------------------------------------
-
-An interpreter currently special-cases non-daemon threads created by
-:mod:`threading` and joins them before the interpreter does any other
-finalization.
-
-:mod:`threading` will be changed to use :c:func:`PyThreadState_Ensure`, and
-will rely on the interpreter's strong reference to run until completion.
-:mod:`threading`-created threads will still be joined to release resources after
-this has happened.
-
-Additionally, setting :attr:`threading.Thread.daemon` should
-correspond to calling :c:func:`PyThreadState_SetDaemon` in C. Otherwise,
-:c:func:`PyThreadState_GetDaemon` will have incorrect results in Python
-threads.
+    This function cannot fail.
 
 Deprecation of GIL-state APIs
 -----------------------------
@@ -744,7 +669,7 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
     {
         assert(PyThreadState_GetUnchecked() != NULL);
         PyInterpreterRef ref = PyInterpreterRef_Get();
-        /* Temporarily make this thread non-daemon to ensure that the
+        /* Temporarily hold a strong reference to ensure that the
            lock is released. */
         if (PyThreadState_Ensure(ref) < 0) {
             PyErr_NoMemory();
@@ -857,12 +782,11 @@ they can still be used with this API:
             PyInterpreterRef_Close(ref);
             return -1;
         }
-        (void)PyThreadState_SetDaemon(1);
+        PyInterpreterRef_Close(ref);
         if (PyRun_SimpleString("print(42)") < 0) {
             PyErr_Print();
         }
         PyThreadState_Release();
-        PyInterpreterRef_Close(ref);
         return 0;
     }
 

From 57abedb606c2e5efc135de89ec9e82b59fe889ee Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Mon, 19 May 2025 12:59:44 -0400
Subject: [PATCH 35/54] 'removing' -> 'deprecating'

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 34542cc9f09..5001d665cec 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -389,8 +389,8 @@ to a strong reference can fail if the interpreter has already finalized, or
 reached a point during finalization where it can't be guaranteed that the
 thread won't hang.
 
-Removing the GIL-state APIs
----------------------------
+Deprecation of the GIL-state APIs
+---------------------------------
 
 Due to the plethora of issues with ``PyGILState``, this PEP intends to do away
 with them entirely. In today's C API, all ``PyGILState`` functions are

From e2145b5731755c0c100926e2bd49e09ad6a5e3cf Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 22 May 2025 17:33:03 -0400
Subject: [PATCH 36/54] Some final updates in response to the reference
 implementation.

---
 peps/pep-0788.rst | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)
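
Illustrative note, not part of the patch: with the pointer-size guarantee
removed, a weak reference can no longer be cast to and from ``void *``, so
the example wraps it in a heap-allocated struct. The sketch below shows the
same hand-off with plain ``malloc``/``free`` and a handle that is
deliberately wider than a pointer. ``DemoWeakRef``, ``DemoThreadData`` and
the ownership rule (the callback frees its argument) are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque handle that may be wider than a pointer. */
typedef struct { long long bits[2]; } DemoWeakRef;

typedef struct {
    DemoWeakRef wref;
} DemoThreadData;

/* Records what the callback observed, for demonstration only. */
static long long demo_seen = 0;

/* Callback with the usual `void *arg` parameter; takes ownership of arg. */
static int
demo_async_callback(void *arg)
{
    DemoThreadData *data = (DemoThreadData *)arg;
    DemoWeakRef wref = data->wref;  /* copy the handle out of the box */
    free(data);                     /* assumed policy: callback frees it */
    demo_seen = wref.bits[0] + wref.bits[1];
    return 0;
}

/* Builds the argument to pass at registration time, or NULL on OOM. */
static void *
demo_make_callback_arg(DemoWeakRef wref)
{
    DemoThreadData *data = malloc(sizeof(*data));
    if (data == NULL) {
        return NULL;
    }
    data->wref = wref;
    return data;
}
```

Whichever side owns the allocation, some code path has to free it; the
example in the patch leaves that step to the reader.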

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 5001d665cec..16fdae56df6 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -421,7 +421,7 @@ C API. When the interpreter starts finalizing, it will until its reference count
 reaches zero before proceeding to a point where threads will be hung.
 Note that this *is not* the same as joining the thread; the interpreter will
 only wait until the reference count is zero, and then proceed. The interpreter
-must not hang threads until this reference count has reached zero. 
+must not hang threads until this reference count has reached zero.
 After the reference count has reached zero, threads can no longer prevent the
 interpreter from shutting down.
 
@@ -463,6 +463,13 @@ Strong Interpreter References
 
     The caller does not need to hold an :term:`attached thread state`.
 
+.. c:function:: PyInterpreterState *PyInterpreterRef_AsInterpreter(PyInterpreterRef ref)
+
+    Return the interpreter denoted by *ref*.
+
+    This function cannot fail, and the caller doesn't need to hold an
+    :term:`attached thread state`.
+
 .. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref)
 
     Duplicate a strong reference to an interpreter.
@@ -487,8 +494,6 @@ Weak Interpreter References
     The interpreter will *not* wait for the reference to be
     released before shutting down.
 
-    This type is guaranteed to be pointer-sized.
-
 .. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Get(void)
 
     Acquire a weak reference to the current interpreter.
@@ -813,10 +818,15 @@ deadlock the interpreter if it's not released.
 
 .. code-block:: c
 
+    typedef struct {
+        PyInterpreterWeakRef wref;
+    } ThreadData;
+
     static int
     async_callback(void *arg)
     {
-        PyInterpreterWeakRef wref = (PyInterpreterWeakRef)arg;
+        ThreadData *data = (ThreadData *)arg;
+        PyInterpreterWeakRef wref = data->wref;
         PyInterpreterRef ref;
         if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) {
             fputs("Python has shut down!\n", stderr);
@@ -840,8 +850,14 @@ deadlock the interpreter if it's not released.
     {
         // Weak reference to the interpreter. It won't wait on the callback
         // to finalize.
+        ThreadData *tdata = PyMem_Malloc(sizeof(ThreadData));
+        if (tdata == NULL) {
+            PyErr_NoMemory();
+            return NULL;
+        }
         PyInterpreterWeakRef wref = PyInterpreterWeakRef_Get();
-        register_callback(async_callback, wref);
+        tdata->wref = wref;
+        register_callback(async_callback, tdata);
 
         Py_RETURN_NONE;
     }

From e547d052dfab82708160953c03aaafaf54bba15d Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 22 May 2025 17:35:02 -0400
Subject: [PATCH 37/54] Remove some redundant links.

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 16fdae56df6..be88df23152 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -549,11 +549,11 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     :c:func:`PyThreadState_Release` that matches this one.
 
     Nested calls to this function will only sometimes create a new
-    :term:`thread state`. If there is no :term:`attached thread state`,
+    :term:`thread state`. If there is no attached thread state,
     then this function will check for the most recent attached thread
     state used by this thread. If none exists or it doesn't match *ref*,
     a new thread state is created. If it does match *ref*, it is reattached.
-    If there is an :term:`attached thread state`, then a similar check occurs;
+    If there is an attached thread state, then a similar check occurs;
     if the interpreter matches *ref*, it is attached, and otherwise a new
     thread state is created.
 

From dd6e2d1c97e6d11c467e361868ad2571c74400b1 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 22 May 2025 17:35:29 -0400
Subject: [PATCH 38/54] Remove distinction between finalization and shutdown.

---
 peps/pep-0788.rst | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index be88df23152..65e205d2a81 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -71,27 +71,6 @@ The "current interpreter" refers to the interpreter-state
 pointer on an :term:`attached thread state`, as returned by
 :c:func:`PyThreadState_GetInterpreter`.
 
-Finalization vs Shutdown
-------------------------
-
-Throughout this PEP, the terms "finalization" and "shutdown" are used in
-reference to what an interpreter does at the end of its lifetime, either
-because the program is closing or because :c:func:`Py_EndInterpreter` was
-called. There's a subtle difference between the two terms, as used in this
-PEP:
-
-- "Finalization" refers to an interpreter getting ready to "shut down", in
-  which it runs its final garbage collections, cleans up
-  :term:`thread states <thread state>`, and deletes
-  per-interpreter state. This should not be confused with *runtime*
-  finalization, where process-wide state is also cleaned up, but be aware
-  that the main interpreter is finalized alongside the runtime.
-- "Shutdown" (or "shut down", as a verb) refers to the interpreter being in a
-  "finalized" state, after finalization has already happened. Shutdown
-  for a subinterpreter entails its interpreter-state structure being
-  deallocated, and shutdown for the main interpreter includes the entire Python
-  runtime being finalized.
-
 Native and Python Threads
 -------------------------
 

From 332394c17aaaa7440545a869cfac9fc8aebe44f6 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 22 May 2025 17:40:41 -0400
Subject: [PATCH 39/54] Shorten lock + daemon thread section in the motivation.

---
 peps/pep-0788.rst | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 65e205d2a81..bc756cd967d 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -172,8 +172,8 @@ issues; the interpreter might not be finalizing during the call to
 afterwards, which would cause the attachment of a thread state to hang the
 thread.
 
-Daemon Threads Can Deadlock Finalization
-****************************************
+Daemon Threads Can Break Finalization
+*************************************
 
 When acquiring locks, it's extremely important to detach the thread state to
 prevent deadlocks. This is true on both the with-GIL and free-threaded builds.
@@ -181,30 +181,8 @@ prevent deadlocks. This is true on both the with-GIL and free-threaded builds.
 When the GIL is enabled, a deadlock can occur pretty easily when acquiring a
 lock if the GIL wasn't released; thread A grabs a lock, and starts waiting on
 its thread state to attach, while thread B holds the GIL and is waiting on the
-lock.
-
-On free-threaded builds, lock-ordering deadlocks are still possible
-if thread A acquired the lock for object A and then object B, and then
-another thread tried to acquire those locks in the reverse order. Free-threading
-currently protects against this by releasing locks when the thread state is
-detached, making detachment a necessity to prevent deadlocks.
-
-So, all code that needs to work with locks need to detach the thread state.
-In C, this is almost always done via :c:macro:`Py_BEGIN_ALLOW_THREADS` and
-:c:macro:`Py_END_ALLOW_THREADS`, in a code block that looks something like this:
-
-.. code-block:: c
-
-    Py_BEGIN_ALLOW_THREADS
-    acquire_lock();
-    Py_END_ALLOW_THREADS
-
-Again, in a daemon thread, :c:macro:`Py_END_ALLOW_THREADS` will hang the thread
-if the interpreter is finalizing. But, :c:macro:`Py_BEGIN_ALLOW_THREADS` will
-*not* hang the thread; the lock will be acquired, and *then* the thread will
-be hung! Once that happens, nothing can try to acquire that lock without
-deadlocking. The main thread will continue to run finalizers past that point,
-though. If any of those finalizers try to acquire the lock, deadlock ensues.
+lock. A similar deadlock can occur on the free-threaded build during stop-the-world
+pauses when running the garbage collector.
 
 This affects CPython itself, and there's not much that can be done
 to fix it with the current API. For example,

From 6e0982040404b3975d92bcf98f562a5acd44db2e Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 22 May 2025 19:01:34 -0400
Subject: [PATCH 40/54] Redo the abstract.

---
 peps/pep-0788.rst | 60 +++++++++++++++--------------------------------
 1 file changed, 19 insertions(+), 41 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index bc756cd967d..923188c88b2 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -14,47 +14,25 @@ Post-History: `10-Mar-2025 <https://discuss.python.org/t/83959>`__,
 Abstract
 ========
 
-In Python, threads are able to interact with an interpreter (e.g., invoke the
-bytecode loop) through an :term:`attached thread state`. On with-GIL builds,
-only one thread can hold an attached thread state at once, which means that
-the thread holds the :term:`GIL`. On free-threaded builds, there can be
-an unbounded number of thread states attached, allowing for parallelism (because
-multiple threads can invoke the interpreter at once).
-
-With that in mind, attachment of thread states is a bit problematic in the C API.
-The C API currently provides two ways to acquire and attach a thread state for
-an interpreter:
-
-- :c:func:`PyGILState_Ensure` & :c:func:`PyGILState_Release`.
-- :c:func:`PyThreadState_New` & :c:func:`PyThreadState_Swap` (significantly
-  less common).
-
-The former, :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`,
-are the most common way to do this and have been the standard for over twenty
-years (:pep:`311`), but have a number of issues that have arisen over time:
-
-- Subinterpreters tend to have trouble with them, because in threads that
-  haven't ever had an attached thread state, :c:func:`PyGILState_Ensure`
-  will assume that the main interpreter was requested. This makes it
-  impossible for the thread to interact with the subinterpreter!
-- The phrase "GIL" is confusing for developers of free-threaded
-  extensions, because there's no GIL there, right? Even on free-threaded
-  builds, threads still needs a thread state to interact with the interpreter,
-  it's just that they don't have to wait on one-another to do so. These days,
-  the important thing that :c:func:`PyGILState_Ensure` does is get attach a
-  thread state, and acquiring the GIL is somewhat incidental.
-
-The other option, :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`,
-do solve those issues, but come with an additional problem with how thread state
-attachment works in the C API (that ``PyGILState`` also includes): if the
-thread is not the main thread, then the interpreter will randomly hang the
-thread during attachment if it starts finalizing. This is a problem for large
-applications that want to use their thread in addition to calling Python.
-
-This PEP intends to solve these issues by providing :c:func:`PyThreadState_Ensure`
-and :c:func:`PyThreadState_Release` as replacements for the existing functions,
-accompanied by some interpreter reference counting APIs that let thread states
-be acquired and attached in a thread-safe and predictable manner.
+In the C API, threads are able to interact with an interpreter by holding an
+:term:`attached thread state` for the current thread. This works well, but
+can get complicated when it comes to creating and attaching :term:`thread states`
+in a thread-safe manner.
+
+Specifically, the C API doesn't have any way to ensure that an interpreter
+is in a state where it can be called when creating and/or attaching a thread
+state. As such, attachment might hang the thread, or in subinterpreters, it
+might flat-out crash due to the interpreter's structure being deallocated.
+This can be a frustrating issue to deal with in large applications that
+want to execute Python code alongside some other native code.
+
+In addition, assumptions about which interpreter to use tend to be wrong
+inside of subinterpreters, primarily because :c:func:`PyGILState_Ensure`
+always creates a thread state for the main interpreter in threads where
+Python hasn't ever run.
+
+This PEP intends to solve these kinds issues by *reimagining* how we approach
+thread states in the C API.
 
 Terminology
 ===========

From 12344a9c269bf56e4319ab578e34a4c3c31417a5 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Thu, 22 May 2025 20:00:17 -0400
Subject: [PATCH 41/54] Add the solution to the abstract.

---
 peps/pep-0788.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 923188c88b2..64393d2a287 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -32,7 +32,15 @@ always creates a thread state for the main interpreter in threads where
 Python hasn't ever run.
 
 This PEP intends to solve these kinds issues by *reimagining* how we approach
-thread states in the C API.
+thread states in the C API. This is done through the introduction of interpreter
+references that prevent an interpreter from entering a stage where threads will
+hang, as well as :c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Release`
+for replacing :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
+
+For example, in APIs that don't require the caller to hold an attached thread
+state, a strong interpreter reference should be passed to the API to ensure
+that it targets the correct interpreter, and that the interpreter doesn't
+concurrently deallocate itself.
 
 Terminology
 ===========

From 45a846ccbef1b2c2a07a483b97cd3452e3dc2162 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 06:01:10 -0400
Subject: [PATCH 42/54] Fix lint: add an explicit target to a :term: role.

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 64393d2a287..09cde5d3a30 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -16,8 +16,8 @@ Abstract
 
 In the C API, threads are able to interact with an interpreter by holding an
 :term:`attached thread state` for the current thread. This works well, but
-can get complicated when it comes to creating and attaching :term:`thread states`
-in a thread-safe manner.
+can get complicated when it comes to creating and Attaching
+:term:`thread states <thread state>` in a thread-safe manner.
 
 Specifically, the C API doesn't have any way to ensure that an interpreter
 is in a state where it can be called when creating and/or attaching a thread

From d2a257afa98aaf2588205f87c174123680626fe4 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 06:15:11 -0400
Subject: [PATCH 43/54] Add a rejected idea for non-daemon thread states.

---
 peps/pep-0788.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 09cde5d3a30..01ebee4645c 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -851,6 +851,32 @@ at `python/cpython#133110 <https://github.com/python/cpython/pull/133110>`_.
 Rejected Ideas
 ==============
 
+Non-daemon Thread States
+------------------------
+
+In prior iterations of this PEP, interpreter references were a property of
+a thread state rather than a property of an interpreter. This meant that
+:c:func:`PyThreadState_Ensure` stole a strong interpreter reference, and
+it was released upon calling :c:func:`PyThreadState_Release`. A thread state
+that held a reference to an interpreter was known as a "non-daemon thread
+state." At first, this seemed like an improvement, because it shifted management
+of a reference's lifetime to the thread instead of the user, which eliminated
+some boilerplate.
+
+However, this ended up making the proposal significantly more complex and
+hurt the proposal's goals:
+
+- Most importantly, non-daemon thread states put too much emphasis on daemon
+  threads as the problem, which hurt the clarity of the PEP. Additionally, the
+  phrase "non-daemon" added extra confusion, because non-daemon Python threads
+  are explicitly joined, whereas a non-daemon C thread is only waited on
+  until it releases its reference.
+- In many cases, an interpreter reference should outlive a singular thread
+  state. Stealing the interpreter reference in :c:func:`PyThreadState_Ensure`
+  was particularly troublesome for these cases. If :c:func:`PyThreadState_Ensure`
+  didn't steal a reference with non-daemon thread states, it would muddy the
+  ownership story of the interpreter reference, leading to a more confusing API.
+
 Retrofiting the Existing Structures with Reference Counts
 ---------------------------------------------------------
 

From a3cf5f40832cebe122c7fb36d13d19d567a0e5c7 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 06:50:53 -0400
Subject: [PATCH 44/54] Redo some of the motivation.

---
 peps/pep-0788.rst | 119 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 83 insertions(+), 36 deletions(-)
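
Reviewer note (not part of the patch): the reworded abstract below says that
interpreter references keep an interpreter from reaching the stage where
attaching a thread state hangs. The following is a minimal, single-threaded
sketch of that idea as a toy model. Every ``toy_*`` name is hypothetical and
exists only for illustration; none of this is the PEP's API or CPython's
implementation, and the rule that references may still be taken while the
interpreter is waiting is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in CPython. */
typedef struct {
    long strong_refs;  /* outstanding strong references */
    bool finalizing;   /* finalization has been requested */
} toy_interp;

/* True once the model has reached the stage where attaching a
   thread state would hang: finalization was requested and nobody
   holds a strong reference any more. */
static bool
toy_threads_would_hang(const toy_interp *interp)
{
    return interp->finalizing && interp->strong_refs == 0;
}

/* Take a strong reference. Returns 0 on success and -1 once the
   model has finished waiting on references. */
static int
toy_ref_acquire(toy_interp *interp)
{
    if (toy_threads_would_hang(interp)) {
        return -1;
    }
    interp->strong_refs++;
    return 0;
}

/* Give a strong reference back. */
static void
toy_ref_release(toy_interp *interp)
{
    assert(interp->strong_refs > 0);
    interp->strong_refs--;
}

/* Ask the model to start finalizing. It keeps "waiting" for as
   long as strong_refs is non-zero. */
static void
toy_request_finalize(toy_interp *interp)
{
    interp->finalizing = true;
}
```

In this model the count only gates the hang point. It does not join any
thread, which matches the distinction the patch draws in the "Interpreter
References to Prevent Shutdown" hunk.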

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 01ebee4645c..78f6e77fc50 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -16,13 +16,13 @@ Abstract
 
 In the C API, threads are able to interact with an interpreter by holding an
 :term:`attached thread state` for the current thread. This works well, but
-can get complicated when it comes to creating and Attaching
+can get complicated when it comes to creating and attaching
 :term:`thread states <thread state>` in a thread-safe manner.
 
 Specifically, the C API doesn't have any way to ensure that an interpreter
 is in a state where it can be called when creating and/or attaching a thread
-state. As such, attachment might hang the thread, or in subinterpreters, it
-might flat-out crash due to the interpreter's structure being deallocated.
+state. As such, attachment might hang the thread, or it might flat-out crash
+due to the interpreter's structure being deallocated in subinterpreters.
 This can be a frustrating issue to deal with in large applications that
 want to execute Python code alongside some other native code.
 
@@ -33,14 +33,20 @@ Python hasn't ever run.
 
 This PEP intends to solve these kinds issues by *reimagining* how we approach
 thread states in the C API. This is done through the introduction of interpreter
-references that prevent an interpreter from entering a stage where threads will
-hang, as well as :c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Release`
-for replacing :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
-
-For example, in APIs that don't require the caller to hold an attached thread
-state, a strong interpreter reference should be passed to the API to ensure
-that it targets the correct interpreter, and that the interpreter doesn't
-concurrently deallocate itself.
+references that prevent an interpreter from finalizing (or more technically,
+entering a stage in which attachment of a thread state hangs).
+This allows for more structure and reliability when it comes to thread state
+management, because it forces a layer of synchronization between the
+interpreter and the caller.
+
+With this new system, there are a lot of changes needed in CPython and
+third-party libraries to adopt it. For example, in APIs that don't require
+the caller to hold an attached thread state, a strong interpreter reference
+should be passed to ensure that it targets the correct interpreter, and that
+the interpreter doesn't concurrently deallocate itself. The best example of
+this in CPython is :c:func:`PyGILState_Ensure`. As part of this proposal,
+:c:func:`PyThreadState_Ensure` is provided as a modern replacement that
+takes a strong interpreter reference.
 
 Terminology
 ===========
@@ -120,25 +126,6 @@ This means that any non-Python/native thread may be terminated at any point, whi
 is severely limiting for users who want to do more than just execute Python
 code in their stream of calls.
 
-Joining the Thread isn't Always Possible
-****************************************
-
-In general, it's possible to prevent hanging of threads created while Python
-is active through :mod:`atexit` functions. A thread could be started by some
-C function, and then as long as that thread is joined by :mod:`atexit`, then
-the thread won't hang. Reasonable enough, right?
-
-Unfortunately, :mod:`atexit` isn't always an option, because to call it, you
-need to already have an :term:`attached thread state` for the thread. If
-there's no guarantee of that, then :func:`atexit.register` cannot be safely
-called without the risk of hanging the thread.
-
-For example, large C++ applications might want to expose an interface that can
-call Python code. To do this, a function would take a Python object, and then
-call :c:func:`PyGILState_Ensure` to safely interact with it (e.g., by calling
-it). If the interpreter is finalizing or has shut down, then the thread is
-hung, disrupting the C++ caller.
-
 ``Py_IsFinalizing`` is Insufficient
 ***********************************
 
@@ -176,6 +163,66 @@ to fix it with the current API. For example,
 remarks that the :mod:`ssl` module can emit a fatal error when used at
 finalization, because a daemon thread got hung while holding the lock.
 
+
+Daemon Threads are not the Problem
+**********************************
+
+Prior to this PEP, deprecating daemon threads was discussed
+`extensively <https://discuss.python.org/t/68836>`_. Daemon threads technically
+cause many of the issues outlined in this proposal, so removing daemon threads
+could be seen as a potential solution. The main argument for removing daemon
+threads is that they're a large cause of problems in the interpreter:
+
+    Except that daemon threads don’t actually work reliably. They’re attempting 
+    to run and use Python interpreter resources after the runtime has been shut
+    down upon runtime finalization. As in they have pointers to global state for
+    the interpreter.
+
+In practice, daemon threads are useful for simplifying many threading applications
+in Python, and since the program is about to close in most cases, it's not worth
+the added complexity to try and gracefully shut down a thread. 
+
+    When I’ve needed daemon threads, it’s usually been the case of “Long-running,
+    uninterruptible, third-party task” in terms of the examples in the linked issue.
+    Basically I’ve had something that I need running in the background, but I have
+    no easy way to terminate it short of process termination. Unfortunately, I’m on
+    Windows, so ``signal.pthread_kill`` isn’t an option. I guess I could use the
+    Windows Terminate Thread API, but it’s a lot of work to wrap it myself compared
+    to just letting process termination handle things.
+
+Finally, removing Python-level daemon threads does not fix the whole problem.
+As noted by this PEP, extension modules are free to create their own threads
+and attach thread states for them. Similar to daemon threads, Python doesn't
+try and join them during finalization, so trying to remove daemon threads
+as a whole would involve trying to remove them from the C API, which would
+require a massive API change.
+
+    Realize however that even if we get rid of daemon threads, extension
+    module code can and does spawn its own threads that are not tracked by
+    Python. ... Those are realistically an alternate form of daemon thread
+    ... and those are never going to be forbidden.
+
+Joining the Thread isn't Always a Good Idea
+*******************************************
+
+Even in daemon threads, it's generally *possible* to prevent hanging of
+native threads through :mod:`atexit` functions. 
+A thread could be started by some C function, and then as long as
+that thread is joined by :mod:`atexit`, then the thread won't hang.
+
+:mod:`atexit` isn't always an option for a function, because to call it, it
+needs to already have an :term:`attached thread state` for the thread. If
+there's no guarantee of that, then :func:`atexit.register` cannot be safely
+called without the risk of hanging the thread. This shifts the contract
+of joining the thread to the caller rather than the callee, which again, 
+isn't done in practice.
+
+For example, large C++ applications might want to expose an interface that can
+call Python code. To do this, a C++ API would take a Python object, and then
+call :c:func:`PyGILState_Ensure` to safely interact with it (for example, by
+calling it). If the interpreter is finalizing or has shut down, then the thread
+is hung, disrupting the C++ stream of calls.
+
 .. _pep-788-hanging-compat:
 
 Finalization Behavior for ``PyGILState_Ensure`` Cannot Change
@@ -312,10 +359,9 @@ Preventing Interpreter Shutdown with Reference Counting
 -------------------------------------------------------
 
 This PEP takes an approach where an interpreter is given a reference count
-that prevents it from shutting down.
-
-So, holding a "strong reference" to the interpreter will make it safe to
-call the C API without worrying about the thread being hung.
+that prevents it from shutting down. So, holding a "strong reference" to the 
+interpreter will make it safe to call the C API without worrying about the
+thread being hung.
 
 This means that interfacing Python (for example, in a C++ library) will need
 a reference to the interpreter in order to safely call the object, which is
@@ -361,8 +407,9 @@ Interpreter References to Prevent Shutdown
 
 An interpreter will keep a reference count that's managed by users of the
 C API. When the interpreter starts finalizing, it will until its reference count
-reaches zero before proceeding to a point where threads will be hung.
-Note that this *is not* the same as joining the thread; the interpreter will
+reaches zero before proceeding to a point where threads will be hung. This will
+happen around the same time when :class:`threading.Thread` objects are joined,
+but ote that this *is not* the same as joining the thread; the interpreter will
 only wait until the reference count is zero, and then proceed. The interpreter
 must not hang threads until this reference count has reached zero.
 After the reference count has reached zero, threads can no longer prevent the

From 81dd8d37ab254050109df11c091f436f7cefef1c Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 06:52:58 -0400
Subject: [PATCH 45/54] Fix lint: strip trailing whitespace.

---
 peps/pep-0788.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 78f6e77fc50..ef0ecc1fe55 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -173,14 +173,14 @@ cause many of the issues outlined in this proposal, so removing daemon threads
 could be seen as a potential solution. The main argument for removing daemon
 threads is that they're a large cause of problems in the interpreter:
 
-    Except that daemon threads don’t actually work reliably. They’re attempting 
+    Except that daemon threads don’t actually work reliably. They’re attempting
     to run and use Python interpreter resources after the runtime has been shut
     down upon runtime finalization. As in they have pointers to global state for
     the interpreter.
 
 In practice, daemon threads are useful for simplifying many threading applications
 in Python, and since the program is about to close in most cases, it's not worth
-the added complexity to try and gracefully shut down a thread. 
+the added complexity to try and gracefully shut down a thread.
 
     When I’ve needed daemon threads, it’s usually been the case of “Long-running,
     uninterruptible, third-party task” in terms of the examples in the linked issue.
@@ -206,7 +206,7 @@ Joining the Thread isn't Always a Good Idea
 *******************************************
 
 Even in daemon threads, it's generally *possible* to prevent hanging of
-native threads through :mod:`atexit` functions. 
+native threads through :mod:`atexit` functions.
 A thread could be started by some C function, and then as long as
 that thread is joined by :mod:`atexit`, then the thread won't hang.
 
@@ -214,7 +214,7 @@ that thread is joined by :mod:`atexit`, then the thread won't hang.
 needs to already have an :term:`attached thread state` for the thread. If
 there's no guarantee of that, then :func:`atexit.register` cannot be safely
 called without the risk of hanging the thread. This shifts the contract
-of joining the thread to the caller rather than the callee, which again, 
+of joining the thread to the caller rather than the callee, which again,
 isn't done in practice.
 
 For example, large C++ applications might want to expose an interface that can
@@ -359,7 +359,7 @@ Preventing Interpreter Shutdown with Reference Counting
 -------------------------------------------------------
 
 This PEP takes an approach where an interpreter is given a reference count
-that prevents it from shutting down. So, holding a "strong reference" to the 
+that prevents it from shutting down. So, holding a "strong reference" to the
 interpreter will make it safe to call the C API without worrying about the
 thread being hung.
 

From 2aad8fe6a719f15aa14e3ac809252039b22cf10c Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 08:40:18 -0400
Subject: [PATCH 46/54] Say the thread would exit, not terminate.

Co-authored-by: Victor Stinner <vstinner@python.org>
---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index ef0ecc1fe55..4bb396495ec 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -119,7 +119,7 @@ any point when invoking Python, such as in-between bytecode instructions
 (to yield the :term:`GIL` to a different thread), or when a C function exits a
 :c:macro:`Py_BEGIN_ALLOW_THREADS` block, so simply guarding against whether the
 interpreter is finalizing isn't enough to safely call Python code. (Note that hanging
-the thread is relatively new behavior; in prior versions, the thread would terminate,
+the thread is relatively new behavior; in prior versions, the thread would exit,
 but the issue is the same.)
 
 This means that any non-Python/native thread may be terminated at any point, which

From 232208cea41e900198653e86254f22de8cf7547c Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 21:02:02 -0400
Subject: [PATCH 47/54] Fix typo ("ote" -> "note").

---
 peps/pep-0788.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 4bb396495ec..f064bfc98ee 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -409,7 +409,7 @@ An interpreter will keep a reference count that's managed by users of the
 C API. When the interpreter starts finalizing, it will until its reference count
 reaches zero before proceeding to a point where threads will be hung. This will
 happen around the same time when :class:`threading.Thread` objects are joined,
-but ote that this *is not* the same as joining the thread; the interpreter will
+but note that this *is not* the same as joining the thread; the interpreter will
 only wait until the reference count is zero, and then proceed. The interpreter
 must not hang threads until this reference count has reached zero.
 After the reference count has reached zero, threads can no longer prevent the

From 558ed81193dbc658507877e7b6950ce1a5e1b93c Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 21:04:31 -0400
Subject: [PATCH 48/54] Fix misleading sentence about native threads.

---
 peps/pep-0788.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index f064bfc98ee..b613638cc48 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -70,9 +70,9 @@ This PEP refers to a thread created using the C API as a "native thread",
 also sometimes referred to as a "non-Python created thread", where a "Python
 created" is a thread created by the :mod:`threading` module.
 
-Native threads are typically created by :c:func:`PyGILState_Ensure`, but more
-technically, it refers to any thread with an :term:`attached thread state`
-created and/or attached using the C API.
+A native thread is typically registered with the interpreter by
+:c:func:`PyGILState_Ensure`, but any thread with an :term:`attached thread state`
+qualifies as a native thread.
 
 Motivation
 ==========

From b6e9e02ade03ffb49cc272d6dffe1eb729021312 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 21:05:05 -0400
Subject: [PATCH 49/54] Simplify phrasing.

---
 peps/pep-0788.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index b613638cc48..3f318246057 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -547,8 +547,7 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     if the interpreter matches *ref*, it is attached, and otherwise a new
     thread state is created.
 
-    Return zero on success, and non-zero with the old attached thread state
-    restored (which may have been ``NULL``).
+    Return zero on success, and non-zero on failure.
 
 .. c:function:: void PyThreadState_Release()
 

From b0898a548acdfebf8cc9ad956dab0df71188577e Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Fri, 23 May 2025 21:06:01 -0400
Subject: [PATCH 50/54] Add a comment.

---
 peps/pep-0788.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 3f318246057..0b7347af066 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -776,6 +776,8 @@ they can still be used with this API:
             PyInterpreterRef_Close(ref);
             return -1;
         }
+        /* Release the interpreter reference, allowing it to
+           finalize. This means that print(42) can hang this thread. */
         PyInterpreterRef_Close(ref);
         if (PyRun_SimpleString("print(42)") < 0) {
             PyErr_Print();

From 48b408b5052f7b12b53c43b94c90d1efdc652892 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Tue, 27 May 2025 20:29:51 -0400
Subject: [PATCH 51/54] Make reference getters fallible; add PyInterpreterRef_Main.

---
 peps/pep-0788.rst | 73 +++++++++++++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 25 deletions(-)
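
Reviewer note (not part of the patch): this patch moves the reference getters
to an ``int`` return value with an out-parameter, so every caller now needs a
failure branch. The sketch below illustrates that calling convention, together
with weak-to-strong promotion, on a toy model. All ``demo_*`` names are
hypothetical stand-ins rather than the PEP's functions, and the model assumes
that the interpreter struct outlives every weak handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not CPython code. */
typedef struct {
    long strong_refs;   /* outstanding strong references */
    bool done_waiting;  /* finalization is past the waiting stage */
} demo_interp;

typedef demo_interp *demo_ref;       /* strong handle */
typedef demo_interp *demo_weak_ref;  /* weak handle; never blocks shutdown */

/* Fallible getter: returns 0 and sets *ref_ptr on success, or
   returns -1 and leaves *ref_ptr untouched on failure. */
static int
demo_ref_get(demo_interp *interp, demo_ref *ref_ptr)
{
    if (interp->done_waiting) {
        return -1;
    }
    interp->strong_refs++;
    *ref_ptr = interp;
    return 0;
}

/* Release a strong handle. */
static void
demo_ref_close(demo_ref ref)
{
    assert(ref->strong_refs > 0);
    ref->strong_refs--;
}

/* Promote a weak handle; fails once the model has shut down. */
static int
demo_weak_as_strong(demo_weak_ref wref, demo_ref *ref_ptr)
{
    return demo_ref_get(wref, ref_ptr);
}

/* Finalization goes ahead only when no strong handles remain.
   Returns true if it proceeded. */
static bool
demo_try_finish_waiting(demo_interp *interp)
{
    if (interp->strong_refs > 0) {
        return false;
    }
    interp->done_waiting = true;
    return true;
}

/* Caller pattern used by the updated examples: promote, work, close. */
static int
demo_callback(demo_weak_ref wref, int *calls)
{
    demo_ref ref;
    if (demo_weak_as_strong(wref, &ref) < 0) {
        return -1;  /* the model interpreter has shut down */
    }
    (*calls)++;     /* stand-in for calling into Python */
    demo_ref_close(ref);
    return 0;
}
```

The out-parameter keeps the handle type free of a reserved "invalid" value,
which may be one reason to prefer it over returning the handle directly; the
patch itself does not state its rationale.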

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 0b7347af066..6b62d631285 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -431,25 +431,32 @@ Strong Interpreter References
 
    This type is guaranteed to be pointer-sized.
 
-.. c:function:: PyInterpreterRef PyInterpreterRef_Get(void)
+.. c:function:: int PyInterpreterRef_Get(PyInterpreterRef *ref_ptr)
 
     Acquire a strong reference to the current interpreter.
 
-    This function cannot fail, other than with a fatal error when the caller
-    doesn't hold an :term:`attached thread state`.
+    On success, this function returns ``0`` and sets *ref_ptr*
+    to a strong reference to the interpreter, and returns ``-1``
+    with an exception set on failure.
 
-.. c:function:: int PyInterpreterState_AsStrong(PyInterpreterState *interp, PyInterpreterRef *ref_ptr)
+    Failure typically indicates that the interpreter has
+    already finished waiting on strong references.
 
-    Acquire a strong reference to *interp*.
+    The caller must hold an :term:`attached thread state`.
+
+.. c:function:: int PyInterpreterRef_Main(PyInterpreterRef *ref_ptr)
+
+    Acquire a strong reference to the main interpreter.
 
-    Unless *interp* is the main interpreter, this function can cause crashes
-    if *interp* shuts down in another thread! Prefer safely acquiring a
-    reference through :c:func:`PyInterpreterRef_Get` whenever possible.
+    This function only exists for special cases where a specific interpreter
+    can't be saved. Prefer safely acquiring a reference through
+    :c:func:`PyInterpreterRef_Get` whenever possible.
 
     On success, this function will return ``0`` and set *ref_ptr* to a strong
     reference, and on failure, this function will return ``-1``.
-    (Failure typically indicates that *interp* has already finished
-    waiting on its reference count.)
+
+    Failure typically indicates that the main interpreter has already finished
+    waiting on its reference count.
 
     The caller does not need to hold an :term:`attached thread state`.
 
@@ -484,21 +491,22 @@ Weak Interpreter References
     The interpreter will *not* wait for the reference to be
     released before shutting down.
 
-.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Get(void)
+.. c:function:: int PyInterpreterWeakRef_Get(PyInterpreterWeakRef *wref_ptr)
 
     Acquire a weak reference to the current interpreter.
 
     This function is generally meant to be used in tandem with
-    :c:func:`PyInterpreterWeakRef_AsStrong`, and cannot fail.
+    :c:func:`PyInterpreterWeakRef_AsStrong`.
+
+    On success, this function returns ``0`` and sets *wref_ptr* to a
+    weak reference to the interpreter, and returns ``-1`` with an exception
+    set on failure.
 
     The caller must hold an :term:`attached thread state`.
 
 .. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref)
 
-    Duplicate a weak reference to *wref*.
-
-    This function is generally meant to be used in tandem with
-    :c:func:`PyInterpreterWeakRef_AsStrong`.
+    Duplicate a weak reference to an interpreter.
 
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
@@ -519,7 +527,7 @@ Weak Interpreter References
 
 .. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref)
 
-    Release a weak reference, possibly deallocating it.
+    Release a weak reference to an interpreter.
 
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
@@ -547,7 +555,7 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.
     if the interpreter matches *ref*, it is attached, and otherwise a new
     thread state is created.
 
-    Return zero on success, and non-zero on failure.
+    Return ``0`` on success, and ``-1`` on failure.
 
 .. c:function:: void PyThreadState_Release()
 
@@ -620,7 +628,7 @@ With this PEP, you'd implement it like this:
     {
         PyInterpreterRef ref;
         if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) {
-            // Python interpreter has shut down
+            /* Python interpreter has shut down */
             return -1;
         }
 
@@ -662,7 +670,11 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked!
     my_critical_operation(PyObject *self, PyObject *unused)
     {
         assert(PyThreadState_GetUnchecked() != NULL);
-        PyInterpreterRef ref = PyInterpreterRef_Get();
+        PyInterpreterRef ref;
+        if (PyInterpreterRef_Get(&ref) < 0) {
+            /* Python interpreter has shut down */
+            return NULL;
+        }
         /* Temporarily hold a strong reference to ensure that the
            lock is released. */
         if (PyThreadState_Ensure(ref) < 0) {
@@ -748,7 +760,11 @@ This is the same code, rewritten to use the new functions:
         PyThread_handle_t handle;
         PyThead_indent_t indent;
 
-        PyInterpreterRef ref = PyInterpreterRef_Get();
+        PyInterpreterRef ref;
+        if (PyInterpreterRef_Get(&ref) < 0) {
+            return NULL;
+        }
+
         if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) {
             PyInterpreterRef_Close(ref);
             return NULL;
@@ -792,7 +808,11 @@ they can still be used with this API:
         PyThread_handle_t handle;
         PyThead_indent_t indent;
 
-        PyInterpreterRef ref = PyInterpreterRef_Get();
+        PyInterpreterRef ref;
+        if (PyInterpreterRef_Get(&ref) < 0) {
+            return NULL;
+        }
+
         if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) {
             PyInterpreterRef_Close(ref);
             return NULL;
@@ -846,7 +866,10 @@ deadlock the interpreter if it's not released.
             PyErr_NoMemory();
             return NULL;
         }
-        PyInterpreterWeakRef wref = PyInterpreterWeakRef_Get();
+        PyInterpreterWeakRef wref;
+        if (PyInterpreterWeakRef_Get(&wref) < 0) {
+            return NULL;
+        }
         tdata->wref = wref;
         register_callback(async_callback, tdata);
 
@@ -859,7 +882,7 @@ Example: Calling Python Without a Callback Parameter
 There are a few cases where callback functions don't take a callback parameter
 (``void *arg``), so it's impossible to acquire a reference to any specific
 interpreter. The solution to this problem is to acquire a reference to the main
-interpreter through :c:func:`PyInterpreterState_AsStrong`.
+interpreter through :c:func:`PyInterpreterRef_Main`.
 
 But wait, won't that break with subinterpreters, per
 :ref:`pep-788-subinterpreters-gilstate`? Fortunately, since the callback has
@@ -873,7 +896,7 @@ interpreter here.
     call_python(void)
     {
         PyInterpreterRef ref;
-        if (PyInterpreterState_AsStrong(PyInterpreterState_Main(), &ref) < 0) {
+        if (PyInterpreterRef_Main(&ref) < 0) {
             fputs("Python has shut down!", stderr);
             return;
         }

From 0c8042e1686a314669356313f6bbb6d0ebd9d10e Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Tue, 27 May 2025 20:32:11 -0400
Subject: [PATCH 52/54] Change up a title.

---
 peps/pep-0788.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 6b62d631285..09070450b09 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -333,8 +333,8 @@ object are wrong! There isn't any synchronization between the two GILs, so both
 the thread (who thinks it's in the subinterpreter) and the main thread could try
 to increment the reference count at the same time, causing a data race!
 
-Concurrent Interpreter Deallocation is Frustrating
---------------------------------------------------
+An Interpreter Can Concurrently Deallocate
+------------------------------------------
 
 The other way of creating a native thread that can invoke Python,
 :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, is a lot better

From ec1c5cce0758a103da9390711a2fcbfd80ea9f32 Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Wed, 28 May 2025 12:22:04 +0000
Subject: [PATCH 53/54] Avoid the _ptr suffix.

---
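Note for reviewers (below the "---", so not part of the commit message):
the renamed parameter keeps the same out-parameter convention, where the
function returns 0 and fills in the reference on success, and returns -1
on failure. A minimal sketch of that convention in plain C follows. It
assumes hypothetical stand-ins (DemoRef, DemoRef_Get, and the
demo_interpreter_alive flag are illustrative names only) and does not use
the proposed PyInterpreterRef API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a pointer-sized reference type. */
typedef uintptr_t DemoRef;

/* Hypothetical flag that simulates whether the interpreter is still alive. */
static int demo_interpreter_alive = 1;

/* Hypothetical stand-in that mirrors the documented contract:
   return 0 and set *ref on success, return -1 and leave *ref
   untouched on failure. */
static int
DemoRef_Get(DemoRef *ref)
{
    if (!demo_interpreter_alive) {
        return -1;
    }
    *ref = (DemoRef)&demo_interpreter_alive;
    return 0;
}
```

Callers check the return value before touching the reference, as the
updated examples in the PEP do.
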
 peps/pep-0788.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 09070450b09..3658c3a3549 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -431,11 +431,11 @@ Strong Interpreter References
 
    This type is guaranteed to be pointer-sized.
 
-.. c:function:: int PyInterpreterRef_Get(PyInterpreterRef *ref_ptr)
+.. c:function:: int PyInterpreterRef_Get(PyInterpreterRef *ref)
 
     Acquire a strong reference to the current interpreter.
 
-    On success, this function returns ``0`` and sets *ref_ptr*
+    On success, this function returns ``0`` and sets *ref*
     to a strong reference to the interpreter, and returns ``-1``
     with an exception set on failure.
 
@@ -444,7 +444,7 @@ Strong Interpreter References
 
     The caller must hold an :term:`attached thread state`.
 
-.. c:function:: int PyInterpreterRef_Main(PyInterpreterRef *ref_ptr)
+.. c:function:: int PyInterpreterRef_Main(PyInterpreterRef *ref)
 
     Acquire a strong reference to the main interpreter.
 
@@ -452,7 +452,7 @@ Strong Interpreter References
     can't be saved. Prefer safely acquiring a reference through
     :c:func:`PyInterpreterRef_Get` whenever possible.
 
-    On success, this function will return ``0`` and set *ref_ptr* to a strong
+    On success, this function will return ``0`` and set *ref* to a strong
     reference, and on failure, this function will return ``-1``.
 
     Failure typically indicates that the main interpreter has already finished
@@ -491,14 +491,14 @@ Weak Interpreter References
     The interpreter will *not* wait for the reference to be
     released before shutting down.
 
-.. c:function:: int PyInterpreterWeakRef_Get(PyInterpreterWeakRef *wref_ptr)
+.. c:function:: int PyInterpreterWeakRef_Get(PyInterpreterWeakRef *wref)
 
     Acquire a weak reference to the current interpreter.
 
     This function is generally meant to be used in tandem with
     :c:func:`PyInterpreterWeakRef_AsStrong`.
 
-    On success, this function returns ``0`` and sets *wref_ptr* to a
+    On success, this function returns ``0`` and sets *wref* to a
     weak reference to the interpreter, and returns ``-1`` with an exception
     set on failure.
 
@@ -511,11 +511,11 @@ Weak Interpreter References
     This function cannot fail, and the caller doesn't need to hold an
     :term:`attached thread state`.
 
-.. c:function:: int PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref, PyInterpreterRef *ref_ptr)
+.. c:function:: int PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref, PyInterpreterRef *ref)
 
     Acquire a strong reference to an interpreter through a weak reference.
 
-    On success, this function returns ``0`` and sets *ref_ptr* to a strong
+    On success, this function returns ``0`` and sets *ref* to a strong
     reference to the interpreter denoted by *wref*.
 
     If the interpreter no longer exists or has already finished waiting

From 977188ca61a43534e95ed79c636e5fb3c9fd878c Mon Sep 17 00:00:00 2001
From: Peter Bierma <zintensitydev@gmail.com>
Date: Wed, 28 May 2025 12:22:48 +0000
Subject: [PATCH 54/54] Fix memory leak and use the raw allocator in the callback example.

---
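Note for reviewers (below the "---", so not part of the commit message):
the error path fixed here can be sketched in isolation. The block below is
a minimal sketch in plain C under stated assumptions: fake_weakref_get is
a hypothetical stand-in for PyInterpreterWeakRef_Get, malloc/free stand in
for PyMem_RawMalloc/PyMem_RawFree, and the live_allocations counter exists
only to make a leak observable. It is not the PEP's code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the ThreadData struct in the PEP example. */
typedef struct {
    int wref;  /* placeholder for a PyInterpreterWeakRef */
} ThreadData;

/* Counts buffers that are still allocated, so a leak is observable. */
static int live_allocations = 0;

/* Hypothetical stand-in for PyInterpreterWeakRef_Get():
   returns 0 and sets *wref on success, returns -1 on failure. */
static int
fake_weakref_get(int *wref, int should_fail)
{
    if (should_fail) {
        return -1;
    }
    *wref = 1;
    return 0;
}

/* Allocate the callback data, releasing it again if acquiring the
   weak reference fails. */
static ThreadData *
setup_callback_data(int should_fail)
{
    ThreadData *tdata = malloc(sizeof(ThreadData));
    if (tdata == NULL) {
        return NULL;
    }
    live_allocations++;

    int wref;
    if (fake_weakref_get(&wref, should_fail) < 0) {
        /* Without this free, the buffer would leak on the error path. */
        free(tdata);
        live_allocations--;
        return NULL;
    }
    tdata->wref = wref;
    return tdata;
}
```

On the failing path the counter returns to zero, which is the behavior
the added PyMem_RawFree call restores in the PEP example.
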
 peps/pep-0788.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst
index 3658c3a3549..3c54811f514 100644
--- a/peps/pep-0788.rst
+++ b/peps/pep-0788.rst
@@ -861,13 +861,14 @@ deadlock the interpreter if it's not released.
     {
         // Weak reference to the interpreter. It won't wait on the callback
         // to finalize.
-        ThreadData *tdata = PyMem_Malloc(sizeof(ThreadData));
+        ThreadData *tdata = PyMem_RawMalloc(sizeof(ThreadData));
         if (tdata == NULL) {
             PyErr_NoMemory();
             return NULL;
         }
         PyInterpreterWeakRef wref;
         if (PyInterpreterWeakRef_Get(&wref) < 0) {
+            PyMem_RawFree(tdata);
             return NULL;
         }
         tdata->wref = wref;