diff --git a/docs/libcudacxx/extended_api.rst b/docs/libcudacxx/extended_api.rst
index 1df995a432f..5584fd77dd1 100644
--- a/docs/libcudacxx/extended_api.rst
+++ b/docs/libcudacxx/extended_api.rst
@@ -6,7 +6,6 @@ Extended API
 .. toctree::
    :maxdepth: 2
 
-   extended_api/execution_model
    extended_api/memory_model
    extended_api/thread_groups
    extended_api/shapes
diff --git a/docs/libcudacxx/extended_api/execution_model.rst b/docs/libcudacxx/extended_api/execution_model.rst
deleted file mode 100644
index 6def2b2c6e1..00000000000
--- a/docs/libcudacxx/extended_api/execution_model.rst
+++ /dev/null
@@ -1,272 +0,0 @@
.. _libcudacxx-extended-api-execution-model:

Execution model
===============

CUDA C++ aims to provide `_parallel forward progress_ [intro.progress.9] <https://eel.is/c++draft/intro.progress#9>`__
for all device threads of execution, facilitating the parallelization of pre-existing C++ applications with CUDA C++.

.. dropdown:: [intro.progress]

   - `[intro.progress.7] <https://eel.is/c++draft/intro.progress#7>`__: For a thread of execution
     providing `concurrent forward progress guarantees <https://eel.is/c++draft/intro.progress#7>`__,
     the implementation ensures that the thread will eventually make progress for as long as it has not terminated.

     [Note 5: This applies regardless of whether or not other threads of execution (if any) have been or are making progress.
     To eventually fulfill this requirement means that this will happen in an unspecified but finite amount of time. — end note]

   - `[intro.progress.9] <https://eel.is/c++draft/intro.progress#9>`__: For a thread of execution providing
     `parallel forward progress guarantees <https://eel.is/c++draft/intro.progress#9>`__, the implementation is not required to ensure that
     the thread will eventually make progress if it has not yet executed any execution step; once this thread has executed a step,
     it provides concurrent forward progress guarantees.

     [Note 6: This does not specify a requirement for when to start this thread of execution, which will typically be specified by the entity
     that creates this thread of execution.
     For example, a thread of execution that provides concurrent forward progress guarantees and executes
     tasks from a set of tasks in an arbitrary order, one after the other, satisfies the requirements of parallel forward progress for these
     tasks. — end note]

.. _libcudacxx-extended-api-execution-model-host-threads:

The CUDA C++ Programming Language is an extension of the C++ Programming Language.
This section documents the modifications and extensions to the `[intro.progress] <https://eel.is/c++draft/intro.progress>`__ section of the current `ISO International Standard ISO/IEC 14882 – Programming Language C++ <https://isocpp.org/std/the-standard>`__ draft.
Modified sections are called out explicitly and their diff is shown in **bold**.
All other sections are additions.

Host threads
------------

The forward progress provided by threads of execution created by the host implementation to
execute `main <https://en.cppreference.com/w/cpp/language/main_function>`__, `std::thread <https://en.cppreference.com/w/cpp/thread/thread>`__,
and `std::jthread <https://en.cppreference.com/w/cpp/thread/jthread>`__ is implementation-defined behavior of the host
implementation `[intro.progress] <https://eel.is/c++draft/intro.progress>`__.
General-purpose host implementations should provide _concurrent forward progress_.

If the host implementation provides `_concurrent forward progress_ [intro.progress.7] <https://eel.is/c++draft/intro.progress#7>`__,
then CUDA C++ provides `_parallel forward progress_ [intro.progress.9] <https://eel.is/c++draft/intro.progress#9>`__ for device threads.

.. _libcudacxx-extended-api-execution-model-device-threads:

Device threads
--------------

Once a device thread makes progress:

- If it is part of a `Cooperative Grid <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cooperative-groups>`__,
  all device threads in its grid shall eventually make progress.
- Otherwise, all device threads in its `thread-block cluster <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-block-clusters>`__
  shall eventually make progress.

  [Note: Threads in other thread-block clusters are not guaranteed to eventually make progress. - end note.]
  [Note: This implies that all device threads within its thread block shall eventually make progress. - end note.]
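To make the cluster-scoped note above concrete, here is a minimal sketch of our own (the kernel name ``cross_block`` and the example itself are editor's additions, not part of the original example numbering). It places the spinning thread and the storing thread in different thread blocks of a non-cooperative grid; because they belong to different thread-block clusters, progress of the spinning block does not guarantee that the storing block ever starts:

.. code:: cuda

   // Hypothetical illustration (editor's addition).
   // Allowed outcome: eventually, no thread makes progress.
   // Rationale: the two communicating threads are in different thread blocks
   // (and thus different thread-block clusters) of a non-cooperative grid, so
   // the intra-cluster guarantee above does not apply between them: block 0
   // may spin forever without block 1 ever starting.
   #include <cuda/atomic>

   __global__ void cross_block(cuda::atomic_ref<int> atom) {
     if (blockIdx.x == 0) {
       while (atom.load(cuda::memory_order_relaxed) == 0) {}
     } else { // blockIdx.x == 1
       atom.store(1, cuda::memory_order_relaxed);
     }
   }
   // Launched as, e.g.: cross_block<<<2, 1>>>(atom);

Launching the same kernel as a Cooperative Grid instead (e.g., via ``cudaLaunchCooperativeKernel``) would place both blocks under the grid-wide guarantee above, so the grid would be guaranteed to terminate.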
Modify `[intro.progress.1] <https://eel.is/c++draft/intro.progress#1>`__ as follows (modifications in **bold**):

The implementation may assume that any **host** thread will eventually do one of the following:

1. terminate,
2. invoke the function `std::this_thread::yield <https://en.cppreference.com/w/cpp/thread/yield>`__ (`[thread.thread.this] <https://eel.is/c++draft/thread.thread.this>`__),
3. make a call to a library I/O function,
4. perform an access through a volatile glvalue,
5. perform a synchronization operation or an atomic operation, or
6. continue execution of a trivial infinite loop (`[stmt.iter.general] <https://eel.is/c++draft/stmt.iter.general>`__).

**The implementation may assume that any device thread will eventually do one of the following:**

1. **terminate**,
2. **make a call to a library I/O function**,
3. **perform an access through a volatile glvalue except if the designated object has automatic storage duration, or**
4. **perform a synchronization operation or an atomic read operation except if the designated object has automatic storage duration.**

[Note: Some current limitations of device threads relative to host threads
are implementation defects known to us that we may fix over time.
Examples include the undefined behavior that arises from device threads
that eventually only perform volatile or atomic operations
on automatic storage duration objects.
However, other limitations of device threads relative to host threads
are intentional choices. They enable performance optimizations
that would not be possible if device threads followed the C++ Standard strictly.
For example, providing forward progress to programs
that eventually only perform atomic writes or fences
would degrade overall performance for little practical benefit. - end note.]

.. dropdown:: Examples of forward progress guarantee differences between host and device threads due to modifications to [intro.progress.1].

   The following examples refer to the itemized sub-clauses of the implementation assumptions for host and device threads above
   using "host.threads."
   and "device.threads.", respectively.

   .. code:: cuda

      // Example: Execution.Model.Device.0
      // Outcome: grid eventually terminates per device.threads.4 because the atomic object does not have automatic storage duration.
      __global__ void ex0(cuda::atomic_ref<int> atom) {
        if (threadIdx.x == 0) {
          while(atom.load(cuda::memory_order_relaxed) == 0);
        } else if (threadIdx.x == 1) {
          atom.store(1, cuda::memory_order_relaxed);
        }
      }

   .. code:: cuda

      // Example: Execution.Model.Device.1
      // Allowed outcome: No thread makes progress because device threads don't support host.threads.2.
      __global__ void ex1() {
        while(true) cuda::std::this_thread::yield();
      }

   .. code:: cuda

      // Example: Execution.Model.Device.2
      // Allowed outcome: No thread makes progress because device threads don't support host.threads.4
      // for objects with automatic storage duration (see exception in device.threads.3).
      __global__ void ex2() {
        volatile bool True = true;
        while(True);
      }

   .. code:: cuda

      // Example: Execution.Model.Device.3
      // Allowed outcome: No thread makes progress because device threads don't support host.threads.5
      // for objects with automatic storage duration (see exception in device.threads.4).
      __global__ void ex3() {
        cuda::atomic True = true;
        while(True.load());
      }

   .. code:: cuda

      // Example: Execution.Model.Device.4
      // Allowed outcome: No thread makes progress because device threads don't support host.threads.6.
      __global__ void ex4() {
        while(true) { /* empty */ }
      }

.. _libcudacxx-extended-api-execution-model-cuda-apis:

CUDA APIs
---------

A CUDA API call made by a host or device thread shall eventually either return or ensure that at least one device thread makes progress.

CUDA query functions (e.g. `cudaStreamQuery <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html>`__,
`cudaEventQuery <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html>`__, etc.) shall not consistently
return ``cudaErrorNotReady`` without a device thread making progress.
[Note: The device thread need not be "related" to the API call, e.g., an API operating on one stream or process may ensure progress of a device thread on another stream or process. - end note.]

[Note: A simple but not sufficient method to test a program for CUDA API Forward Progress conformance is to run it with the following environment variables set: ``CUDA_DEVICE_MAX_CONNECTIONS=1 CUDA_LAUNCH_BLOCKING=1``, and then check that the program still terminates.
If it does not, the program has a bug.
This method is not sufficient because it does not catch all Forward Progress bugs, but it does catch many such bugs. - end note.]

.. dropdown:: Examples of CUDA API forward progress guarantees.

   .. code:: cuda

      // Example: Execution.Model.API.1
      // Outcome: if no other device threads (e.g., from other processes) are making progress,
      // this program terminates and returns cudaSuccess.
      // Rationale: CUDA guarantees that if the device is empty:
      // - `cudaDeviceSynchronize` eventually ensures that at least one device thread makes progress, which implies that eventually the `hello_world` grid and one of its device threads start.
      // - All thread-block threads eventually start (due to "if a device thread makes progress, all other threads in its thread-block cluster eventually make progress").
      // - Once all threads in the thread block arrive at the `__syncthreads` barrier, all waiting threads are unblocked.
      // - Therefore all device threads eventually exit the `hello_world` grid.
      // - And `cudaDeviceSynchronize` eventually unblocks.
      __global__ void hello_world() { __syncthreads(); }
      int main() {
        hello_world<<<1,2>>>();
        return (int)cudaDeviceSynchronize();
      }

   .. code:: cuda

      // Example: Execution.Model.API.2
      // Allowed outcome: eventually, no thread makes progress.
      // Rationale: the `cudaDeviceSynchronize` API below is only called if a device thread eventually makes progress and sets the flag.
      // However, CUDA only guarantees that the `producer` device thread eventually starts if the synchronization API is called.
      // Therefore, the host thread may never be unblocked from the flag spin-loop.
      cuda::atomic flag = 0;
      __global__ void producer() { flag.store(1); }
      int main() {
        cudaHostRegister(&flag, sizeof(flag));
        producer<<<1,1>>>();
        while (flag.load() == 0);
        return cudaDeviceSynchronize();
      }

   .. code:: cuda

      // Example: Execution.Model.API.3
      // Allowed outcome: eventually, no thread makes progress.
      // Rationale: same as Execution.Model.API.2, with the addition that a single CUDA query API call does not guarantee
      // that the device thread eventually starts; only repeated CUDA query API calls do (see Execution.Model.API.4).
      cuda::atomic flag = 0;
      __global__ void producer() { flag.store(1); }
      int main() {
        cudaHostRegister(&flag, sizeof(flag));
        producer<<<1,1>>>();
        (void)cudaStreamQuery(0);
        while (flag.load() == 0);
        return cudaDeviceSynchronize();
      }

   .. code:: cuda

      // Example: Execution.Model.API.4
      // Outcome: terminates.
      // Rationale: same as Execution.Model.API.3, but this example repeatedly calls
      // a CUDA query API within the flag spin-loop, which guarantees that the device thread
      // eventually makes progress.
      cuda::atomic flag = 0;
      __global__ void producer() { flag.store(1); }
      int main() {
        cudaHostRegister(&flag, sizeof(flag));
        producer<<<1,1>>>();
        while (flag.load() == 0) {
          (void)cudaStreamQuery(0);
        }
        return cudaDeviceSynchronize();
      }

.. _libcudacxx-extended-api-execution-model-stream-ordering:

Stream and event ordering
-------------------------

A device thread shall not start until all its dependencies have completed.

[Note: Dependencies that prevent device threads from starting to make progress can be created, for example, via CUDA Stream `Command <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#streams>`__\ s.
These may include dependencies on the completion of, among others, `CUDA Events <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#events>`__ and `CUDA Kernels <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels>`__.
- end note.]

.. dropdown:: Examples of CUDA API forward progress guarantees due to stream and event ordering

   .. code:: cuda

      // Example: Execution.Model.Stream.0
      // Allowed outcome: eventually, no thread makes progress.
      // Rationale: while CUDA guarantees that one device thread makes progress, since there
      // is no dependency between `first` and `second`, it does not guarantee which thread,
      // and therefore it could always pick the device thread from `second`, which then never
      // unblocks from the spin-loop.
      // That is, `second` may starve `first`.
      cuda::atomic flag = 0;
      __global__ void first() { flag.store(1, cuda::memory_order_relaxed); }
      __global__ void second() { while(flag.load(cuda::memory_order_relaxed) == 0) {} }
      int main() {
        cudaHostRegister(&flag, sizeof(flag));
        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);
        first<<<1,1,0,s0>>>();
        second<<<1,1,0,s1>>>();
        return cudaDeviceSynchronize();
      }

   .. code:: cuda

      // Example: Execution.Model.Stream.1
      // Outcome: terminates.
      // Rationale: same as Execution.Model.Stream.0, but this example has a stream dependency
      // between `first` and `second`, which requires CUDA to run the grids in order.
      cuda::atomic flag = 0;
      __global__ void first() { flag.store(1, cuda::memory_order_relaxed); }
      __global__ void second() { while(flag.load(cuda::memory_order_relaxed) == 0) {} }
      int main() {
        cudaHostRegister(&flag, sizeof(flag));
        cudaStream_t s0;
        cudaStreamCreate(&s0);
        first<<<1,1,0,s0>>>();
        second<<<1,1,0,s0>>>();
        return cudaDeviceSynchronize();
      }
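   The same ordering can also be obtained across two different streams by recording a CUDA Event. The following sketch is an editor's addition (its name and numbering are hypothetical, not part of the original example set); it reuses the ``first``/``second`` kernels and flag setup from the examples above:

   .. code:: cuda

      // Hypothetical example in the style of Execution.Model.Stream.1 (editor's addition).
      // Outcome: terminates.
      // Rationale: cudaStreamWaitEvent makes all work submitted to s1 after the wait
      // dependent on the completion of the work captured by the event on s0, so
      // `second` cannot start before `first` has stored the flag.
      cuda::atomic flag = 0;
      __global__ void first() { flag.store(1, cuda::memory_order_relaxed); }
      __global__ void second() { while(flag.load(cuda::memory_order_relaxed) == 0) {} }
      int main() {
        cudaHostRegister(&flag, sizeof(flag));
        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);
        cudaEvent_t e;
        cudaEventCreate(&e);
        first<<<1,1,0,s0>>>();
        cudaEventRecord(e, s0);
        cudaStreamWaitEvent(s1, e, 0);
        second<<<1,1,0,s1>>>();
        return cudaDeviceSynchronize();
      }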