From 4182329ebadf09b7b152e9434fc4778f7adccad3 Mon Sep 17 00:00:00 2001
From: Giannis Gonidelis
Date: Fri, 22 Mar 2024 15:42:14 -0700
Subject: [PATCH] Minor fixes and additions on cub developer guides (#1559)

* Fix phrasing in cub developer overview

* Explain how test cases can bloat when multiple random generator seeds are being used in combination with multiple configuration parameters

* Less verbal explanation.

Co-authored-by: Michael Schellenberger Costa

---------

Co-authored-by: Michael Schellenberger Costa
---
 cub/docs/developer_overview.rst |  2 +-
 cub/docs/test_overview.rst      | 34 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/cub/docs/developer_overview.rst b/cub/docs/developer_overview.rst
index f05e2db22d6..8f1d5ea913d 100644
--- a/cub/docs/developer_overview.rst
+++ b/cub/docs/developer_overview.rst
@@ -543,7 +543,7 @@ The dispatch entry point is typically represented by a static member function th
     };
 
 For many algorithms, the dispatch layer is part of the API.
-The first reason for this to be the case is ``size_t`` support.
+The main reason for this integration is to support ``size_t``.
 Our API uses ``int`` as a type for ``num_items``.
 Users rely on the dispatch layer directly to workaround this.
 Exposing the dispatch layer also allows users to tune algorithms for their use cases.
diff --git a/cub/docs/test_overview.rst b/cub/docs/test_overview.rst
index 427cbf6bd95..2ddfe146d4f 100644
--- a/cub/docs/test_overview.rst
+++ b/cub/docs/test_overview.rst
@@ -200,6 +200,40 @@ The code above leads to the following combinations being compiled:
 - ``type = std::int32_t``, ``threads_per_block = 128``
 - ``type = std::int32_t``, ``threads_per_block = 256``
 
+As an example, the following test case includes both multidimensional configuration spaces
+and multiple random sequence generations.
+
+.. code-block:: c++
+
+    using block_sizes = c2h::enum_type_list<int, 128, 256>;
+    using types = c2h::type_list<std::uint8_t, std::int32_t>;
+
+    CUB_TEST("SCOPE FACILITY works with CONDITION",
+             "[FACILITY][SCOPE]",
+             types,
+             block_sizes)
+    {
+      using type = typename c2h::get<0, TestType>;
+      constexpr int threads_per_block = c2h::get<1, TestType>::value;
+      // ...
+      c2h::device_vector<type> d_input(5);
+      c2h::gen(CUB_SEED(2), d_input);
+    }
+
+The code above leads to the following combinations being compiled:
+
+- ``type = std::uint8_t``, ``threads_per_block = 128``, 1st random generated input sequence
+- ``type = std::uint8_t``, ``threads_per_block = 256``, 1st random generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 128``, 1st random generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 256``, 1st random generated input sequence
+- ``type = std::uint8_t``, ``threads_per_block = 128``, 2nd random generated input sequence
+- ``type = std::uint8_t``, ``threads_per_block = 256``, 2nd random generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 128``, 2nd random generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 256``, 2nd random generated input sequence
+
+Each new generator multiplies the number of execution times by its number of seeds. That means
+that if there were further more sequence generators (``c2h::gen(CUB_SEED(X), ...)``) on the
+example above the test would execute X more times and so on.
 
 Speedup Compilation Time
 =====================================
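
Review note: the multiplication rule described in the added ``test_overview.rst`` text can be checked with a few lines of plain C++. The sketch below is only an illustration of the arithmetic; ``executions`` is a hypothetical helper and is not part of c2h or CUB. It assumes, as the patch states, that every configuration axis and every ``CUB_SEED(N)`` generator contributes a multiplicative factor.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical helper (not part of c2h or CUB): estimates how many times a
// test body runs by multiplying the size of every configuration axis
// (type list, enum list, ...) with the seed count of every generator call.
constexpr std::size_t executions(std::initializer_list<std::size_t> axis_sizes,
                                 std::initializer_list<std::size_t> seeds_per_generator)
{
  std::size_t total = 1;
  for (std::size_t n : axis_sizes)
  {
    total *= n; // one factor per configuration axis
  }
  for (std::size_t s : seeds_per_generator)
  {
    total *= s; // one factor per generator, equal to its seed count
  }
  return total;
}
```

For the test case in the patch (two types, two block sizes, one generator with ``CUB_SEED(2)``), ``executions({2, 2}, {2})`` evaluates to 8, which matches the eight listed combinations. Adding a second, hypothetical generator with ``CUB_SEED(3)`` would give ``executions({2, 2}, {2, 3})``, that is 24.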