Commit 052db85

Fix rst typos in benchmarking.html (#2868)

* Fix rst typos in benchmarking.html
* typo
* run.py script proper directory

  With our devcontainers and presets dev environment it's highly likely that `cccl/` is not just a step above.

* Add `CUDA_VISIBLE_DEVICES` warning explanation so it can be avoided
1 parent 55ca56e commit 052db85

1 file changed: +11 -5 lines


docs/cub/benchmarking.rst (+11 -5)
@@ -27,9 +27,11 @@ You clone the repository, create a build directory and configure the build with
 It's important that you enable benchmarks (`CCCL_ENABLE_BENCHMARKS=ON`),
 build in Release mode (`CMAKE_BUILD_TYPE=Release`),
 and set the GPU architecture to match your system (`CMAKE_CUDA_ARCHITECTURES=XX`).
-This <website `https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/`>_
+This `website <https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/>`_
 contains a great table listing the architectures for different brands of GPUs.
+
 .. TODO(bgruber): do we have a public NVIDIA maintained table I can link here instead?
+
 We use Ninja as CMake generator in this guide, but you can use any other generator you prefer.

 You can then proceed to build the benchmarks.
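For reference, a configure-and-build sequence that enables the options this hunk refers to might look roughly like the sketch below. The `build` directory name, the Ninja generator invocation, and the architecture value `90` are illustrative placeholders rather than values taken from the commit; pick the `CMAKE_CUDA_ARCHITECTURES` value that matches your GPU, e.g. from the table linked above.

    cmake -S . -B build -G Ninja \
        -DCCCL_ENABLE_BENCHMARKS=ON \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_CUDA_ARCHITECTURES=90   # use the architecture of your GPU
    ninja -C build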
@@ -172,6 +174,10 @@ Therefore, it's critical that you run it in a clean build directory without any
 Running cmake is enough. Alternatively, you can also clean your build directory with.
 Furthermore, the tuning scripts require some additional python dependencies, which you have to install.

+To select the appropriate CUDA GPU, first identify the GPU ID by running `nvidia-smi`, then set the
+desired GPU using `export CUDA_VISIBLE_DEVICES=x`, where `x` is the ID of the GPU you want to use (e.g., `1`).
+This ensures your application uses only the specified GPU.
+
 .. code-block:: bash

     ninja clean
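The added paragraph can be condensed into a minimal shell sketch, assuming the device reported with ID `1` by `nvidia-smi` is the GPU you want the benchmarks to run on:

    nvidia-smi                      # list available GPUs and their IDs
    export CUDA_VISIBLE_DEVICES=1   # expose only GPU 1 to subsequent runs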
@@ -181,7 +187,7 @@ We can then run the full benchmark suite from the build directory with:

 .. code-block:: bash

-    ../benchmarks/scripts/run.py
+    <root_dir_to_cccl>/cccl/benchmarks/scripts/run.py

 You can expect the output to look like this:

@@ -197,13 +203,13 @@ You can expect the output to look like this:
     ...

 The tuning infrastructure will build and execute all benchmarks and their variants one after each other,
-reporting the time it seconds it took to execute the benchmark executable.
+reporting the time in seconds it took to execute the benchmark executable.

 It's also possible to benchmark a subset of algorithms and workloads:

 .. code-block:: bash

-    ../benchmarks/scripts/run.py -R '.*scan.exclusive.sum.*' -a 'Elements{io}[pow2]=[24,28]' -a 'T{ct}=I32'
+    <root_dir_to_cccl>/cccl/benchmarks/scripts/run.py -R '.*scan.exclusive.sum.*' -a 'Elements{io}[pow2]=[24,28]' -a 'T{ct}=I32'
     &&&& RUNNING bench
     ctk: 12.6.77
     cccl: v2.7.0-rc0-265-g32aa6aa5a
@@ -227,7 +233,7 @@ The resulting database contains all samples, which can be extracted into JSON fi

 .. code-block:: bash

-    ../benchmarks/scripts/analyze.py -o ./cccl_meta_bench.db
+    <root_dir_to_cccl>/cccl/benchmarks/scripts/analyze.py -o ./cccl_meta_bench.db

 This will create a JSON file for each benchmark variant next to the database.
 For example:
