@@ -174,6 +174,13 @@ Therefore, it's critical that you run it in a clean build directory without any
Running cmake is enough. Alternatively, you can clean your build directory with:
Furthermore, the tuning scripts require some additional Python dependencies, which you have to install.
+ To select the appropriate CUDA GPU, first identify the GPU ID by running ``nvidia-smi``, then set the
+ desired GPU using ``export CUDA_VISIBLE_DEVICES=x``, where ``x`` is the ID of the GPU you want to use (e.g., ``1``).
+ This ensures that the benchmarks use only the specified GPU.
+
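+ For example, assuming the GPU you want to use has ID ``1`` (check the ``nvidia-smi`` output for the IDs on your machine):
+
+ .. code-block:: bash
+
+    # List the available GPUs and their IDs.
+    nvidia-smi
+    # Run everything started from this shell on GPU 1 only.
+    export CUDA_VISIBLE_DEVICES=1
+
+ The variable can also be set for a single invocation, e.g. ``CUDA_VISIBLE_DEVICES=1 <root_dir_to_cccl>/cccl/benchmarks/scripts/run.py``.
+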
.. code-block:: bash
ninja clean
@@ -183,7 +190,7 @@ We can then run the full benchmark suite from the build directory with:
.. code-block:: bash
- ../benchmarks/scripts/run.py
+ <root_dir_to_cccl>/cccl/benchmarks/scripts/run.py
You can expect the output to look like this:
@@ -205,7 +212,7 @@ It's also possible to benchmark a subset of algorithms and workloads:
.. code-block:: bash
- ../benchmarks/scripts/run.py -R '.*scan.exclusive.sum.*' -a 'Elements{io}[pow2]=[24,28]' -a 'T{ct}=I32'
+ <root_dir_to_cccl>/cccl/benchmarks/scripts/run.py -R '.*scan.exclusive.sum.*' -a 'Elements{io}[pow2]=[24,28]' -a 'T{ct}=I32'
&&&& RUNNING bench
ctk: 12.6.77
cccl: v2.7.0-rc0-265-g32aa6aa5a
@@ -229,7 +236,7 @@ The resulting database contains all samples, which can be extracted into JSON files
.. code-block:: bash
- ../benchmarks/scripts/analyze.py -o ./cccl_meta_bench.db
+ <root_dir_to_cccl>/cccl/benchmarks/scripts/analyze.py -o ./cccl_meta_bench.db
This will create a JSON file for each benchmark variant next to the database.
For example: