README.md (+8 -8)
@@ -10,8 +10,8 @@ be installed automatically when building GT Bench with cmake unless specified ot
 
 Further external dependencies are listed below:
 Required:
-- [CMake](https://cmake.org/) (minimum version 3.14.5)
-- [Boost](https://www.boost.org/) (minimun version 1.73.0)
+- [CMake](https://cmake.org/) (minimum version 3.18.1)
+- [Boost](https://www.boost.org/) (minimum version 1.73.0)
 - MPI (for example [OpenMPI](https://github.com/open-mpi/ompi))
 
 Optional:
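Before configuring, the installed versions can be checked against these minimums. A minimal sketch, assuming a typical system-wide Boost install; the header path is an assumption and may differ on your machine:

```console
$ cmake --version                                          # must report at least 3.18.1
$ grep BOOST_LIB_VERSION /usr/include/boost/version.hpp    # 1.73.0 prints "1_73"
```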
@@ -35,7 +35,7 @@ The backend can be selected by setting the `GTBENCH_BACKEND` option when configu
 ```console
 $ cmake -DGTBENCH_BACKEND=<BACKEND> ..
 ```
-where `<BACKEND>` must be either `cpu_kfirst`, `cpu_ifirst`, or `gpu`. The `cpu_kfirst` and `cpu_ifirst` backends are two different CPU-backends of GridTools. On modern CPUs with large vector width and/or many cores, the `cpu_ifirst` backend might perform significantly better. On CPUs without vectorization or small vector width and limited parallelism, the `cpu_kfirst` backend might perform better. The `hip` backend currently supports running NVIDIA CUDA-capable GPUs and AMD HIP-capable GPUs.
+where `<BACKEND>` must be either `cpu_kfirst`, `cpu_ifirst`, or `gpu`. The `cpu_kfirst` and `cpu_ifirst` backends are two different CPU-backends of GridTools. On modern CPUs with large vector width and/or many cores, the `cpu_ifirst` backend might perform significantly better. On CPUs without vectorization or small vector width and limited parallelism, the `cpu_kfirst` backend might perform better. The `gpu` backend currently supports running NVIDIA CUDA-capable GPUs and AMD HIP-capable GPUs.
 
 ### Selecting the GPU Compilation Framework
 
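The snippet above shows the generic form; a concrete configure line for the GPU backend, run from a build directory as in the snippet, might look like:

```console
$ cmake -DGTBENCH_BACKEND=gpu ..    # or cpu_kfirst / cpu_ifirst
```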
@@ -56,7 +56,7 @@ where `RUNTIME` can be `ghex_comm`, `gcl`, `simple_mpi`, `single_node`.
 - The `simple_mpi` implementation uses a simple MPI 2 sided communication for halo exchanges.
 - The `gcl` implementation uses a optimized MPI based communication library shipped with [GridTools](https://gridtools.github.io/gridtools/latest/user_manual/user_manual.html#halo-exchanges).
 - The `ghex_comm` option will use highly optimized distributed communication via the GHEX library, designed for best performance at scale.
-  Additionally, this option will enable a multi-threaded version of the benchmark, where a rank may have more than one sub-domain (over-subscription), which are delegated to separate threads. **Note:** The gridtools computations use openmp threads on the CPU targets which will not be affected by this parameter.
+  Additionally, this option will enable a multi-threaded version of the benchmark, where a rank may have more than one sub-domain (over-subscription), which are delegated to separate threads. **Note:** The gridtools computations use OpenMP threads on the CPU targets which will not be affected by this parameter.
 
 #### Selecting the Transport Layer for GHEX
 
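The runtime is selected at configure time alongside the backend. A sketch of such a configure line; the option name `GTBENCH_RUNTIME` is an assumption, mirroring the `GTBENCH_BACKEND` naming above:

```console
$ # GTBENCH_RUNTIME is an assumed option name; see the project's CMake options
$ cmake -DGTBENCH_BACKEND=gpu -DGTBENCH_RUNTIME=ghex_comm ..
```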
@@ -88,9 +88,9 @@ To enable xpmem support, pass additionally the following flags
 
 ### Benchmark
 
-The benchmark executable requires the global horizontal domain size as a command line parameter. The simulation will then be performed on a total domain size of `NX×NY×60` grid points. To launch the benchmark use the appropriate MPI launcher (`mpirun`, `mpiexec`, `srun`, or similar):
+The benchmark executable requires the domain size as a command line parameter. The simulation will then be performed on a total domain size of `NX×NY×NZ` grid points. To launch the benchmark use the appropriate MPI launcher (`mpirun`, `mpiexec`, `srun`, or similar):
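A hypothetical launch line, assuming the built executable is named `benchmark` and accepts the `NX NY NZ` extents via a `--domain-size` flag; check the executable's `--help` output for the exact interface:

```console
$ # executable name and --domain-size flag are assumptions
$ mpirun -np 4 ./benchmark --domain-size 128 128 64
```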