------------------------------------------------------------------------------------------------------------------
Radial Basis Functions with Finite Differencing for Shallow Water Equations
Last update: Richelle Streater, September 7, 2018
------------------------------------------------------------------------------------------------------------------
Getting started
Once all folders are on device, navigate to top directory: Gen3/
On HPCL:
Compile with the following command: sbatch compile_hpcl.sh
Run with the following command from the Gen3/ directory: sbatch run/HPCL/runCL.sh
Expected output: output_cfdl.txt, output_sfdl.txt, output_cfdl_sfdl.txt, output_default.txt in run/HPCL/output/
On personal device:
Compile with the following command: . ./compile.sh
Run with the following command: . ./run/openCL/run.sh
Expected output: output_cfdl.txt, output_sfdl.txt, output_cfdl_sfdl.txt, output_default.txt in run/openCL/output/
------------------------------------------------------------------------------------------------------------------
Requirements
To run with OpenCL:
--> OpenCL Version 2.0 or higher
--> Add -L and -I flags to OCL_LIBS/OCL_FLAGS in config.swe if necessary
--> Set SPLIT_DEV=0 if the device does not support subdividing
--> Set OPENCL=1 in config.swe before compiling
--> Set SWE_USE_OCL=1 in run script
To run with OpenMP:
--> Set OPENMP=1 in config.swe before compiling
--> Set OMP_NUM_THREADS in run script
To run with MPI:
--> Set MPI=1 in config.swe before compiling
--> OpenCL with multiple tasks is possible, but device splitting with MPI is not implemented
--> Change LD_LIBRARY_PATH and PATH in run/hpcl/runMP.sh or run/hpcl/runCL.sh if necessary
To use NetCDF:
--> Set NCIO=1 in config.swe before compiling
--> Set SWE_USE_NETCDF=1 in run script and set SWE_INPUT_FILE to a .nc file
--> Change NETCDF variable in include.mk if necessary
To run with Intel Compiler:
--> Set MPICC=mpiicc and CC=icc in config.swe
--> Load icc module before compiling if necessary
--> Change LD_LIBRARY_PATH and PATH in run/hpcl/runMP.sh or run/hpcl/runCL.sh if necessary
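The run-script settings listed above (SWE_USE_OCL, SWE_USE_NETCDF, SWE_INPUT_FILE) are set outside the program. A minimal sketch of how a C program could read such settings, assuming the run script exports them as environment variables. The helper env_int, the struct run_settings, and the fallback values are hypothetical and are not taken from runtime_params.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Read an integer setting from the environment. Returns fallback when the
 * variable is unset, empty, or not a plain decimal integer. */
static int env_int(const char *name, int fallback)
{
    assert(name != NULL);
    const char *text = getenv(name);
    if (text == NULL || *text == '\0')
        return fallback;

    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (*end != '\0')
        return fallback;    /* rejects values such as "yes" or "1x" */
    return (int)value;
}

/* Hypothetical container for settings named in this README. */
struct run_settings {
    int use_ocl;            /* SWE_USE_OCL */
    int use_netcdf;         /* SWE_USE_NETCDF */
    int nodes;              /* SWE_NODES */
    const char *input_file; /* SWE_INPUT_FILE, NULL when unset */
};

static struct run_settings read_run_settings(void)
{
    struct run_settings s;
    s.use_ocl    = env_int("SWE_USE_OCL", 0);
    s.use_netcdf = env_int("SWE_USE_NETCDF", 0);
    s.nodes      = env_int("SWE_NODES", 0);
    s.input_file = getenv("SWE_INPUT_FILE");
    return s;
}
```

Falling back to 0 for a missing or malformed value keeps an unset flag equivalent to "feature off"; a stricter program might prefer to abort with an error message instead.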
------------------------------------------------------------------------------------------------------------------
Replicating test results in read_output_file/results.xlsx on NCAR HPCL
For all:
--> To compile: "sbatch compile_hpcl.sh"
--> To run: "sbatch run/hpcl/runMP.sh" or "sbatch run/hpcl/runCL.sh" from Gen3/ directory
--> To get output files: Copy output files into read_output_file/output, then compile and run read_output_file/read_script.cpp
Tabs 10242 through 655362:
--> Set OPT_FLAGS = -O3 in arch/hpcl/config.swe
--> Compile with "CC=gcc" and "MPICC=mpicc" in arch/hpcl/config.swe
--> In run/HPCL/runCL.sh, set SWE_NODES to desired number (ex. 10242)
--> In run/HPCL/runCL.sh, Layout array should be "cfdl sfdl cfdl_sfdl default"
--> Run with runCL.sh
OpenMP gcc tab:
--> Set OPT_FLAGS = -O3 in arch/hpcl/config.swe
--> Compile with "CC=gcc" and "MPICC=mpicc" in arch/hpcl/config.swe
--> In run/HPCL/runMP.sh, Layout array should be "cfdl sfdl cfdl_sfdl default" and SWE_NODES=40962
--> Run
OpenMP icc tab:
--> Set OPT_FLAGS = -O3 -xHost in arch/hpcl/config.swe
--> Compile with "CC=icc" and "MPICC=mpiicc" in arch/hpcl/config.swe
--> Repeat the last two steps of "OpenMP gcc tab" (set the Layout array and SWE_NODES in run/HPCL/runMP.sh, then run)
OpenMP KMP_aff tab:
--> Compile with "CC=icc" and "MPICC=mpiicc" in arch/hpcl/config.swe
--> In run/HPCL/runMP.sh, set Layout array to "cfdl" and SWE_NODES=40962
--> Set KMP_AFFINITY=compact and run
--> Set KMP_AFFINITY=disabled and run
--> Set KMP_AFFINITY=scatter and run
--> Set KMP_AFFINITY=balanced and run
Aliasing tab:
--> Set "CC=gcc" and "MPICC=mpicc" in arch/hpcl/config.swe
--> Compile with OPT_FLAGS=-fstrict-aliasing in arch/hpcl/config.swe
--> In run/HPCL/runMP.sh, set Layout array to "cfdl" and SWE_NODES=40962
--> Run
--> Compile with OPT_FLAGS=-fno-strict-aliasing in arch/hpcl/config.swe and run with runMP.sh
------------------------------------------------------------------------------------------------------------------
Directory structure
Top Directory Folders:
Gen3/arch: Contains configuration parameters for HPCL and Pascal testing and for general
GNU or Intel setup
Gen3/inputFiles: Contains all binary/netcdf input files for code
Gen3/read_output_file: Contains code to read eval_rhs values from output files
Gen3/run: Contains run scripts for HPCL and general OpenCL setup
Gen3/swe_code: Contains all C (.c) and OpenCL (.cl) code
------------------------------------------------------------------------------------------------------------------
swe_code folder structure:
Gen3/swe_code/io:
--> input.c: Reads input files in either binary or NetCDF format and fills all differentiation matrices,
state variable matrices, ordering, and constants
--> nc2bin.c: Converts a .nc file to binary format (not called by the main function)
Gen3/swe_code/layout:
--> layout.c: Calls padding/reordering functions for differentiation matrices/state variable matrices
--> matrix_transformations.c: Functions to pad matrices (to allow for tiling/vectorizing) and rearrange based
on CFDL/SFDL options
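The padding described for matrix_transformations.c can be illustrated with a minimal sketch: each row of a row-major matrix is extended with zeros until its length is a multiple of a tile or vector width. The function names padded_len and pad_rows, the row-major double layout, and the zero fill are assumptions for illustration, not the repository's actual routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of width (width must be positive). */
static size_t padded_len(size_t n, size_t width)
{
    assert(width > 0);
    return (n + width - 1) / width * width;
}

/* Copy a rows x cols row-major matrix into a newly allocated buffer whose
 * row length is a multiple of width. Padding entries are zero.
 * Returns NULL on allocation failure; the caller frees the result. */
static double *pad_rows(const double *src, size_t rows, size_t cols,
                        size_t width, size_t *padded_cols)
{
    size_t pc = padded_len(cols, width);
    double *dst = calloc(rows * pc, sizeof *dst);
    if (dst == NULL)
        return NULL;
    for (size_t r = 0; r < rows; r++)
        memcpy(dst + r * pc, src + r * cols, cols * sizeof *dst);
    *padded_cols = pc;
    return dst;
}
```

For example, padding a 2 x 3 matrix to a width of 4 yields a 2 x 4 buffer in which the last entry of each row is zero, so a loop that processes four entries at a time never reads past the end of a row.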
Gen3/swe_code/main:
--> main.c: Calls reading/reordering functions and calls patch initialization functions (for MPI). Declares
OpenCL objects and compiles kernels, opens device/platform, loads buffers, and sets kernel arguments for
OpenCL. For n attempts and time steps, calls Runge-Kutta stepping function. Compares results to known array.
--> profiling.c: Uses arrays of loop times to determine the average, min/max, and std dev of the time for all operations.
Prints timing results.
--> rk4_rbffd_swe.c: Computes a Runge-Kutta step with the radial basis function finite differencing algorithm.
--> runtime_params.c: Processes external variables set in the run script.
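The timing summary described for profiling.c can be sketched as follows. This is a hedged illustration, not the repository's code: the struct timing_stats and the function summarize_times are hypothetical names, and the sketch reports the population variance so that it does not depend on the math library (the standard deviation is its square root).

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one operation's loop times. */
struct timing_stats {
    double mean;
    double min;
    double max;
    double variance;   /* population variance; std dev = sqrt(variance) */
};

/* Summarize n timing samples (n must be at least 1). */
static struct timing_stats summarize_times(const double *t, size_t n)
{
    assert(t != NULL && n > 0);
    struct timing_stats s = { 0.0, t[0], t[0], 0.0 };

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
    }
    s.mean = sum / (double)n;

    /* Second pass: mean of squared deviations from the mean. */
    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = t[i] - s.mean;
        sq += d * d;
    }
    s.variance = sq / (double)n;
    return s;
}
```

The two-pass form is used because it is less sensitive to rounding than accumulating the sum of squares in a single pass.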
Gen3/swe_code/mpi:
--> halos.c: Function for exchanging neighbor node information so that the state variable matrix can be divided
among MPI ranks
--> init_patches.c: Creates the divided matrices and copies the read and reordered arrays from rank 0 to the other
MPI ranks.
Gen3/swe_code/ocl:
--> buffers.c: Converts arrays into OpenCL buffer objects to be passed into the kernels, and frees buffers
--> device_setup.c: Creates all OpenCL objects: kernels, devices, platforms, and command queues
--> RK_ocl.c: Version of rk4_rbffd_swe.c that uses OpenCL and calls the OpenCL kernels
--> kernel.cl: All Runge-Kutta step functions; vectorized along nodes and works for all layouts
--> kernel_CFDL.cl: All RK step functions; vectorized along u,v,w,h in state variable matrices. Only valid for
CFDL layout.
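For orientation, the Runge-Kutta stepping that rk4_rbffd_swe.c, RK_ocl.c, and the kernels refer to can be sketched in plain C. This assumes that "rk4" means the classical fourth-order scheme; the names rhs_fn, rk4_step, decay_rhs, and RK4_MAX_DIM are hypothetical, and the right-hand side here is a toy decay equation rather than the RBF-FD shallow water operator.

```c
#include <assert.h>
#include <stddef.h>

#define RK4_MAX_DIM 16   /* hypothetical cap so the sketch needs no malloc */

/* Right-hand side callback: writes dy/dt for state y at time t. */
typedef void (*rhs_fn)(double t, const double *y, double *dydt, size_t n);

/* Advance y by one classical fourth-order Runge-Kutta step of size dt. */
static void rk4_step(rhs_fn f, double t, double dt, double *y, size_t n)
{
    assert(n <= RK4_MAX_DIM);
    double k1[RK4_MAX_DIM], k2[RK4_MAX_DIM], k3[RK4_MAX_DIM],
           k4[RK4_MAX_DIM], tmp[RK4_MAX_DIM];

    f(t, y, k1, n);
    for (size_t i = 0; i < n; i++) tmp[i] = y[i] + 0.5 * dt * k1[i];
    f(t + 0.5 * dt, tmp, k2, n);
    for (size_t i = 0; i < n; i++) tmp[i] = y[i] + 0.5 * dt * k2[i];
    f(t + 0.5 * dt, tmp, k3, n);
    for (size_t i = 0; i < n; i++) tmp[i] = y[i] + dt * k3[i];
    f(t + dt, tmp, k4, n);

    /* Weighted average of the four slopes. */
    for (size_t i = 0; i < n; i++)
        y[i] += dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}

/* Toy right-hand side for illustration: dy/dt = -y. */
static void decay_rhs(double t, const double *y, double *dydt, size_t n)
{
    (void)t;
    for (size_t i = 0; i < n; i++) dydt[i] = -y[i];
}
```

Starting from y = 1 with dt = 0.1, one call to rk4_step with decay_rhs gives approximately 0.9048375, close to exp(-0.1). In the actual code the right-hand side evaluation is where the differentiation matrices are applied.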
Gen3/swe_code/rcm:
--> rcm.c: Calls all reorder functions for the Reverse Cuthill-McKee ordering scheme
--> reorder_nodes: Defines the mapping for the Reverse Cuthill-McKee ordering scheme
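A textbook Reverse Cuthill-McKee ordering can be sketched as below, to show what kind of mapping the rcm folder produces. This is not the repository's implementation: the function rcm_order, the CSR arrays row_ptr and col_idx, and the choice of a minimum-degree start node are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of neighbors of node v in a CSR adjacency structure. */
static int degree(const int *row_ptr, int v)
{
    return row_ptr[v + 1] - row_ptr[v];
}

/* Fill perm[0..n-1] so that perm[k] is the original index of the node
 * placed at position k. Returns 0 on success, -1 on allocation failure. */
static int rcm_order(int n, const int *row_ptr, const int *col_idx, int *perm)
{
    assert(n >= 0);
    char *visited = calloc((size_t)n + 1, 1);
    if (visited == NULL)
        return -1;

    int count = 0;   /* nodes placed so far; perm doubles as the BFS queue */
    while (count < n) {
        /* Start each connected component at an unvisited node of
         * minimum degree. */
        int start = -1;
        for (int v = 0; v < n; v++)
            if (!visited[v] && (start < 0 ||
                degree(row_ptr, v) < degree(row_ptr, start)))
                start = v;
        visited[start] = 1;
        perm[count++] = start;

        for (int head = count - 1; head < count; head++) {
            int u = perm[head];
            int first = count;   /* where u's new neighbors begin */
            for (int e = row_ptr[u]; e < row_ptr[u + 1]; e++) {
                int w = col_idx[e];
                if (visited[w])
                    continue;
                visited[w] = 1;
                /* Insertion sort: keep u's new neighbors in ascending
                 * degree order. */
                int pos = count++;
                while (pos > first &&
                       degree(row_ptr, perm[pos - 1]) > degree(row_ptr, w)) {
                    perm[pos] = perm[pos - 1];
                    pos--;
                }
                perm[pos] = w;
            }
        }
    }

    /* Reverse the Cuthill-McKee order. */
    for (int i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }
    free(visited);
    return 0;
}
```

For the path graph 0-2-1-3 the sketch returns the order 3, 1, 2, 0, which places neighboring nodes at adjacent positions. Keeping neighbors close in the ordering is what reduces the bandwidth of the differentiation matrices.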
------------------------------------------------------------------------------------------------------------------