Rework the benchmark script to be more generic (OpenNMT#485)

* Rework the benchmark script to be more generic
* Cleanup paths to Dockerfile
* Update README and command line options
guillaumekln · Jun 9, 2021 · d37076c
1 parent 7a95290

Showing 21 changed files with 441 additions and 231 deletions.
diff --git a/tools/benchmark/README.md b/tools/benchmark/README.md
@@ -1,28 +1,47 @@
 ## Benchmark tools
 
-This directory contains some scripts to benchmark CTranslate2 and compare against OpenNMT-py and OpenNMT-tf.
+This directory contains some scripts to benchmark translation systems.
 
 ### Requirements
 
 * Python 3
 * Docker
 
+```bash
+python3 -m pip install -r requirements.txt
+```
+
 ### Usage
 
-```bash
-pip install -r requirements.txt
+```text
+python3 benchmark.py <IMAGE> <SOURCE> <REFERENCE>
+```
+
+The Docker image must contain 3 scripts at its root:
+
+* `/tokenize.sh $input $output`
+* `/detokenize.sh $input $output`
+* `/translate.sh $device $input $output`, where:
+  * `$device` is "CPU" or "GPU"
+  * `$input` is the path to the tokenized input file
+  * `$output` is the path where the tokenized output should be written
 
+The benchmark script will report multiple metrics. The results can be aggregated over multiple runs using the option `--num_samples N`. See `python3 benchmark.py -h` for additional options.
+
+Note: the script focuses on raw decoding performance so the following steps are **not** included in the translation time:
+
+* tokenization
+* detokenization
+* model initialization (obtained by translating an empty file)
+
+### Reproducing the benchmark numbers from the README
+
+We use the script `benchmark_pretrained.py` to produce the benchmark numbers in the main [README](https://github.com/OpenNMT/CTranslate2#benchmarks). The directory `pretrained_transformer_base` contains the Docker images corresponding to the pretrained OpenNMT Transformers.
+
+```text
 # Run CPU benchmark:
-./run.sh
+python3 benchmark_pretrained.py cpu
 
 # Run GPU benchmark:
-./run.sh 1
+python3 benchmark_pretrained.py gpu
 ```
-
-The script outputs one result per line where each line consists of 5 fields separated by a semicolon:
-
-1. Run name
-1. BLEU score
-1. Tokens per second
-1. System maximum memory usage (MB)
-1. GPU maximum memory usage (MB)
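
The new README describes a contract (three scripts at the image root) and a timing rule (model initialization, measured by translating an empty file, is excluded from the translation time). The sketch below illustrates one way a driver could follow that contract. It is not the code of `benchmark.py`: the helper names, the `/data` mount point, and the use of `--gpus all` for GPU runs are assumptions made for illustration, and it assumes the image defines no entrypoint that would intercept the command.

```python
import subprocess
import time


def build_translate_command(image, device, data_dir, input_name, output_name):
    """Build a `docker run` command that calls /translate.sh inside the image.

    Assumption: the host directory `data_dir` is mounted at /data in the
    container. The README does not specify how files are shared.
    """
    if device not in ("CPU", "GPU"):
        raise ValueError("device must be 'CPU' or 'GPU'")
    command = ["docker", "run", "--rm", "-v", "%s:/data" % data_dir]
    if device == "GPU":
        # Assumption: GPUs are exposed with Docker's --gpus option.
        command += ["--gpus", "all"]
    command += [
        image,
        "/translate.sh",
        device,
        "/data/%s" % input_name,
        "/data/%s" % output_name,
    ]
    return command


def timed_run(command):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


def decoding_time(total_time, initialization_time):
    """Subtract the initialization time (empty-file run) from a full run."""
    return max(total_time - initialization_time, 0.0)
```

Under these assumptions, a driver would call `timed_run` twice, once on an empty input file and once on the tokenized source, and pass both durations to `decoding_time`. Tokenization and detokenization would run as separate, untimed calls to `/tokenize.sh` and `/detokenize.sh`.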