Merge pull request #12 from intelligent-machine-learning/pin_2023_12_27
Pin 2023_12_27
COMMIT_ID=efa6fcfdac5368330a0770e9019649eba08b5f56
wbmc authored Dec 28, 2023
2 parents 21c989d + 81ebc22 commit f5bf0b6
Showing 111 changed files with 3,532 additions and 8,845 deletions.
6 changes: 1 addition & 5 deletions .circleci/common.sh
@@ -69,6 +69,7 @@ function install_deps_pytorch_xla() {
pip install hypothesis
pip install cloud-tpu-client
pip install absl-py
+pip install pandas
pip install --upgrade "numpy>=1.18.5"
pip install --upgrade numba

@@ -150,11 +151,6 @@ function run_torch_xla_python_tests() {
# echo "Running MNIST Test"
# python test/test_train_mp_mnist_amp.py --fake_data --num_epochs=1
fi
-elif [[ "$RUN_XLA_OP_TESTS1" == "xla_op1" ]]; then
-# Benchmark tests.
-# Only run on CPU, for xla_op1.
-echo "Running Benchmark tests."
-./benchmarks/test/run_tests.sh
fi
fi
popd
1 change: 1 addition & 0 deletions .circleci/docker/install_conda.sh
@@ -42,6 +42,7 @@ function install_and_setup_conda() {
/usr/bin/yes | pip install cloud-tpu-client
/usr/bin/yes | pip install expecttest==0.1.3
/usr/bin/yes | pip install absl-py
+/usr/bin/yes | pip install pandas
# Additional PyTorch requirements
/usr/bin/yes | pip install scikit-image scipy==1.6.3
/usr/bin/yes | pip install boto3==1.16.34
2 changes: 1 addition & 1 deletion .github/workflows/_build.yml
@@ -38,7 +38,7 @@ on:
jobs:
build:
runs-on: ${{ inputs.runner }}
-timeout-minutes: 90
+timeout-minutes: 180
outputs:
docker-image: ${{ steps.upload-docker-image.outputs.docker-image }}
env:
1 change: 1 addition & 0 deletions .kokoro/common.sh
@@ -59,6 +59,7 @@ function install_deps_pytorch_xla() {
pip install hypothesis
pip install cloud-tpu-client
pip install absl-py
+pip install pandas
pip install --upgrade "numpy>=1.18.5"
pip install --upgrade numba

6 changes: 3 additions & 3 deletions TROUBLESHOOTING.md
@@ -43,7 +43,7 @@ vm:~$ git clone --branch r2.1 https://github.com/pytorch/xla.git
vm:~$ python xla/test/test_train_mp_imagenet.py --fake_data
```

-If you can get the resnet to run we can conclude that torch_xla is installed correctly. 
+If you can get the resnet to run we can conclude that torch_xla is installed correctly.


## Performance Debugging
@@ -64,10 +64,10 @@ The debugging tool will analyze the metrics report and provide a summary. Some e

```
pt-xla-profiler: CompileTime too frequent: 21 counts during 11 steps
-pt-xla-profiler: TransferFromServerTime too frequent: 11 counts during 11 steps
+pt-xla-profiler: TransferFromDeviceTime too frequent: 11 counts during 11 steps
pt-xla-profiler: Op(s) not lowered: aten::_ctc_loss, aten::_ctc_loss_backward, Please open a GitHub issue with the above op lowering requests.
pt-xla-profiler: CompileTime too frequent: 23 counts during 12 steps
-pt-xla-profiler: TransferFromServerTime too frequent: 12 counts during 12 steps
+pt-xla-profiler: TransferFromDeviceTime too frequent: 12 counts during 12 steps
```
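The profiler output shown above comes from torch_xla's client-side debugging facility, which is conventionally switched on via an environment variable. A usage sketch (the training script is the one from the installation check above; exact flag behavior may vary across releases):

```shell
# Enable pt-xla-profiler / debug-metric output for a training run.
PT_XLA_DEBUG=1 python xla/test/test_train_mp_imagenet.py --fake_data
```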

### Compilation & Execution Analysis
17 changes: 8 additions & 9 deletions benchmarks/README.md
@@ -90,13 +90,12 @@ python xla/benchmarks/result_analyzer.py --output-dirname=experiment_results

## Aggregating results

-Generated reports (e.g. `metric_report.csv` files mentioned above) can be
-aggregated to track performance improvements over time with the `aggregate.py`
-script.
+Aggregate reports can be generated directly from the output JSONL files
+(i.e., skipping `result_analyzer.py` altogether) with the `aggregate.py` script.
The script compares Pytorch/XLA performance numbers against Inductor numbers.
Because Inductor's performance also changes over time, the script takes
-the oldest Inductor performance numbers present in the CSV files (as determined
-by their timestamp) as the baseline for each benchmark.
+the oldest Inductor performance numbers present in the JSONL files (as
+determined by the records' timestamp) as the baseline for each benchmark.

Sample runs and sample output:

@@ -106,7 +105,7 @@ Sample runs and sample output:
- Note 3: we are using ASCII output here just to avoid checking in PNG files.

```
-$ python3 aggregate.py --accelerator=v100 --test=inference -i /tmp/test --format=png --report=histogram
+$ python3 aggregate.py --accelerator=v100 --test=inference --format=png --report=histogram /tmp/test/*.jsonl
Histogram of Speedup over Oldest Benchmarked Inductor
1.2 +------------------------------------------------------------------+
@@ -126,7 +125,7 @@ $ python3 aggregate.py --accelerator=v100 --test=inference -i /tmp/test --format
2000 2005 2010 2015 2020 2025 2030 2035 2040 2045
Date
-$ python3 aggregate.py --accelerator=v100 --test=inference -i /tmp/test --format=png --report=speedup
+$ python3 aggregate.py --accelerator=v100 --test=inference --format=png --report=speedup /tmp/test/*.jsonl
Geomean Speedup Over Oldest Benchmarked Inductor
1 +----------------------------------------------+
@@ -145,7 +144,7 @@ $ python3 aggregate.py --accelerator=v100 --test=inference -i /tmp/test --format
0.4 +----------------------------------------------+
2000 2005 2010 2015 2020 2025 2030 2035 2040 2045
Date
-$ python3 aggregate.py --accelerator=v100 --test=inference -i /tmp/test --format=png --report=latest
+$ python3 aggregate.py --accelerator=v100 --test=inference --format=png --report=latest /tmp/test/*.jsonl
Speedup Over Oldest Benchmarked Inductor as of 2023-11-11
1.8 +----------------------------------------------+
@@ -168,7 +167,7 @@ Speedup Over Oldest Benchmarked Inductor as of 2023-11-11

The last plot shows the "latest" snapshot for all benchmarks ("Workload" on the
plot), sorting them by speedup. That is, it shows the speedup of both Inductor
-and Pytorch/XLA over the oldest Inductor data point that we have in the CSV
+and Pytorch/XLA over the oldest Inductor data point that we have in the JSONL
files. (Note: to reiterate, because we are plotting data from single day,
Inductor gets speedup == 1 for all benchmarks). This plot also shows the
correctness gap between Pytorch/XLA and Inductor; there are benchmarks that do
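The baseline logic this README describes, taking the oldest Inductor record per benchmark as the reference and reporting every later record as a speedup over it, can be sketched in a few lines. This is a hypothetical simplification, not the actual `aggregate.py` code, and the field names (`timestamp`, `backend`, `workload`, `runtime_s`) are illustrative, not the real JSONL schema:

```python
import json

# Hypothetical JSONL records (one JSON object per line) mimicking the shape
# of the benchmark output; the real files use different field names.
lines = [
    '{"timestamp": 1, "backend": "inductor", "workload": "resnet50", "runtime_s": 2.0}',
    '{"timestamp": 2, "backend": "inductor", "workload": "resnet50", "runtime_s": 1.6}',
    '{"timestamp": 2, "backend": "xla", "workload": "resnet50", "runtime_s": 1.0}',
]
records = [json.loads(line) for line in lines]

# Baseline: the oldest Inductor runtime per workload, by record timestamp.
baseline = {}
for rec in sorted(records, key=lambda r: r["timestamp"]):
    if rec["backend"] == "inductor":
        baseline.setdefault(rec["workload"], rec["runtime_s"])

# Speedup of every record over that baseline (higher is better); the oldest
# Inductor record itself gets speedup == 1 by construction.
speedups = {
    (r["backend"], r["timestamp"]): baseline[r["workload"]] / r["runtime_s"]
    for r in records
}
print(speedups)
```

By construction the oldest Inductor data point always lands at speedup 1, which is why the plots above show Inductor at 1 when only a single day of data is present.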
