
Update OpenXLA-pin to Nov24 #8

Merged
merged 34 commits into from
Dec 5, 2023

Conversation

@wbmc wbmc commented Dec 5, 2023

golechwierowicz and others added 30 commits November 28, 2023 09:05
* Collect CUDA/CPU profiling info into result sheets.

This PR:
0. Adds CUDA/CPU collection capabilities to the script.
1. Modifies result_analyzer.py to analyze newly collected results.
2. Moves CUDA synchronize/XLA device synchronize into the profiler.
3. Fixes list typing for Python 3.8+.

Tested with command:
python3 xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=train --filter=basic_gnn_gcn$ --suite-name=torchbench --accelerator=cuda --progress-bar --output-dirname=/tmp/output --repeat=2 --print-subprocess --no-resume --profile-cuda-cpu-collect --profile-cuda
python3 xla/benchmarks/result_analyzer.py --output-dir=/tmp/output
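Item 3 above refers to Python 3.8 compatibility: built-in generic annotations such as `list[float]` only work on Python 3.9+, so 3.8-compatible code uses `typing.List` instead. A minimal sketch of the pattern (the function name is illustrative, not from the PR):

```python
from typing import List

# On Python 3.8, `list[float]` in an annotation raises
# "TypeError: 'type' object is not subscriptable" when the `def` is
# evaluated; `typing.List[float]` works on 3.8 and later.
def mean_duration_s(samples: List[float]) -> float:
    return sum(samples) / len(samples)
```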

* Lint, and add _s suffix to metrics

---------

Co-authored-by: root <[email protected]>
…pytorch#5914)

* Add test.

* Create `base_` tensor for views.

* Use base tensor in `as_strided` operation.

* Set base tensor of `as_strided`.

* Fix lint errors.

* Fix for disabled functionalization.

* Address review.

(de)quantize_per_tensor/channel ops from PT2E quantization workflow are lowered to stablehlo uniform_dequantize/quantize.
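The uniform (affine) mapping behind those quantize/dequantize ops can be sketched in plain Python; the actual lowering emits StableHLO `uniform_quantize`/`uniform_dequantize` ops, so this shows only the underlying arithmetic:

```python
def quantize_per_tensor(xs, scale, zero_point, qmin=-128, qmax=127):
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]

def dequantize_per_tensor(qs, scale, zero_point):
    # x ≈ (q - zero_point) * scale
    return [(q - zero_point) * scale for q in qs]
```

Per-channel variants apply a separate scale/zero_point along one tensor axis instead of a single pair for the whole tensor.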
---------

Co-authored-by: Siyuan Liu <[email protected]>

* Don't fallback for pow

…utation (pytorch#5933)

* Truncate the python stack when outputting the frame that causes the graph execution

* add mp tests

* move tests to a new dir

---------

Co-authored-by: root <[email protected]>

Update some missing changes from `GPU` to `CUDA`
* Add benchmark noise reducing info.

Add info about knobs making benchmarks more stable
across different runs.

* Add more general info about setting clock freq.

* Move comments out of the code
…5948)

* Error when changing `PJRT_DEVICE` after runtime initialized

* format

* better error
…ch#5737)

* Refactor ExecuteReplicated to operate on sharded data directly

* Remove old handlers

* formatting

* Improve naming and logging

* update docstring

* Remove obsolete unit tests

* improve comment

* Remove slow calls to get output shapes.

* fix implicit sharding

* remove declarations of input/output handlers

* formatting

* give everything a manual placeholder sharding

* see if CI passes

* formatting

* Shard parameter and output handling

* Use absl::BlockingCounter

* formatting

* fix merge

* Assign valid output shardings

* tune and document costs

* formatting

* implicitly replicate output to match the output handler

* clarify ReplicateShardedData

* fix merge
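The `absl::BlockingCounter` mentioned above lets the main thread block until N worker tasks (such as the per-shard parameter and output handling) have all finished. A minimal Python analogue, for illustration only:

```python
import threading

class BlockingCounter:
    """Python sketch of absl::BlockingCounter: wait() blocks until
    decrement_count() has been called `initial_count` times."""

    def __init__(self, initial_count: int):
        self._count = initial_count
        self._cond = threading.Condition()

    def decrement_count(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._count > 0:
                self._cond.wait()

# Usage: one task per shard, main thread waits for all of them.
results = []
counter = BlockingCounter(4)

def handle_shard(i):
    results.append(i)        # stand-in for per-shard transfer work
    counter.decrement_count()

for i in range(4):
    threading.Thread(target=handle_shard, args=(i,)).start()
counter.wait()               # returns once all 4 shards are handled
```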

* Add graph hash and num input/output to PT_XLA_DEBUG

* Remove unnecessary checks

* fix typo

* static const
…custom op_name (pytorch#5838)

* Add python binding to allow custom op_name metadata for lowered HLO

* As discussed increase timeout on GPU tests by 20%

* Add lowering for stack frame index and stack frame id in metadata

* Add fix for stack depth when setting a custom op_name in a python context

* Changes after adding tests for lowered stack frames and finding several issues

* Add routine to XlaNode to search back through operands and recursively set metadata

* Fix recursion condition so we don't explore nodes with metadata
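A sketch of the operand walk described in the last two items, using a plain dict graph (the real code operates on `XlaNode` operands): recursion stops at nodes that already carry metadata, which both bounds the traversal and avoids overwriting already-annotated subgraphs.

```python
def set_metadata_recursive(node, metadata):
    # Stop at nodes that already carry metadata so annotated
    # subgraphs are neither revisited nor overwritten.
    if node.get("metadata") is not None:
        return
    node["metadata"] = metadata
    for operand in node.get("operands", []):
        set_metadata_recursive(operand, metadata)
```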

* Distribute Literal->Tensor copies across thread pool

* Update for pytorch#5799
This PR enables fast TF32 for PyTorch by default to mirror XLA
behaviour.
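TF32 trades float32's 23-bit mantissa for 10 bits while keeping the 8-bit exponent, which is why matmuls run faster at slightly reduced precision. A rough Python illustration of that precision loss (truncation is used for simplicity; real hardware rounding differs):

```python
import struct

def to_tf32(x: float) -> float:
    # Reinterpret as float32 bits, then drop the low 13 of the
    # 23 mantissa bits, leaving TF32's 10-bit mantissa.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & ~0x1FFF))[0]
```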

* Add all-gather and reduce-scatter coalescence support for FSDP.

Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)

* clang-format-7 and python lint fixes

* Fix "SyntaxError: 'return' outside function" error

* Code/test fixes to get run_tests.sh to run on CPU

* Fix allgather to be compatible with openxla allgather tuple change without token

* Fix reduce-scatter-coalesce to be compatible with openxla reduce-scatter tuple change without token

* Separate out the reduce-scatter-coalesce changes into a separate PR

* Some cleanups

* Add separate BuildAllGatherCoalesced builder and AllGatherCoalesced class

* Use token_handler.GetInput to capture token

* Clean up

* Clean up

* Switch to GetOperandListWithToken naming for func GetOperandList
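Coalescence here means batching many small per-parameter buffers into one collective instead of issuing one all-gather/reduce-scatter per tensor. The flatten/split bookkeeping can be sketched like this (Python lists stand in for tensors; the names are illustrative, not the PR's API):

```python
def coalesce(tensors):
    # Flatten the buffers and remember each one's length so a single
    # collective op can cover all of them.
    sizes = [len(t) for t in tensors]
    flat = [x for t in tensors for x in t]
    return flat, sizes

def uncoalesce(flat, sizes):
    # Split the collective's result back into per-parameter buffers.
    out, i = [], 0
    for n in sizes:
        out.append(flat[i:i + n])
        i += n
    return out
```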
…#5939)

* Allow openxla for eval.

* Update readme.

* Revert `openxla_eval` rule.

* Only initialize once for the test suite instead of each test.

* remove comments

* removed unused lines

* fix linter

* fix a tpu issue

* fix minor issue

* Add script for updating core aten opset issue

* Add update function

* Add graph hash to save tensor output

* Add support for dynamo

* fix test
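Tagging saved tensor output with a graph hash lets dumps from different traced graphs be matched back to the graph that produced them. A hypothetical sketch of such a fingerprint (not the actual hashing torch_xla uses):

```python
import hashlib

def graph_hash(serialized_graph: str) -> str:
    # Stable short fingerprint of a serialized graph; equal graphs
    # get equal tags, so saved outputs can be grouped per graph.
    return hashlib.sha256(serialized_graph.encode("utf-8")).hexdigest()[:16]
```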

* Add profiler API for async capture

* Add unit test

@wbmc wbmc merged commit a610b9b into master Dec 5, 2023
1 of 2 checks passed