Update OpenXLA-pin to Nov24 #8
Merged
wbmc commented on Dec 5, 2023
- Update OpenXLA pin to Nov24, with GPU test speed-up after "Optimize autocast tests" (pytorch/xla#5970)
- PRs/commits pulled in with this pin update:
* Collect CUDA/CPU profiling info into result sheets. This PR:
  0. Adds CUDA/CPU collection capabilities to the script.
  1. Modifies result_analyzer.py to analyze newly collected results.
  2. Moves CUDA synchronize/XLA device synchronize into the profiler.
  3. Fixes list typing for Python 3.8+.

  Tested with:
  `python3 xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=train --filter=basic_gnn_gcn$ --suite-name=torchbench --accelerator=cuda --progress-bar --output-dirname=/tmp/output --repeat=2 --print-subprocess --no-resume --profile-cuda-cpu-collect --profile-cuda`
  `python3 xla/benchmarks/result_analyzer.py --output-dir=/tmp/output`
* Lint, and add _s suffix to metrics

Co-authored-by: root <[email protected]>
…pytorch#5914)
* Add test.
* Create `base_` tensor for views.
* Use base tensor in `as_strided` operation.
* Set base tensor of `as_strided`.
* Fix lint errors.
* Fix for disabled functionalization.
* Address review.
(de)quantize_per_tensor/channel ops from the PT2E quantization workflow are lowered to stablehlo uniform_dequantize/quantize.

Co-authored-by: Siyuan Liu <[email protected]>
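For context, a minimal sketch of the PT2E workflow that produces these (de)quantize ops. The import paths reflect the PyTorch 2.1-era PT2E API and may have moved in later releases; the stablehlo lowering itself happens inside torch_xla, not in this snippet.

```python
import torch
from torch._export import capture_pre_autograd_graph  # 2.1-era path (assumption)
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

m = M().eval()
example = (torch.randn(2, 4),)

# Capture, annotate, calibrate, convert.
m = capture_pre_autograd_graph(m, example)
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example)  # calibration pass
m = convert_pt2e(m)
# The converted graph contains quantize_per_tensor/dequantize_per_tensor
# nodes; this change lowers them to stablehlo uniform_quantize/uniform_dequantize.
```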
* Don't fall back for `pow`
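A quick, hedged way to confirm the op now lowers instead of falling back: `met.metrics_report()` lists `aten::` counters for any CPU fallbacks that occurred.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

x = torch.randn(4, device=xm.xla_device())
y = x.pow(2.5)
xm.mark_step()
# With the fallback removed, no aten::pow counter should appear here.
print(met.metrics_report())
```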
…utation (pytorch#5933)
* Truncate python stack when outputting the frame that caused the graph execution
* add mp tests
* move tests to a new dir

Co-authored-by: root <[email protected]>
Update some remaining references from `GPU` to `CUDA`
…rch#5944) Add link to pytorch#5934 in our FIX_LOWERING_FOR_CORE_ATEN_OPS.md.
* Add benchmark noise reducing info. Add info about knobs making benchmarks more stable across different runs.
* Add more general info about setting clock freq.
* Move comments out of the code
…5948)
* Error when changing `PJRT_DEVICE` after runtime initialized
* format
* better error
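A sketch of the behavior this guards against (hedged; the exact error message comes from the PR and is not reproduced here):

```python
import os
os.environ["PJRT_DEVICE"] = "CUDA"  # must be chosen before runtime init

import torch
import torch_xla.core.xla_model as xm

t = torch.ones(2, 2, device=xm.xla_device())  # runtime initializes here

# Changing PJRT_DEVICE now raises an error instead of being silently ignored.
os.environ["PJRT_DEVICE"] = "CPU"
# xm.xla_device()  # would error after this change
```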
…ch#5737)
* Refactor ExecuteReplicated to operate on sharded data directly
* Remove old handlers
* formatting
* Improve naming and logging
* update docstring
* Remove obsolete unit tests
* improve comment
* Remove slow calls to get output shapes.
* fix implicit sharding
* remove declarations of input/output handlers
* formatting
* give everything a manual placeholder sharding
* see if CI passes
* formatting
* Shard parameter and output handling
* Use absl::BlockingCounter
* formatting
* fix merge
* Assign valid output shardings
* tune and document costs
* formatting
* implicitly replicate output to match outputhandler
* clarify ReplicateShardedData
* fix merge
* Add graph hash and num input/output to PT_XLA_DEBUG
* Remove unnecessary checks
* fix typo
* static const
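For reference, a minimal way to surface that debug output (hedged sketch; `PT_XLA_DEBUG` is the pre-existing env var, and the graph hash plus input/output counts are what this change adds to its report):

```python
import os
os.environ["PT_XLA_DEBUG"] = "1"  # set before importing torch_xla

import torch
import torch_xla.core.xla_model as xm

x = torch.randn(4, 4, device=xm.xla_device())
y = (x @ x).sum()
xm.mark_step()  # executes the graph; the debug report now also includes
                # the graph hash and the number of graph inputs/outputs
```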
…custom op_name (pytorch#5838)
* Add python binding to allow custom op_name metadata for lowered HLO
* As discussed, increase timeout on GPU tests by 20%
* Add lowering for stack frame index and stack frame id in metadata
* Add fix for stack depth when setting a custom op_name in a python context
* Changes after adding tests for lowered stack frames and finding several issues
* Add routine to XlaNode to search back through operands and recursively set metadata
* Fix recursion condition so we don't explore nodes with metadata
* Distribute Literal->Tensor copies across thread pool
* Update for pytorch#5799
This PR enables fast TF32 for PyTorch by default to mirror XLA behaviour.
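These are the stock PyTorch knobs involved, shown for context only; the PR flips the XLA-side default to match, it does not change these flags:

```python
import torch

# TF32 controls on the PyTorch side; XLA's matmul behaviour now mirrors
# these being enabled by default on Ampere and newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```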
* Add all-gather and reduce-scatter coalescence support for FSDP. Also allow using reduce-scatter's scale param in FSDP. (revived pytorch#4145)
* clang-format-7 and python lint fixes
* Fix "SyntaxError: 'return' outside function" error
* Code/test fixes to get run_tests.sh to run on CPU
* Fix allgather to be compatible with openxla allgather tuple change without token
* Fix reduce-scatter-coalesce to be compatible with openxla reduce-scatter tuple change without token
* Separate out the reduce-scatter-coalesce changes into a separate PR
* Some cleanups
* Add separate BuildAllGatherCoalesced builder and AllGatherCoalesced class
* Use token_handler.GetInput to capture token
* Clean up
* Clean up
* Switch to GetOperandListWithToken naming for func GetOperandList
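A hedged usage sketch of the wrapper these collectives back; the coalescing knobs themselves are internal to `XlaFullyShardedDataParallel`, so none are named here:

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()
model = FSDP(torch.nn.Linear(8, 8).to(device))

out = model(torch.randn(2, 8, device=device))
out.sum().backward()  # reduce-scatter of gradients, now coalescible
xm.mark_step()
```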
…#5939)
* Allow openxla for eval.
* Update readme.
* Revert `openxla_eval` rule.
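A sketch of what this enables (hedged): running eval through the standard `openxla` dynamo backend rather than the separate `openxla_eval` rule.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(4, 4).to(device).eval()
compiled = torch.compile(model, backend="openxla")

with torch.no_grad():
    out = compiled(torch.randn(2, 4, device=device))
```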
* Only initialize once for the test suite instead of each test.
* remove comments
* removed unused lines
* fix linter
* fix a tpu issue
* fix minor issue
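The pattern in question, as a generic unittest sketch (illustrative only; the actual suite and fixture names differ):

```python
import unittest

class TestSuite(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Expensive one-time initialization (e.g. acquiring the device),
        # done once for the suite instead of in per-test setUp.
        cls.resource = object()

    def test_one(self):
        self.assertIsNotNone(self.resource)

    def test_two(self):
        self.assertIsNotNone(self.resource)
```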
* Add script for updating core aten opset issue
* Add update function
Funnily enough, this was breaking my script in pytorch#5821
* Add graph hash to save tensor output
* Add support for dynamo
* fix test
* Add profiler API for async capture
* Add unit test
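For context, the established (blocking) capture path looks like this; the new API performs the capture asynchronously, and since its exact name isn't given in this summary, only known entry points are shown:

```python
import torch_xla.debug.profiler as xp

server = xp.start_server(9012)  # serve profiling data from this process

# Pre-existing blocking capture; the PR adds an async counterpart.
xp.trace('localhost:9012', '/tmp/profile_logdir', duration_ms=2000)
```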