namgyu-youn
Contributor

@namgyu-youn namgyu-youn commented May 15, 2025

Introduces single-GPU benchmarks for comparing various preconditioners: SGD, AdaGrad, Root Inverse Shampoo, Eigendecomposed Shampoo, and Eigenvalue-Corrected Shampoo

In rich Console, developers can check the following:

  • Total/Average time taken for each preconditioner
  • CPU/GPU usage

In PyTorch profiler, developers can check the following:

  • The five most time-consuming operations (top 5)
  • Bottleneck analysis for each preconditioner

Co-authored-by: Tsung-Hsien Lee <[email protected]>

The benchmark compares the performance of various preconditioners (SGD, AdaGrad, Root Inverse Shampoo, Eigendecomposed Shampoo, and Eigenvalue-Corrected Shampoo) using rich console and PyTorch profiler.

In rich console, you can check the following:
- Total time taken for each preconditioner
- Average time taken per epoch
- Memory usage in MB
- GPU utilization percentage (if applicable)
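
As a rough illustration of how the total and per-epoch average times could be collected, here is a minimal sketch using only the standard library. The names `benchmark` and `fake_epoch` are hypothetical and do not come from this PR; `fake_epoch` stands in for one training epoch with a given preconditioner, and memory and GPU utilization are not measured here.

```python
import time
from typing import Callable, Dict


def benchmark(
    steps: Dict[str, Callable[[], None]], epochs: int = 3
) -> Dict[str, Dict[str, float]]:
    """Time each named epoch function; return total and per-epoch average seconds."""
    results: Dict[str, Dict[str, float]] = {}
    for name, run_epoch in steps.items():
        start = time.perf_counter()
        for _ in range(epochs):
            run_epoch()
        total = time.perf_counter() - start
        results[name] = {"total_s": total, "avg_epoch_s": total / epochs}
    return results


def fake_epoch() -> None:
    # Hypothetical stand-in for one training epoch (assumption, not the PR's code).
    sum(i * i for i in range(10_000))


report = benchmark({"sgd": fake_epoch, "adagrad": fake_epoch}, epochs=4)
for name, stats in report.items():
    print(f"{name}: total={stats['total_s']:.4f}s avg/epoch={stats['avg_epoch_s']:.4f}s")
```

In a real GPU benchmark the device would likely need to be synchronized before reading the clock, since CUDA kernels launch asynchronously; that step is omitted from this CPU-only sketch.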

In PyTorch profiler, you can check the following:
- The five most time-consuming operations (top 5)
- Bottleneck analysis for each preconditioner
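
The "most time-consuming operations" view amounts to sorting per-operator timings and keeping the first few rows. The sketch below shows that idea on plain Python data; the function name `slowest_ops` and all timing values are hypothetical, and in the actual benchmark the rows would presumably come from the PyTorch profiler rather than a hand-written dictionary.

```python
from typing import Dict, List, Tuple


def slowest_ops(op_times_us: Dict[str, float], limit: int = 5) -> List[Tuple[str, float]]:
    """Return the `limit` operators with the largest time, slowest first."""
    return sorted(op_times_us.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hypothetical per-operator times in microseconds (made-up numbers for illustration).
sample = {
    "aten::mm": 1200.0,
    "aten::add_": 300.0,
    "aten::linalg_eigh": 4500.0,
    "aten::mul": 150.0,
    "aten::copy_": 800.0,
    "aten::sqrt": 90.0,
    "aten::div": 60.0,
}

profiling_table = slowest_ops(sample)
for op_name, micros in profiling_table:
    print(f"{op_name:<20} {micros:>10.1f} us")
```

Comparing these tables across preconditioners is one simple way to see which operator dominates each one.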

Requested by @tsunghsienlee in facebookresearch#157 to improve the developer experience.
Co-authored-by: Tsung-Hsien Lee <[email protected]>
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2025
@namgyu-youn
Contributor Author

namgyu-youn commented May 15, 2025

Sorry for the multiple PRs; I have to learn more about VCS... Also, I will gladly wait for the review until July, based on the comment in #163. But I truly believe this PR would be useful. An example is attached in #171 (README.md).

@namgyu-youn namgyu-youn marked this pull request as draft June 4, 2025 06:46
@namgyu-youn
Contributor Author

namgyu-youn commented Jun 4, 2025

Benchmarks on an NVIDIA RTX AX2000:

image

1. Hardcode the device to "cuda" and basic configurations for benchmarks.
2. Enhance sorting logic for profiling results.
3. Fix typo in rich console output.
- top_ops is not a valid name for the variable; it should be profiling_table.
@namgyu-youn namgyu-youn marked this pull request as ready for review June 4, 2025 12:01
@namgyu-youn
Contributor Author

cc @runame @tsunghsienlee

@tsunghsienlee
Contributor

cc @runame @tsunghsienlee

Hi @namgyu-youn, sorry for my late reply. I have been too busy with work, so I might not be able to review this. Sorry that I brought this idea to you earlier.

@namgyu-youn
Contributor Author

namgyu-youn commented Jul 7, 2025

Hi @namgyu-youn, sorry for my late reply. I have been too busy with work, so I might not be able to review this. Sorry that I brought this idea to you earlier.

Never mind. Learning torch.profiler was a valuable experience for me, so I appreciate your suggestion.

Lastly, I want to ask whether this PR could be triaged, because I believe this update would be helpful for your team. I will wait for @runame, but it seems the review might be delayed (or neglected). The result log message is here, and I hope this update can be helpful for this project; please consider reviewing it.

@namgyu-youn namgyu-youn changed the title from "Build single-GPU benchmark for preconditioners" to "Single-GPU benchmark for preconditioners" Jul 25, 2025