New OptimizerInBackward #2719

Merged: 25 commits into pytorch:main on May 19, 2025

Conversation

@joecummings (Contributor) commented on May 12, 2025

OptimizerInBackward

Motivation

Previously, we used a process very similar to the one outlined in this blog post. Although it worked, it required a lot of if/else switching in the recipe depending on what composed well with the optimizer step fused into the backward pass and what did not.
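For context, here is a minimal sketch of the hook-based pattern that blog post describes (illustrative only, not torchtune's previous utility code): each parameter gets its own optimizer, stepped from a post-accumulate-grad hook, so anything downstream that expects a single optimizer object needs special-casing.

    import torch

    model = torch.nn.Linear(10, 10)

    # One optimizer per parameter instead of a single optimizer over all params.
    optim_dict = {p: torch.optim.AdamW([p], lr=2e-5) for p in model.parameters()}

    def step_param(param: torch.Tensor) -> None:
        # Runs as soon as this parameter's gradient has been accumulated,
        # so its gradient memory can be freed before backward() finishes.
        optim_dict[param].step()
        optim_dict[param].zero_grad()

    for p in model.parameters():
        p.register_post_accumulate_grad_hook(step_param)

    # No explicit optimizer.step(): the updates happen inside backward().
    loss = model(torch.randn(2, 10)).sum()
    loss.backward()

    # LR schedulers, checkpointing, and gradient clipping now have to branch on
    # `optim_dict` vs. a normal optimizer -- the if/else switching this PR removes.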

Goal

Simplify our recipes (729 LOC -> 677 LOC)! This PR provides a canonical OptimizerInBackward class that can be used as a drop-in replacement for any PyTorch optimizer. It is integrated into the full_finetune_single_device.py recipe, with new tests and updated documentation.
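A rough usage sketch of the drop-in behavior; the constructor arguments below (params, an optimizer class, and its kwargs) are an assumption based on the docs snippet quoted in the review thread further down, so check torchtune/modules/optim.py in the diff for the exact signature.

    import torch
    from torchtune.modules.optim import OptimizerInBackward  # added by this PR

    model = torch.nn.Linear(10, 10)

    # Assumed interface: wrap any PyTorch optimizer class plus its kwargs.
    optimizer = OptimizerInBackward(
        params=model.parameters(),
        optimizer_cls=torch.optim.AdamW,
        lr=2e-5,
    )

    # Because it exposes the standard optimizer interface (state_dict,
    # load_state_dict, param_groups), the recipe no longer needs to branch on
    # whether the step is fused into backward; the parameter updates themselves
    # still run during backward().
    loss = model(torch.randn(2, 10)).sum()
    loss.backward()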

Testing

(Screenshot of test results, 2025-05-15)

To-do

  • Integrate into full_finetune_distributed.py
  • Integrate into rest of recipes
  • Deprecate the utility functions that were previously used to create the optim-in-backward hooks

pytorch-bot (bot) commented on May 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2719

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ba8a975 with merge base e5ee1b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on May 12, 2025
@joecummings changed the title from "[WIP] FusedOptimizerInBackward (redesign)" to "New OptimizerInBackward" on May 14, 2025
@joecummings marked this pull request as ready for review on May 14, 2025 at 22:37
@codecov-commenter commented
Codecov Report

Attention: Patch coverage is 74.10072% with 36 lines in your changes missing coverage. Please review.

Project coverage is 60.12%. Comparing base (c8e670b) to head (5674f9a).
Report is 4 commits behind head on main.

Files with missing lines                  Patch %   Lines
recipes/full_finetune_single_device.py     0.00%    35 Missing ⚠️
torchtune/modules/optim.py                97.05%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2719      +/-   ##
==========================================
- Coverage   60.64%   60.12%   -0.52%     
==========================================
  Files         428      432       +4     
  Lines       26091    26520     +429     
==========================================
+ Hits        15823    15946     +123     
- Misses      10268    10574     +306     

☔ View full report in Codecov by Sentry.

The next review thread is on the documentation diff; quoted context:

    ...with a model with a lot of parameters, and when you don't need to use :ref:`gradient accumulation <glossary_grad_accm>`.

    .. code-block:: python

        OptimizerInBackward(
Collaborator commented:
I know this was already discussed, but looking at it here it's confusing that we have a class that's meant to be a drop-in optimizer, yet if you actually want to use it in the recipe we aren't using torchtune's regular instantiation-based configuration for this component; instead we rely on logic inside the recipe itself to construct the optimizer.

I don't have a neat solution that doesn't involve breaking all the configs, unfortunately. Maybe we don't advertise this as a separate modular component but as a core recipe feature? Power users can bear the brunt of implementing it in their own recipes.

@joecummings (author) replied:
Fair, I'll work on the wording here.
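To make the trade-off in this thread concrete, here is a hedged sketch of the two options; the YAML keys under option A follow torchtune's existing `_component_` instantiation pattern, while the `optimizer_in_bwd` flag name and the OptimizerInBackward constructor under option B are illustrative assumptions, not necessarily what this PR ships.

    from torchtune import config
    from torchtune.modules.optim import OptimizerInBackward

    # Option A: instantiation-based config (torchtune's usual pattern).
    # Every YAML config would have to swap the optimizer component, e.g.:
    #   optimizer:
    #     _component_: torchtune.modules.optim.OptimizerInBackward
    #     optimizer_cls: torch.optim.AdamW
    #     lr: 2e-5
    #
    # Option B (what the comment describes): keep the normal optimizer config
    # and construct the wrapper inside the recipe behind a flag.
    def setup_optimizer(cfg, model):
        if cfg.get("optimizer_in_bwd", False):
            # Built in the recipe rather than via config.instantiate.
            return OptimizerInBackward(
                params=model.parameters(),
                optimizer_cls=torch.optim.AdamW,  # in practice taken from cfg
                lr=cfg.optimizer.lr,
            )
        return config.instantiate(cfg.optimizer, model.parameters())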

A second review thread, on this part of the diff (quoted context):

    )
    else:
        if is_not_distributed_checkpointer:
            # This check can be removed once we fully migrate over to ``OptimizerInBackward``
Collaborator commented:
Worth raising an issue to follow up on this?

@joecummings (author) replied:
Yeah fair, I might just opt to raise a meta-Issue tracking how we integrate this into the rest of our recipes and any cleanup that needs to happen to deprecate the old material.

@SalmanMohammadi (Collaborator) left a review comment:
Really clean work

@joecummings added the "triage review" label on May 19, 2025
@krammnic (Contributor) commented:
Amazing work! (I've attempted this, but my RFC was rejected, so I'm really glad to see this merged.)

@joecummings removed the "triage review" label on May 19, 2025
@joecummings merged commit 83c8f97 into pytorch:main on May 19, 2025
14 checks passed