Releases: huggingface/trl
v0.9.3: RLOO / PPOv2 Trainer, RM Visualization
We are excited to introduce the v0.9.3 release, which brings many new features and algorithms. The highlights are as follows:
- RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started (a minimal usage sketch also follows this list).
- PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started.
- Reward model visualization: the reward model training now includes visualization on the eval dataset, as shown below.
(Demo video: Screen.Recording.2024-05-09.at.2.37.44.PM.mov)
- New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-Play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment (see the configuration sketch after this list).
- New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO).
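To give a feel for the new online trainers, here is a minimal RLOO sketch. It is only an illustration under a few assumptions: the `RLOOConfig` / `RLOOTrainer` API as described in the docs linked above, placeholder checkpoint names, and a tiny toy prompt dataset (the online trainers consume pre-tokenized prompts in an `input_ids` column). The experimental PPOv2 trainer is wired up analogously with `PPOv2Config` / `PPOv2Trainer`.

```python
# Hedged sketch of RLOO training; checkpoint names and the toy dataset are placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from trl import RLOOConfig, RLOOTrainer

base_model = "EleutherAI/pythia-160m"   # placeholder policy checkpoint
reward_ckpt = "EleutherAI/pythia-160m"  # placeholder reward-model checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

policy = AutoModelForCausalLM.from_pretrained(base_model)
ref_policy = AutoModelForCausalLM.from_pretrained(base_model)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_ckpt, num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

# Toy prompt dataset: the online trainers expect tokenized prompts ("input_ids").
prompts = ["The movie was", "In my opinion, the restaurant"]
dataset = Dataset.from_dict({"input_ids": [tokenizer(p)["input_ids"] for p in prompts]})

trainer = RLOOTrainer(
    config=RLOOConfig(output_dir="rloo-sketch", per_device_train_batch_size=2),
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    train_dataset=dataset,
    eval_dataset=dataset,
)
trainer.train()
```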
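The new preference losses are selected through the trainer config rather than new trainer classes. Below is a hedged sketch assuming the `DPOConfig.loss_type` strings documented for this release (e.g. `"robust"`, `"sppo_hard"`, `"nca_pair"`); the checkpoint name and the toy dataset are placeholders, and TR-DPO is enabled through the reference-model sync options rather than `loss_type`. The new BCO loss in the KTO Trainer is toggled analogously via the KTO config's `loss_type` (check the KTO docs for the exact string).

```python
# Hedged sketch: choosing one of the new DPO losses via DPOConfig; names are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "EleutherAI/pythia-160m"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy preference dataset with the standard "prompt" / "chosen" / "rejected" columns.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What color is the sky?"],
        "chosen": ["The sky is blue."],
        "rejected": ["The sky is green."],
    }
)

args = DPOConfig(
    output_dir="dpo-sketch",
    loss_type="robust",      # or e.g. "sppo_hard", "nca_pair"
    label_smoothing=0.1,     # Robust DPO interprets this as the label-noise rate
    # TR-DPO is configured via reference-model syncing instead of loss_type, e.g.:
    # sync_ref_model=True, ref_model_sync_steps=64, ref_model_mixup_alpha=0.9,
)

trainer = DPOTrainer(model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```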
What's Changed
- set dev version by @younesbelkada in #1568
- fix add_special_tokens issue for data with template by @edixiong in #1509
- [DPO] add 'bco_pair' loss_type by @seanexp in #1524
- [DPO] DPOConfig class by @kashif in #1554
- [SFT] add SFT Trainer Config dataclass by @kashif in #1530
- FIX: Fix CI on transformers main by @younesbelkada in #1576
- [`SFTTrainer`] Add warning in SFTTrainer when dataset already processed by @younesbelkada in #1577
- Fix typo detoxifying doc by @qgallouedec in #1594
- Core: removed unexisting `SftArgumentParser` by @younesbelkada in #1602
- [`KTOTrainer`] add BCO (reward shift and underlying distribution matching) by @seanexp in #1599
- [CLI] Use auto device map for model load by @lewtun in #1596
- Removing `tests/` from package data by @jamesbraza in #1607
- Docs: Fix build main documentation by @younesbelkada in #1604
- support loss function for Self-play Preference Optimization by @winglian in #1612
- Update HH dataset on helpful only subset by @vwxyzjn in #1613
- corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in #1615
- Fix ZeRO-3 generation context manager by @lewtun in #1617
- fixed adding bos and eos token unconditionally by @jasonyux in #1591
- visualize rm prediction by @vwxyzjn in #1636
- [ORPO] Correct label mask for pad tokens by @IlyaGusev in #1625
- Update sft_llama2.py to work with the latest API by @xianbaoqian in #1637
- Fixed wrong logs prefixes in KTOTrainer by @bartoszzuk in #1641
- Pairwise Noise Contrastive Alignment by @winglian in #1632
- don't cast the trainable lora layers to half precision by @pacman100 in #1644
- PPO / Reinforce Trainers by @vwxyzjn in #1540
- Apply deprecated `evaluation_strategy` by @muellerzr in #1559
- FEAT: Add support for training collator in PPOTrainer by @younesbelkada in #1658
- Correct Documentation for cDPO Usage by @AliBakly in #1655
- Fix inheritance order in PPOv2Config by @Nicolinho in #1659
- [DPO] Add 'robust' loss_type by @Abilityguy in #1653
- 🤫 TR-DPO implementation by @syrn1k in #1593
- Do not upcast adapters when using FSDP+QLoRA by @pacman100 in #1654
- [Tests] update eval_strategy API by @kashif in #1662
- Fix ppov2 test case by @vwxyzjn in #1661
- FIX / PPO: Fix `enable_input_require_grads` issues with PPO models by @younesbelkada in #1664
- fix dataset load error by @sywangyi in #1670
- FIX / SFTTrainer: Fix SFTTrainer with `args=None` by @younesbelkada in #1678
- Fix max_completion_length for encoder_decoder models in KTO Trainer by @samuki in #1588
- intial RPO loss by @kashif in #1686
- Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by @alexisrozhkov in #1690
- Skip packing validation by @alex-jw-brooks in #1673
- Fix typo in DPOTrainer's warnings by @qgallouedec in #1688
- Quick fix on GPT4-eval by @vwxyzjn in #1696
- Release 0.9.2 by @vwxyzjn in #1697
New Contributors
- @edixiong made their first contribution in #1509
- @seanexp made their first contribution in #1524
- @jamesbraza made their first contribution in #1607
- @winglian made their first contribution in #1612
- @angelahzyuan made their first contribution in #1615
- @jasonyux made their first contribution in #1591
- @IlyaGusev made their first contribution in #1625
- @xianbaoqian made their first contribution in #1637
- @bartoszzuk made their first contribution in #1641
- @muellerzr made their first contribution in #1559
- @AliBakly made their first contribution in #1655
- @Nicolinho made their first contribution in #1659
- @Abilityguy made their first contribution in #1653
- @syrn1k made their first contribution in #1593
- @alexisrozhkov made their first contribution in #1690
- @alex-jw-brooks made their first contribution in #1673
Full Changelog: v0.8.6...v0.9.2
v0.8.6: Fixes for CLI
What's Changed
- set dev version by @younesbelkada in #1556
- [CLI] Update init.py imports by @kashif in #1557
- CLI: Add warning when ignored params are passed + parse config file if config if passed by @younesbelkada in #1565
- Release: v0.8.6 by @younesbelkada in #1567
Full Changelog: v0.8.5...v0.8.6
v0.8.5: Important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1548
- FIX: make the train / test fields modulable by @younesbelkada in #1551
- enable multiple eos tokens by @lvwerra in #1553
- Release: v0.8.5 by @younesbelkada in #1555
Full Changelog: v0.8.4...v0.8.5
v0.8.4: CLI / CPO / KTO important fixes
This patch release includes important fixes for the CLI and KTO & CPO trainers
What's Changed
- set dev version by @younesbelkada in #1529
- [CPO] fix memory leak due to retained value by @kashif in #1531
- VSFT hotfix - adds gen prompt to template and processor to hub by @edbeeching in #1532
- save_model -> save_pretrained in ppo_trainer.mdx by @ejmejm in #1537
- [KTO] support to load the adapter twice by @claralp in #1542
- CLI: Set `dataset_text_field` to `None` to allow ChatML automatic template by @younesbelkada in #1545
- FIX: Fix slow test by @younesbelkada in #1546
- Fixed ref model not used in PPO generation by @ejmejm in #1534
- Release: v0.8.4 by @younesbelkada in #1547
Full Changelog: v0.8.3...v0.8.4
v0.8.3: Patch release for CLI
What's Changed
This is a patch release that includes an import fix for CLIs
- set dev version by @younesbelkada in #1523
- [CLI] fix imports by @kashif in #1527
- Release: v0.8.3 by @younesbelkada in #1528
Full Changelog: v0.8.2...v0.8.3
v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes
This release includes two new trainers: ORPO from KAIST and CPO. It also adds support for Vision LLMs such as Llava in `SFTTrainer`; please see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details. A minimal trainer usage sketch follows the PR list below.
ORPO & CPO Trainers
- Add CPOTrainer by @fe1ixxu in #1382
- Add `use_cache=False` in `{ORPO,CPO}Trainer.concatenated_forward` by @alvarobartt in #1478
- [ORPO] Update NLL loss to use `input_ids` instead by @alvarobartt in #1516
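As a quick orientation, here is a minimal ORPO sketch. It is hedged: it assumes the `ORPOConfig` / `ORPOTrainer` API documented for this release, and the checkpoint name and toy dataset are placeholders. ORPO is reference-model-free, so only the policy model is needed, together with a preference dataset with `prompt` / `chosen` / `rejected` columns; `CPOTrainer` follows the same pattern with `CPOConfig`.

```python
# Hedged ORPO sketch; the checkpoint name and toy dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "EleutherAI/pythia-160m"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy preference dataset with "prompt" / "chosen" / "rejected" columns.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What color is the sky?"],
        "chosen": ["The sky is blue."],
        "rejected": ["The sky is green."],
    }
)

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(output_dir="orpo-sketch", beta=0.1),  # beta weighs the odds-ratio term
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```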
Vision LLMs support for SFTTrainer
You can now use `SFTTrainer` to fine-tune Vision LLMs such as Llava! See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.
- Adds VLM Training support to SFTTrainer + VSFT script by @edbeeching in #1518
KTO Fixes
Many fixes were introduced for the KTOTrainer:
- Update KTO example to use better model and ChatML support by @lewtun in #1485
- [KTO] Use batching to speed up data processing by @lewtun in #1470
- Update KTO example with good dataset & chat format by @lewtun in #1481
- [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in #1499
- [KTO] fix metric logging by @claralp in #1514
10x PPO!
Other fixes
- set dev version by @younesbelkada in #1463
- Use the standard dataset for DPO CLI by @vwxyzjn in #1456
- [peft] Update test_reward_trainer.py to fix tests by @kashif in #1471
- Fix hyperparameters in KTO example by @lewtun in #1474
- docs: add missing Trainer classes and sort alphabetically by @anakin87 in #1479
- hackey update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in #1488
- Ignore chat files by @lewtun in #1486
- Add DPO link in README by @qgallouedec in #1502
- Fix typo in how_to_train.md by @ftorres16 in #1503
- Fix DPO Unsloth example in Docs by @arnavgarg1 in #1494
- Correct ppo_epochs usage by @muhammed-shihebi in #1480
- Fix `RichProgressCallback` by @eggry in #1496
- Change the device index to device:index by @yuanwu2017 in #1490
- FIX: use kwargs for RMTrainer by @younesbelkada in #1515
- Allow streaming (datasets.IterableDataset) by @BramVanroy in #1468
- Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in #1520
- [DOC] Add data description for sfttrainer doc by @BramVanroy in #1521
- Release: v0.8.2 by @younesbelkada in #1522
New Contributors
- @fe1ixxu made their first contribution in #1382
- @anakin87 made their first contribution in #1479
- @galtay made their first contribution in #1488
- @qgallouedec made their first contribution in #1502
- @ftorres16 made their first contribution in #1503
- @arnavgarg1 made their first contribution in #1494
- @muhammed-shihebi made their first contribution in #1480
- @eggry made their first contribution in #1496
- @claralp made their first contribution in #1514
Full Changelog: v0.8.1...v0.8.2
v0.8.1: Patch release for CLIs
This patch release includes some important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1454
- Fix chat CLI for model revisions by @lewtun in #1458
- [chat] add eos token to generate by @lvwerra in #1459
- Release: v0.8.1 by @younesbelkada in #1462
Full Changelog: v0.8.0...v0.8.1
v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP !
New Trainer: KTOTrainer
We recently introduced the KTOTrainer for running the KTO algorithm on LLMs! A minimal usage sketch follows the list of related PRs below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
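The sketch below illustrates how KTO training might be wired up. It is hedged: it assumes the `KTOConfig` / `KTOTrainer` API from the docs, and the checkpoint name and toy dataset are placeholders. KTO works on unpaired data, with `prompt` / `completion` / `label` columns where the label marks a completion as desirable (`True`) or undesirable (`False`).

```python
# Hedged KTO sketch; the checkpoint name and toy dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "EleutherAI/pythia-160m"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset, just to show the expected columns.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What is the capital of France?", "What is 2 + 2?"],
        "completion": ["Paris.", "Five."],
        "label": [True, False],
    }
)

trainer = KTOTrainer(
    model,
    ref_model,
    args=KTOConfig(output_dir="kto-sketch", beta=0.1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```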
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT:
`trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb`
DPO:
`trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf`
Chat:
`trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat`
Read more about CLIs in the relevant documentation section or use `--help` for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
SFTTrainer now supports FSDP + QLoRA
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducable by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [`SFTTrainer`] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [`RewardModeling`] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before update the tr_loss, make sure tr_loss_step is in the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models
DPO important fixes
We fixed issues with the IPO loss, leading to consistent results according to the newest experiments.
We also fixed important bugs with respect to DPO / PEFT and Flash Attention:
- [`DPOTrainer`] Fix DPO trainer + mistral + FA2 by @younesbelkada in #1290
Data processing is now faster for multi-GPU envs
- [`DPOTrainer`] Load data only on main process + fix dpo example test by @younesbelkada in #1291
- Add multiprocessing in the DPO trainer. by @imraviagrawal in #1286
Other DPO bugfixes:
- [`PEFT` + `DPO`] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
- Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
- fix padding in dpo trainer by @pacman100 in #1284
- Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
- [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
Faster data processing and other enhancements:
- Only load data on main process by @JohnGiorgi in #1255
- Remove tyro by @vwxyzjn in #1176
Automatic tagging for all models
Models now get tagged correctly even if users do not call `trainer.push_to_hub()`.
- [`core` / `xxxTrainer`] Automatic tagging by @younesbelkada in #1329
What's Changed
- set dev version by @younesbelkada in #1254
- Update Model Generation config to reflect new special tokens by @philschmid in #1256
- Fix a typo in variable name by @otlaitil in #1269
- FIx SFTTrainer bugs on TRL main by @younesbelkada in #1276
- Fix SFT tuner in CI by @vwxyzjn in #1278
- Fix sft ci by @vwxyzjn in #1279
- Fix DPO slow tests by @younesbelkada in #1292
- Fix sft trainer when args is None by @younesbelkada in #1295
- Fix `DPOTrainer` docstrings by @alvarobartt in #1298
- Types: Fix PEP 484 implicit-optional compliance by @akx in #1297
- Update sft_trainer.mdx to add note on launching DDP training by @johnowhitaker in #1308
- Codemod Unittest assertions to bare asserts by @akx in #1301
- ENH: Run CI only if relevant files are modified by @younesbelkada in #1309
- Fix typos in docs for Multi Adapter RL (MARL). by @elhusseiniali in #1312
- Fix doc snippet PPOTrainer argument train_dataset -> dataset by @j-cb in #1321
- Best practice recommendation update for dpo_trainer.mdx by @R-seny in #1325
- pre-commit: replace linters + formatters with Ruff; fix some issues by @akx in #1300
- Update README.md to clarify model requirement by @markstur in #1315
- [`core` / `DDPO`] Fix diffusers import issue by @younesbelkada in #1314
- [`CI`] Add tests on transformers peft main on push main by @younesbelkada in #1328
- Release: v0.7.11 by @younesbelkada in #1331
New Contributors
- @otlaitil made their first contribution in #1269
- @JohnGiorgi made their first contribution in #1255
- @ouhenio made their first contribution in #1280
- @imraviagrawal made their first contribution in #1286
- @akx made their first contribution in #1297
- @esceptico made their first contribution in #1307
- @johnowhitaker made their first contribution in #1308
- @elhusseiniali made their first contribution in #1312
- @maliozer made their first contribution in #1313
- @j-cb made their first contribution in #1321
- @R-seny made their first contribution in #1325
- @markstur made their first contribution in #1315
Full Changelog: v0.7.10...v0.7.11
v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests
This patch release adds a new feature in TRL for dealing with chat datasets: you can directly load a chat-formatted dataset without needing to format it beforehand.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support
The release also introduces a new API, `setup_chat_format`, which correctly resizes the model embeddings to the target size when adding new tokens to comply with the chat format. Currently we only support the `chatml` format, and more formats may be added in the future.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format
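Below is a minimal sketch of the new API. It assumes the `setup_chat_format(model, tokenizer)` signature described in the docs linked above; the checkpoint name is a placeholder.

```python
# Hedged sketch: setup_chat_format adds the ChatML special tokens to the tokenizer,
# sets the chat template, and resizes the model embeddings accordingly.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

model_name = "facebook/opt-350m"  # placeholder checkpoint without a chat template
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# After this call, tokenizer.apply_chat_template works with ChatML-style messages
# and the embedding matrix matches the enlarged vocabulary.
model, tokenizer = setup_chat_format(model, tokenizer)

messages = [{"role": "user", "content": "Hello, how are you?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```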
We also extensively test SFTTrainer and DPOTrainer, and the example scripts dpo.py and sft.py should be well battle-tested. If you see any issue with the scripts, please let us know on GitHub.
What's Changed
- set dev version by @younesbelkada in #1207
- Check tokenize params on DPOTrainer by @pablovicente in #1197
- Fix shape descriptions in calculate_loss method by @yuta0x89 in #1204
- Fix FSDP error by @mgerstgrasser in #1196
- Update Unsloth SFT, DPO docs by @danielhanchen in #1213
- Fix args type by @zspo in #1214
- [`core` / `Docker`] Add workflow to build TRL docker images by @younesbelkada in #1215
- Refactor RewardConfig to own module by @lewtun in #1221
- Add support for ChatML dataset format by @philschmid in #1208
- Add slow test workflow file by @younesbelkada in #1223
- Remove a repeating line in how_to_train.md by @kykim0 in #1226
- Logs metrics on all distributed processes when using DPO & FSDP by @AjayP13 in #1160
- fix: improve error message when `pad_token_id` is not configured by @yumemio in #1152
- [`core` / tests] v1 slow tests by @younesbelkada in #1218
- [`core` / SFTTrainer] Fix breaking change by @younesbelkada in #1229
- Fixes slow tests by @younesbelkada in #1241
- Fix weird doc bug by @younesbelkada in #1244
- Add `setup_chat_format` for adding new special tokens to model for training chat models by @philschmid in #1242
- Fix chatml template by @philschmid in #1248
- fix: fix loss_type and some args desc by @zspo in #1247
- Release: v0.7.10 by @younesbelkada in #1253
New Contributors
- @yuta0x89 made their first contribution in #1204
- @danielhanchen made their first contribution in #1213
- @zspo made their first contribution in #1214
- @philschmid made their first contribution in #1208
- @kykim0 made their first contribution in #1226
- @AjayP13 made their first contribution in #1160
- @yumemio made their first contribution in #1152
Full Changelog: v0.7.9...v0.7.10