Releases: huggingface/trl
v0.9.3: RLOO / PPOv2 Trainer, RM Visualization
We are excited to introduce the v0.9.3 release, which brings many new features and algorithms. The highlights are as follows:
- RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started (a minimal usage sketch also follows this list).
- PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started.
- Reward model visualization: the reward model training now includes visualization on the eval dataset, as shown below.
(Demo video: Screen.Recording.2024-05-09.at.2.37.44.PM.mov)
- New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-Play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment (see the configuration sketch after this list).
- New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO).
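To give a feel for the new online trainers, here is a minimal RLOO sketch. It is only an illustration under a few assumptions: the `RLOOConfig` / `RLOOTrainer` API as described in the docs linked above, placeholder checkpoint names, and a tiny toy prompt dataset (the online trainers consume pre-tokenized prompts in an `input_ids` column). The experimental PPOv2 trainer is wired up analogously with `PPOv2Config` / `PPOv2Trainer`.

```python
# Hedged sketch of RLOO training; checkpoint names and the toy dataset are placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from trl import RLOOConfig, RLOOTrainer

base_model = "EleutherAI/pythia-160m"   # placeholder policy checkpoint
reward_ckpt = "EleutherAI/pythia-160m"  # placeholder reward-model checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

policy = AutoModelForCausalLM.from_pretrained(base_model)
ref_policy = AutoModelForCausalLM.from_pretrained(base_model)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_ckpt, num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

# Toy prompt dataset: the online trainers expect tokenized prompts ("input_ids").
prompts = ["The movie was", "In my opinion, the restaurant"]
dataset = Dataset.from_dict({"input_ids": [tokenizer(p)["input_ids"] for p in prompts]})

trainer = RLOOTrainer(
    config=RLOOConfig(output_dir="rloo-sketch", per_device_train_batch_size=2),
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    train_dataset=dataset,
    eval_dataset=dataset,
)
trainer.train()
```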
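The new preference losses are selected through the trainer config rather than new trainer classes. Below is a hedged sketch assuming the `DPOConfig.loss_type` strings documented for this release (e.g. `"robust"`, `"sppo_hard"`, `"nca_pair"`); the checkpoint name and the toy dataset are placeholders, and TR-DPO is enabled through the reference-model sync options rather than `loss_type`. The new BCO loss in the KTO Trainer is toggled analogously via the KTO config's `loss_type` (check the KTO docs for the exact string).

```python
# Hedged sketch: choosing one of the new DPO losses via DPOConfig; names are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "EleutherAI/pythia-160m"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy preference dataset with the standard "prompt" / "chosen" / "rejected" columns.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What color is the sky?"],
        "chosen": ["The sky is blue."],
        "rejected": ["The sky is green."],
    }
)

args = DPOConfig(
    output_dir="dpo-sketch",
    loss_type="robust",      # or e.g. "sppo_hard", "nca_pair"
    label_smoothing=0.1,     # Robust DPO interprets this as the label-noise rate
    # TR-DPO is configured via reference-model syncing instead of loss_type, e.g.:
    # sync_ref_model=True, ref_model_sync_steps=64, ref_model_mixup_alpha=0.9,
)

trainer = DPOTrainer(model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```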
What's Changed
- set dev version by @younesbelkada in #1568
- fix add_special_tokens issue for data with template by @edixiong in #1509
- [DPO] add 'bco_pair' loss_type by @seanexp in #1524
- [DPO] DPOConfig class by @kashif in #1554
- [SFT] add SFT Trainer Config dataclass by @kashif in #1530
- FIX: Fix CI on transformers main by @younesbelkada in #1576
- [`SFTTrainer`] Add warning in SFTTrainer when dataset already processed by @younesbelkada in #1577
- Fix typo detoxifying doc by @qgallouedec in #1594
- Core: removed unexisting `SftArgumentParser` by @younesbelkada in #1602
- [`KTOTrainer`] add BCO (reward shift and underlying distribution matching) by @seanexp in #1599
- [CLI] Use auto device map for model load by @lewtun in #1596
- Removing `tests/` from package data by @jamesbraza in #1607
- Docs: Fix build main documentation by @younesbelkada in #1604
- support loss function for Self-play Preference Optimization by @winglian in #1612
- Update HH dataset on helpful only subset by @vwxyzjn in #1613
- corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in #1615
- Fix ZeRO-3 generation context manager by @lewtun in #1617
- fixed adding bos and eos token unconditionally by @jasonyux in #1591
- visualize rm prediction by @vwxyzjn in #1636
- [ORPO] Correct label mask for pad tokens by @IlyaGusev in #1625
- Update sft_llama2.py to work with the latest API by @xianbaoqian in #1637
- Fixed wrong logs prefixes in KTOTrainer by @bartoszzuk in #1641
- Pairwise Noise Contrastive Alignment by @winglian in #1632
- don't cast the trainable lora layers to half precision by @pacman100 in #1644
- PPO / Reinforce Trainers by @vwxyzjn in #1540
- Apply deprecated `evaluation_strategy` by @muellerzr in #1559
- FEAT: Add support for training collator in PPOTrainer by @younesbelkada in #1658
- Correct Documentation for cDPO Usage by @AliBakly in #1655
- Fix inheritance order in PPOv2Config by @Nicolinho in #1659
- [DPO] Add 'robust' loss_type by @Abilityguy in #1653
- 🤫 TR-DPO implementation by @syrn1k in #1593
- Do not upcast adapters when using FSDP+QLoRA by @pacman100 in #1654
- [Tests] update eval_strategy API by @kashif in #1662
- Fix ppov2 test case by @vwxyzjn in #1661
- FIX / PPO: Fix `enable_input_require_grads` issues with PPO models by @younesbelkada in #1664
- fix dataset load error by @sywangyi in #1670
- FIX / SFTTrainer: Fix SFTTrainer with `args=None` by @younesbelkada in #1678
- Fix max_completion_length for encoder_decoder models in KTO Trainer by @samuki in #1588
- intial RPO loss by @kashif in #1686
- Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by @alexisrozhkov in #1690
- Skip packing validation by @alex-jw-brooks in #1673
- Fix typo in DPOTrainer's warnings by @qgallouedec in #1688
- Quick fix on GPT4-eval by @vwxyzjn in #1696
- Release 0.9.2 by @vwxyzjn in #1697
New Contributors
- @edixiong made their first contribution in #1509
- @seanexp made their first contribution in #1524
- @jamesbraza made their first contribution in #1607
- @winglian made their first contribution in #1612
- @angelahzyuan made their first contribution in #1615
- @jasonyux made their first contribution in #1591
- @IlyaGusev made their first contribution in #1625
- @xianbaoqian made their first contribution in #1637
- @bartoszzuk made their first contribution in #1641
- @muellerzr made their first contribution in #1559
- @AliBakly made their first contribution in #1655
- @Nicolinho made their first contribution in #1659
- @Abilityguy made their first contribution in #1653
- @syrn1k made their first contribution in #1593
- @alexisrozhkov made their first contribution in #1690
- @alex-jw-brooks made their first contribution in #1673
Full Changelog: v0.8.6...v0.9.2
v0.8.6: Fixes for CLI
What's Changed
- set dev version by @younesbelkada in #1556
- [CLI] Update init.py imports by @kashif in #1557
- CLI: Add warning when ignored params are passed + parse config file if config if passed by @younesbelkada in #1565
- Release: v0.8.6 by @younesbelkada in #1567
Full Changelog: v0.8.5...v0.8.6
v0.8.5: Important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1548
- FIX: make the train / test fields modulable by @younesbelkada in #1551
- enable multiple eos tokens by @lvwerra in #1553
- Release: v0.8.5 by @younesbelkada in #1555
Full Changelog: v0.8.4...v0.8.5
v0.8.4: CLI / CPO / KTO important fixes
This patch release includes important fixes for the CLI and KTO & CPO trainers
What's Changed
- set dev version by @younesbelkada in #1529
- [CPO] fix memory leak due to retained value by @kashif in #1531
- VSFT hotfix - adds gen prompt to template and processor to hub by @edbeeching in #1532
- save_model -> save_pretrained in ppo_trainer.mdx by @ejmejm in #1537
- [KTO] support to load the adapter twice by @claralp in #1542
- CLI: Set `dataset_text_field` to `None` to allow ChatML automatic template by @younesbelkada in #1545
- FIX: Fix slow test by @younesbelkada in #1546
- Fixed ref model not used in PPO generation by @ejmejm in #1534
- Release: v0.8.4 by @younesbelkada in #1547
Full Changelog: v0.8.3...v0.8.4
v0.8.3: Patch release for CLI
What's Changed
This is a patch release that includes an import fix for CLIs
- set dev version by @younesbelkada in #1523
- [CLI] fix imports by @kashif in #1527
- Release: v0.8.3 by @younesbelkada in #1528
Full Changelog: v0.8.2...v0.8.3
v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes
This release includes two new trainers: ORPO from KAIST and CPO. It also adds support for Vision LLMs such as Llava in `SFTTrainer`; please see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details. A minimal trainer usage sketch follows the PR list below.
ORPO & CPO Trainers
- Add CPOTrainer by @fe1ixxu in #1382
- Add `use_cache=False` in `{ORPO,CPO}Trainer.concatenated_forward` by @alvarobartt in #1478
- [ORPO] Update NLL loss to use `input_ids` instead by @alvarobartt in #1516
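As a quick orientation, here is a minimal ORPO sketch. It is hedged: it assumes the `ORPOConfig` / `ORPOTrainer` API documented for this release, and the checkpoint name and toy dataset are placeholders. ORPO is reference-model-free, so only the policy model is needed, together with a preference dataset with `prompt` / `chosen` / `rejected` columns; `CPOTrainer` follows the same pattern with `CPOConfig`.

```python
# Hedged ORPO sketch; the checkpoint name and toy dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "EleutherAI/pythia-160m"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy preference dataset with "prompt" / "chosen" / "rejected" columns.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What color is the sky?"],
        "chosen": ["The sky is blue."],
        "rejected": ["The sky is green."],
    }
)

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(output_dir="orpo-sketch", beta=0.1),  # beta weighs the odds-ratio term
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```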
Vision LLMs support for SFTTrainer
You can now use `SFTTrainer` to fine-tune Vision LLMs such as Llava! See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.
- Adds VLM Training support to SFTTrainer + VSFT script by @edbeeching in #1518
KTO Fixes
Many fixes were introduced for the KTOTrainer:
- Update KTO example to use better model and ChatML support by @lewtun in #1485
- [KTO] Use batching to speed up data processing by @lewtun in #1470
- Update KTO example with good dataset & chat format by @lewtun in #1481
- [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in #1499
- [KTO] fix metric logging by @claralp in #1514
10x PPO!
Other fixes
- set dev version by @younesbelkada in #1463
- Use the standard dataset for DPO CLI by @vwxyzjn in #1456
- [peft] Update test_reward_trainer.py to fix tests by @kashif in #1471
- Fix hyperparameters in KTO example by @lewtun in #1474
- docs: add missing Trainer classes and sort alphabetically by @anakin87 in #1479
- hackey update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in #1488
- Ignore chat files by @lewtun in #1486
- Add DPO link in README by @qgallouedec in #1502
- Fix typo in how_to_train.md by @ftorres16 in #1503
- Fix DPO Unsloth example in Docs by @arnavgarg1 in #1494
- Correct ppo_epochs usage by @muhammed-shihebi in #1480
- Fix `RichProgressCallback` by @eggry in #1496
- Change the device index to device:index by @yuanwu2017 in #1490
- FIX: use kwargs for RMTrainer by @younesbelkada in #1515
- Allow streaming (datasets.IterableDataset) by @BramVanroy in #1468
- Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in #1520
- [DOC] Add data description for sfttrainer doc by @BramVanroy in #1521
- Release: v0.8.2 by @younesbelkada in #1522
New Contributors
- @fe1ixxu made their first contribution in #1382
- @anakin87 made their first contribution in #1479
- @galtay made their first contribution in #1488
- @qgallouedec made their first contribution in #1502
- @ftorres16 made their first contribution in #1503
- @arnavgarg1 made their first contribution in #1494
- @muhammed-shihebi made their first contribution in #1480
- @eggry made their first contribution in #1496
- @claralp made their first contribution in #1514
Full Changelog: v0.8.1...v0.8.2
v0.8.1: Patch release for CLIs
This patch release includes some important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1454
- Fix chat CLI for model revisions by @lewtun in #1458
- [chat] add eos token to generate by @lvwerra in #1459
- Release: v0.8.1 by @younesbelkada in #1462
Full Changelog: v0.8.0...v0.8.1
v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP !
New Trainer: KTOTrainer
We recently introduced the KTOTrainer for running the KTO algorithm on LLMs! A minimal usage sketch follows the list of related PRs below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
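The sketch below illustrates how KTO training might be wired up. It is hedged: it assumes the `KTOConfig` / `KTOTrainer` API from the docs, and the checkpoint name and toy dataset are placeholders. KTO works on unpaired data, with `prompt` / `completion` / `label` columns where the label marks a completion as desirable (`True`) or undesirable (`False`).

```python
# Hedged KTO sketch; the checkpoint name and toy dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "EleutherAI/pythia-160m"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset, just to show the expected columns.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What is the capital of France?", "What is 2 + 2?"],
        "completion": ["Paris.", "Five."],
        "label": [True, False],
    }
)

trainer = KTOTrainer(
    model,
    ref_model,
    args=KTOConfig(output_dir="kto-sketch", beta=0.1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```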
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT:
`trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb`
DPO:
`trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf`
Chat:
`trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat`
Read more about CLIs in the relevant documentation section or use `--help` for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
SFTTrainer now supports FSDP + QLoRA
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducable by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [`SFTTrainer`] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [`RewardModeling`] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before update the tr_loss, make sure tr_loss_step is in the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models
DPO important fixes
We fixed issues with the IPO loss, leading to consistent results according to the newest experiments.
We also fixed important bugs with respect to DPO / PEFT and Flash Attention:
- [`DPOTrainer`] Fix DPO trainer + mistral + FA2 by @younesbelkada in #1290
Data processing is now faster for multi-GPU envs
- [`DPOTrainer`] Load data only on main process + fix dpo example test by @younesbelkada in #1291
- Add multiprocessing in the DPO trainer. by @imraviagrawal in #1286
Other DPO bugfixes:
- [`PEFT` + `DPO`] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
- Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
- fix padding in dpo trainer by @pacman100 in #1284
- Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
- [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
Faster data processing and other enhancements:
- Only load data on main process by @JohnGiorgi in #1255
- Remove tyro by @vwxyzjn in #1176
Automatic tagging for all models
Models now get tagged correctly even if users do not call `trainer.push_to_hub()`.
- [`core` / `xxxTrainer`] Automatic tagging by @younesbelkada in #1329
What's Changed
- set dev version by @younesbelkada in #1254
- Update Model Generation config to reflect new special tokens by @philschmid in #1256
- Fix a typo in variable name by @otlaitil in #1269
- FIx SFTTrainer bugs on TRL main by @younesbelkada in #1276
- Fix SFT tuner in CI by @vwxyzjn in #1278
- Fix sft ci by @vwxyzjn in #1279
- Fix DPO slow tests by @younesbelkada in #1292
- Fix sft trainer when args is None by @younesbelkada in #1295
- Fix `DPOTrainer` docstrings by @alvarobartt in #1298
- Types: Fix PEP 484 implicit-optional compliance by @akx in #1297
- Update sft_trainer.mdx to add note on launching DDP training by @johnowhitaker in #1308
- Codemod Unittest assertions to bare asserts by @akx in #1301
- ENH: Run CI only if relevant files are modified by @younesbelkada in #1309
- Fix typos in docs for Multi Adapter RL (MARL). by @elhusseiniali in #1312
- Fix doc snippet PPOTrainer argument train_dataset -> dataset by @j-cb in #1321
- Best practice recommendation update for dpo_trainer.mdx by @R-seny in #1325
- pre-commit: replace linters + formatters with Ruff; fix some issues by @akx in #1300
- Update README.md to clarify model requirement by @markstur in #1315
- [`core` / `DDPO`] Fix diffusers import issue by @younesbelkada in #1314
- [`CI`] Add tests on transformers peft main on push main by @younesbelkada in #1328
- Release: v0.7.11 by @younesbelkada in #1331
New Contributors
- @otlaitil made their first contribution in #1269
- @JohnGiorgi made their first contribution in #1255
- @ouhenio made their first contribution in #1280
- @imraviagrawal made their first contribution in #1286
- @akx made their first contribution in #1297
- @esceptico made their first contribution in #1307
- @johnowhitaker made their first contribution in #1308
- @elhusseiniali made their first contribution in #1312
- @maliozer made their first contribution in #1313
- @j-cb made their first contribution in #1321
- @R-seny made their first contribution in #1325
- @markstur made their first contribution in #1315
Full Changelog: v0.7.10...v0.7.11
v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests
This patch release adds a new feature in TRL for dealing with chat datasets: you can directly load a chat-formatted dataset without needing to format it beforehand.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support
The release also introduces a new API, `setup_chat_format`, which correctly resizes the model embeddings to the target size when adding new tokens to comply with the chat format. Currently we only support the `chatml` format, and more formats may be added in the future.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format
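Below is a minimal sketch of the new API. It assumes the `setup_chat_format(model, tokenizer)` signature described in the docs linked above; the checkpoint name is a placeholder.

```python
# Hedged sketch: setup_chat_format adds the ChatML special tokens to the tokenizer,
# sets the chat template, and resizes the model embeddings accordingly.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

model_name = "facebook/opt-350m"  # placeholder checkpoint without a chat template
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# After this call, tokenizer.apply_chat_template works with ChatML-style messages
# and the embedding matrix matches the enlarged vocabulary.
model, tokenizer = setup_chat_format(model, tokenizer)

messages = [{"role": "user", "content": "Hello, how are you?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```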
We also extensively test SFTTrainer and DPOTrainer, and the example scripts dpo.py and sft.py should be well battle-tested. If you see any issue with the scripts, please let us know on GitHub.
What's Changed
- set dev version by @younesbelkada in #1207
- Check tokenize params on DPOTrainer by @pablovicente in #1197
- Fix shape descriptions in calculate_loss method by @yuta0x89 in #1204
- Fix FSDP error by @mgerstgrasser in #1196
- Update Unsloth SFT, DPO docs by @danielhanchen in #1213
- Fix args type by @zspo in #1214
- [`core` / `Docker`] Add workflow to build TRL docker images by @younesbelkada in #1215
- Refactor RewardConfig to own module by @lewtun in #1221
- Add support for ChatML dataset format by @philschmid in #1208
- Add slow test workflow file by @younesbelkada in #1223
- Remove a repeating line in how_to_train.md by @kykim0 in #1226
- Logs metrics on all distributed processes when using DPO & FSDP by @AjayP13 in #1160
- fix: improve error message when `pad_token_id` is not configured by @yumemio in #1152
- [`core` / tests] v1 slow tests by @younesbelkada in #1218
- [`core` / SFTTrainer] Fix breaking change by @younesbelkada in #1229
- Fixes slow tests by @younesbelkada in #1241
- Fix weird doc bug by @younesbelkada in #1244
- Add `setup_chat_format` for adding new special tokens to model for training chat models by @philschmid in #1242
- Fix chatml template by @philschmid in #1248
- fix: fix loss_type and some args desc by @zspo in #1247
- Release: v0.7.10 by @younesbelkada in #1253
New Contributors
- @yuta0x89 made their first contribution in #1204
- @danielhanchen made their first contribution in #1213
- @zspo made their first contribution in #1214
- @philschmid made their first contribution in #1208
- @kykim0 made their first contribution in #1226
- @AjayP13 made their first contribution in #1160
- @yumemio made their first contribution in #1152
Full Changelog: v0.7.9...v0.7.10