
Releases: huggingface/trl

v0.9.3 RLOO / PPOv2 Trainer, RM Visualization

05 Jun 16:08
c0819ee

We are excited to introduce the new v0.9.3 release, which brings many exciting new features and algorithms. The highlights are as follows:

  1. RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started; an illustrative sketch of the leave-one-out baseline follows this list.
  2. PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started.
  3. Reward model visualization: reward model training now includes visualization on the eval dataset, as shown below.
[Video: reward model visualization on the eval dataset]
  4. New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-Play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment
  5. New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)
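
To illustrate the idea behind RLOO: for each prompt, k completions are sampled, and each completion is baselined against the mean reward of the other k - 1 samples. Below is a minimal, illustrative PyTorch sketch of that leave-one-out advantage (not the trainer's internal code):

import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Leave-one-out advantages; `rewards` has shape (num_prompts, k)."""
    k = rewards.shape[1]
    # The baseline for sample i is the mean reward of the other k - 1 samples.
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    return rewards - baseline

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5]])   # toy rewards for k = 4 samples of one prompt
print(rloo_advantages(rewards))                  # tensor([[ 0.6667, -0.6667,  0.0000,  0.0000]])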

What's Changed

New Contributors

Full Changelog: v0.8.6...v0.9.3

v0.8.6: Fixes for CLI

22 Apr 08:59
e90e8d9

What's Changed

Full Changelog: v0.8.5...v0.8.6

v0.8.5: Important fixes for CLIs

18 Apr 11:58
3595eb0

What's Changed

Full Changelog: v0.8.4...v0.8.5

v0.8.4: CLI / CPO / KTO important fixes

17 Apr 15:22
a5788ac

This patch release includes important fixes for the CLI and the KTO & CPO trainers.

What's Changed

New Contributors

Full Changelog: v0.8.3...v0.8.4

v0.8.3: Patch release for CLI

12 Apr 10:25
9822647

What's Changed

This is a patch release that includes an import fix for the CLIs.

Full Changelog: v0.8.2...v0.8.3

v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes

11 Apr 13:51
143e111


This release includes two new trainers: ORPO from KAIST and CPO.
The release also includes support for vision LLMs such as LLaVA in SFTTrainer; please see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

ORPO Trainer

CPO Trainer

Vision LLMs support for SFTTrainer

You can now use SFTTrainer to fine-tune vision LLMs such as LLaVA!
See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

KTO Fixes

Many fixes were introduced for the KTOTrainer:

  • Update KTO example to use better model and ChatML support by @lewtun in #1485
  • [KTO] Use batching to speed up data processing by @lewtun in #1470
  • Update KTO example with good dataset & chat format by @lewtun in #1481
  • [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in #1499
  • [KTO] fix metric logging by @claralp in #1514

10x PPO!

Other fixes

New Contributors

Full Changelog: v0.8.1...v0.8.2

v0.8.1: Patch release for CLIs

20 Mar 10:39
8534f0e

This patch release includes some important fixes for the CLIs.

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP!

19 Mar 16:25
f2c7177

New Trainer: KTOTrainer:

We introduce the KTOTrainer so you can run the Kahneman-Tversky Optimization (KTO) algorithm on LLMs! A hedged usage sketch follows.
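
Argument names below follow the KTO docs of this release; the tiny inline dataset (unpaired prompt/completion/label feedback) is purely illustrative:

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# KTO consumes unpaired feedback: each example is a prompt, a completion,
# and a boolean label saying whether that completion is desirable.
train_dataset = Dataset.from_dict({
    "prompt": ["What color is the sky?", "What color is the sky?"],
    "completion": ["It is blue.", "It is green."],
    "label": [True, False],
})

trainer = KTOTrainer(
    model,
    args=KTOConfig(output_dir="opt-kto", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()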

TRL Command Line Interfaces (CLIs):

Run SFT, DPO and chat with your aligned model directly from the terminal:

SFT:

trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb

DPO:

trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf 

Chat:

trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat

Read more about the CLI in the relevant documentation section, or pass --help for more details.

FSDP + QLoRA:

SFTTrainer now supports FSDP + QLoRA; a hedged usage sketch follows the PR reference below.

  • Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
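
Below is a hedged QLoRA sketch using the v0.8-era SFTTrainer API (the FSDP side is configured separately through an accelerate FSDP config, as described in the PR above); the model, dataset, and hyperparameters are placeholders:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# Load the base model quantized to 4-bit NF4 (requires a CUDA GPU with
# bitsandbytes installed); only the LoRA adapter weights are trained.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

trainer = SFTTrainer(
    model=model,
    args=TrainingArguments(output_dir="opt-qlora-imdb", per_device_train_batch_size=4),
    train_dataset=load_dataset("imdb", split="train"),
    dataset_text_field="text",
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()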

Other fixes

New Contributors

Full Changelog: v0.7.11...v0.8.0

v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models

16 Feb 08:22
0f13e51

DPO important fixes

We fixed issues with the IPO loss, which now yields consistent results in the newest experiments (an illustrative sketch follows the PR reference below):

  • [DPO] average_log_prob when loss is IPO by @kashif in #1265
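
For context, the fix makes the IPO branch use the per-token average of the completion log-probabilities, as in the IPO paper, where DPO's default loss uses their sum. An illustrative sketch (not TRL's internal code):

import torch

token_logps = torch.tensor([-1.2, -0.8, -2.0])  # toy per-token log-probs of one completion
mask = torch.tensor([1.0, 1.0, 1.0])            # mask selecting completion tokens

sum_logp = (token_logps * mask).sum()               # used by the default DPO (sigmoid) loss
avg_logp = (token_logps * mask).sum() / mask.sum()  # used when loss_type="ipo" after the fix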

We also fixed important bugs affecting DPO with PEFT and Flash Attention.

Data processing is now faster in multi-GPU environments

Other DPO bugfixes:

  • [PEFT + DPO] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
  • Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
  • fix padding in dpo trainer by @pacman100 in #1284
  • Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
  • [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307

Faster data processing and other enhancements:

Automatic tagging for all models

Models now get tagged correctly even if users do not call trainer.push_to_hub().

What's Changed

New Contributors

Full Changelog: v0.7.10...v0.7.11

v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests

19 Jan 10:58
09ca760


This patch release adds a new feature in TRL for dealing with chat datasets: you can now load a chat-formatted dataset directly, without needing to format it beforehand.

Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support
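
As a hedged illustration of what "directly" means here, SFTTrainer can now consume a conversational dataset laid out like the toy example below (field names per the docs above) and apply the tokenizer's chat template itself:

from datasets import Dataset

# One training example in the conversational ("messages") layout.
dataset = Dataset.from_dict({
    "messages": [
        [
            {"role": "user", "content": "What color is the sky?"},
            {"role": "assistant", "content": "It is blue."},
        ],
    ],
})
# SFTTrainer detects this layout and applies the chat template automatically,
# so no formatting function is needed beforehand.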

The release also introduces a new setup_chat_format API that adds the special tokens required by the chat format and correctly resizes the model embeddings to the target size. Currently only the ChatML format is supported; more formats may be added in the future.

Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format
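
Minimal usage looks like the sketch below (the model is an arbitrary placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Registers the ChatML template on the tokenizer, adds its special tokens,
# and resizes the model's token embeddings to the new vocabulary size.
model, tokenizer = setup_chat_format(model, tokenizer)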

We also extensively tested SFTTrainer and DPOTrainer, so the example scripts dpo.py and sft.py should now be well battle-tested. If you see any issues with these scripts, please let us know on GitHub.

What's Changed

New Contributors

Full Changelog: v0.7.9...v0.7.10