added initial TPO implementation #1965
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @qgallouedec, I fixed the issues. I would appreciate it if you could review the PR.
Thanks for contributing @sahsaeedi, and sorry for the delay in answering.
Hi @qgallouedec, yes, I fine-tuned Llama3-8B-Instruct on the llama3-ultrafeedback-armorm dataset. Let me know if you need any specific results.
so @sahsaeedi I kinda refactored DPO's data processing helpers etc. and was thinking... can one just subclass the DPOTrainer?
Hi @kashif, I am still trying to figure it out; my main concern is the data processing. TPO's data processing is a little different from DPO's: the dataset needs extra processing, and some conditions need to be met. It would be better to have TPO inherit from Trainer instead of DPOTrainer. However, if you think we have to subclass DPOTrainer, I will start working on it.
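For context, here is a minimal sketch of what the subclassing approach under discussion could look like. This is not the PR's actual code: it assumes the DPOTrainer of this era exposes a per-row `tokenize_row` hook, and the `TPOTrainer` class name and the extra `reference` column are illustrative assumptions only.

```python
# Hypothetical sketch only -- not the PR's implementation. Assumes
# DPOTrainer exposes a per-row `tokenize_row` hook (as in TRL ~0.9)
# and that TPO adds one extra response field per example.
from trl import DPOTrainer


class TPOTrainer(DPOTrainer):
    def tokenize_row(self, feature, model=None):
        # Reuse DPO's prompt/chosen/rejected tokenization...
        batch = super().tokenize_row(feature, model)
        # ...then handle the third preference that TPO needs. The column
        # name `reference` is an assumption for illustration.
        reference = feature["reference"]
        batch["reference_input_ids"] = self.tokenizer(
            reference, add_special_tokens=False
        )["input_ids"]
        return batch
```

Whether this is cleaner than inheriting from Trainer directly depends on how much of DPO's processing TPO can actually reuse, which is the open question in this thread.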
Hi @qgallouedec,
Hi @sahsaeedi, sorry for the delay. Could you please update your branch? We've been doing a lot of work recently to standardize the API across trainers, docs, configurations, etc., and this branch should be aligned with those recent changes. Feel free to ask if you need help with this. In addition, we're working on refactoring the data processing in DPO (which I think your code is mainly inspired by) because it's too complex at the moment. I'd like to avoid refactoring two trainers, so I won't merge this one until that's done. You'll probably have to do a second round of updates.
Hi @qgallouedec, no worries. Thanks for the response.
What does this PR do?
This PR adds an initial TPO (Triple Preference Optimization) implementation.
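For readers of this thread, a hedged usage sketch of how the trainer would presumably be invoked if it follows TRL's usual trainer API. The class names `TPOConfig`/`TPOTrainer`, the import path, and the dataset hub id are assumptions based on this conversation, not taken from the diff.

```python
# Hypothetical usage sketch based on TRL's usual trainer API; class names,
# import path, and dataset id are assumptions, not confirmed from this PR.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import TPOConfig, TPOTrainer  # assumed to exist once this PR is merged

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The thread mentions the llama3-ultrafeedback-armorm dataset; the exact
# hub repo id used here is a guess.
dataset = load_dataset("princeton-nlp/llama3-ultrafeedback-armorm", split="train")

training_args = TPOConfig(output_dir="llama3-8b-tpo", per_device_train_batch_size=2)
trainer = TPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```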
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. (Please add TPO trainer to the trl #1901)
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@qgallouedec