
added some initial logic to load the teacher logits #1

Open
wants to merge 1 commit into main

Conversation

@shamanez (Member) commented Aug 1, 2024

  • Initial logic to load teacher logits
  • At the moment, this is only for the logit-based training.
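For context, logit-based distillation typically minimizes a KL divergence between temperature-softened teacher and student distributions. A minimal, self-contained sketch of that loss (hypothetical helper names, stdlib only; not code from this PR):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax over a list of logits.
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as is conventional in logit-based distillation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student's logits match the teacher's and grows as the distributions diverge; precomputing the teacher logits offline (as this PR does) means only the student forward pass runs during training.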

@shamanez shamanez requested a review from Crystalcareai August 5, 2024 07:00
@mertege commented Oct 20, 2024

Hi @shamanez, I generated the logits offline and saved them under a "logits" key. I then loaded this dataset and verified that the "logits" key was present in the relevant dataset before passing it to SFTTrainer. However, in the "compute_loss" function, "inputs" only contains the "input_ids" and "attention_mask" keys. Since "logits" is missing from "compute_loss", I cannot access the teacher logits. I think the trl SFTTrainer only passes inputs with the "input_ids" and "attention_mask" keys. Have you encountered this kind of problem?

I appreciate any help you can provide.

@shamanez (Member, Author) replied:

Can you try setting remove_unused_columns=False in the SFTTrainer?

@mertege commented Oct 21, 2024

Can you try setting remove_unused_columns=False in the SFTTrainer?

Thanks @shamanez, it works.
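For background on why this flag matters: when remove_unused_columns is left at its default of True, the Hugging Face Trainer drops any dataset column whose name is not an argument of the model's forward() signature, so an extra "logits" column never reaches compute_loss. A minimal stand-alone sketch of that filtering behavior (hypothetical function names, not actual trl internals):

```python
import inspect

def forward(input_ids, attention_mask):
    # Stand-in for a model's forward() signature.
    pass

def filter_columns(batch, fn, remove_unused_columns=True):
    # Mimics the Trainer's column pruning: keep only keys that
    # match parameter names of the model's forward().
    if not remove_unused_columns:
        return batch
    accepted = set(inspect.signature(fn).parameters)
    return {k: v for k, v in batch.items() if k in accepted}

batch = {"input_ids": [1, 2], "attention_mask": [1, 1], "logits": [0.3, 0.7]}
pruned = filter_columns(batch, forward)                    # "logits" is dropped
kept = filter_columns(batch, forward,
                      remove_unused_columns=False)         # "logits" survives
```

With remove_unused_columns=False, the full batch (including the precomputed teacher logits) is forwarded to compute_loss, where a custom distillation loss can consume it.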

@agokrani commented Jan 7, 2025

@shamanez @mertege I created this toolkit. It supports offline distillation by default and also supports LoRA/QLoRA FT.
https://github.com/agokrani/distillKitPlus

I'd appreciate it if you could take a look and provide feedback.
