add Sequence Parallelism (reopened #6506) #7338

Open, wants to merge 3 commits into base: main

HaoshengZou

What does this PR do?

add Sequence Parallelism (reopened #6506)

Before submitting

hiyouga added the "pending" label ("This problem is yet to be addressed") on Mar 17, 2025
@hiyouga
Owner

hiyouga commented Mar 17, 2025

Nice work! We expect to add this feature in LlamaFactory v1.0.

@xiaosu-zhu
Contributor

#7335 has a new implementation based on DeepSpeed Ulysses; please consider it as an alternative solution.

@HaoshengZou
Author

@xiaosu-zhu Thanks for pointing it out!

DeepSpeed Ulysses has always been on our to-do list. We integrated it recently and have now pushed it here, so this PR supports --sequence_parallel_mode zigzag-ring or ulysses.
Correctness has been thoroughly verified, and training speed has been tested in the same way, here.
Our Ulysses integration currently depends on yunchang, but that dependency could be removed by adapting the core functions and acknowledging the source.
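
For readers new to the two modes, here is a minimal sketch of how each one typically shards a sequence across ranks. It is illustrative only and is not code from this PR: the function names `zigzag_split` and `contiguous_split` are hypothetical, and it assumes the common layouts in which zigzag ring attention pairs chunk r with chunk 2W-1-r to balance causal-attention work, while Ulysses starts from a plain contiguous split before its all-to-all over attention heads.

```python
def contiguous_split(tokens, world_size):
    """Contiguous sharding: rank r gets the r-th slice of the sequence.

    Assumed to match the input layout of a Ulysses-style scheme,
    which later uses all-to-all to switch to head parallelism.
    """
    if len(tokens) % world_size != 0:
        raise ValueError("sequence length must be divisible by world_size")
    shard = len(tokens) // world_size
    return [tokens[r * shard:(r + 1) * shard] for r in range(world_size)]


def zigzag_split(tokens, world_size):
    """Zigzag sharding: cut into 2*W chunks, rank r gets chunks r and 2W-1-r.

    Pairing an early chunk with a late one gives every rank a similar
    amount of causal-attention work.
    """
    n_chunks = 2 * world_size
    if len(tokens) % n_chunks != 0:
        raise ValueError("sequence length must be divisible by 2 * world_size")
    size = len(tokens) // n_chunks
    chunks = [tokens[i * size:(i + 1) * size] for i in range(n_chunks)]
    return [chunks[r] + chunks[n_chunks - 1 - r] for r in range(world_size)]


if __name__ == "__main__":
    seq = list(range(8))
    print(contiguous_split(seq, 2))
    print(zigzag_split(seq, 2))
```

Both layouts require padding the sequence to a multiple of the chunk count first, which is one reason padding handling matters for correctness.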

This PR reopens #6506, which was opened on Jan. 2 and accidentally closed last week (see #6506 (comment)).

#7335 may offer nice alternative ways to split SP data and to apply monkey patching. Still, getting SP 100% correct requires much more detailed work on padding, labels, loss computation, etc. (as we discussed for Swift here), followed by thorough testing.
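
To make the label and loss pitfalls concrete, here is a small sketch under stated assumptions; it is not the code in this PR. The helper names are hypothetical, `IGNORE_INDEX = -100` follows the usual PyTorch convention for masked labels, and a plain contiguous split is used for brevity. The two points it illustrates: labels should be shifted over the full sequence before sharding (otherwise the last token of each shard loses its target), and the loss should be normalized by the global count of valid tokens rather than by averaging per-rank means.

```python
IGNORE_INDEX = -100  # conventional "no loss here" label value


def shift_then_split(input_ids, labels, world_size):
    """Shift labels by one over the whole sequence, then shard.

    Returns a list of (input_shard, label_shard) pairs, one per rank.
    """
    if len(input_ids) != len(labels):
        raise ValueError("input_ids and labels must have equal length")
    if len(input_ids) % world_size != 0:
        raise ValueError("sequence length must be divisible by world_size")
    # Position i predicts token i + 1; the final position has no target.
    shifted = labels[1:] + [IGNORE_INDEX]
    shard = len(input_ids) // world_size
    return [
        (input_ids[r * shard:(r + 1) * shard], shifted[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]


def global_mean_loss(per_rank_token_losses, per_rank_labels):
    """Mean loss over valid tokens across all ranks.

    In a real run the sum and the count would each be all-reduced;
    here the ranks are simulated with plain lists.
    """
    total, count = 0.0, 0
    for losses, labels in zip(per_rank_token_losses, per_rank_labels):
        for loss, label in zip(losses, labels):
            if label != IGNORE_INDEX:
                total += loss
                count += 1
    return total / max(count, 1)


if __name__ == "__main__":
    shards = shift_then_split([10, 11, 12, 13], [IGNORE_INDEX, 11, 12, 13], 2)
    label_shards = [lab for _, lab in shards]
    print(shards)
    print(global_mean_loss([[1.0, 2.0], [3.0, 9.0]], label_shards))
```

In this toy case the ranks hold 2 and 1 valid tokens, so averaging the per-rank means (1.5 and 3.0) would give 2.25 instead of the token-level mean of 2.0.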

We are confident that our implementation is 100% correct up to the numerical differences inherent to SP, and it requires close-to-minimal, modular code changes to the original LLaMA-Factory.

@zhijie-reallm

Does it support DPO?

@HaoshengZou
Author

@zhijie-reallm yeah
