Parallel SageAttention Inference #50
base: main
Conversation
Thanks a lot! We will check the implementation and merge the PR.
You may need to install the latest xDiT from source if your environment already has FlashAttention >= 2.7.0; I just made a hotfix to keep the ring flash attention forward pass compatible with the latest FA so it does not run into a function launch error. Also, the plug-and-play SageAttention currently only works with CFG parallelism on 2 GPUs.
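For reference, here is a rough sketch of what CFG parallelism on 2 GPUs means in this context: each rank computes one branch of classifier-free guidance per step, and the two outputs are gathered before applying the guidance formula. The function and model signature below are illustrative only, not xDiT's actual API.

```python
# Illustrative sketch of CFG parallelism across 2 ranks (not xDiT's real code).
import torch
import torch.distributed as dist

def cfg_parallel_step(model, latents, cond_emb, uncond_emb, guidance_scale):
    rank = dist.get_rank()                       # 0 or 1 when world_size == 2
    emb = cond_emb if rank == 0 else uncond_emb
    out = model(latents, emb)                    # each GPU runs one CFG branch

    # Gather both branches so every rank can apply the guidance formula.
    gathered = [torch.empty_like(out) for _ in range(2)]
    dist.all_gather(gathered, out)
    cond_out, uncond_out = gathered[0], gathered[1]
    return uncond_out + guidance_scale * (cond_out - uncond_out)
```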
I am busy these days and I will dive into it as soon as I can.
This PR adds a small workaround that makes SageAttention compatible with distributed environments, for example xDiT launched via torchrun. Without this workaround, SageAttention runs into an illegal memory access error after the first inference step during multi-GPU distributed inference. The workaround also makes SageAttention compatible with torch.compile in non-fullgraph compile mode.
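For context, a minimal sketch of the plug-and-play usage this workaround targets: replacing `F.scaled_dot_product_attention` with a SageAttention-backed wrapper and compiling the model in non-fullgraph mode. The wrapper below is illustrative and not the exact code in this PR.

```python
# Rough sketch, assuming attention goes through F.scaled_dot_product_attention.
import torch
import torch.nn.functional as F
from sageattention import sageattn

_orig_sdpa = F.scaled_dot_product_attention

def sdpa_with_sage(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False, **kwargs):
    # Fall back to the original kernel for cases SageAttention does not handle here.
    if attn_mask is not None or dropout_p != 0.0 or kwargs:
        return _orig_sdpa(q, k, v, attn_mask=attn_mask, dropout_p=dropout_p,
                          is_causal=is_causal, **kwargs)
    return sageattn(q, k, v, is_causal=is_causal)

F.scaled_dot_product_attention = sdpa_with_sage

# As noted above, torch.compile works in non-fullgraph mode, e.g.:
# model = torch.compile(model, fullgraph=False)
```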
@jason-huang03