Propose to update & upgrade SkyReels-V2 #12167
base: main
Conversation
Wraps the visual demonstration section in a Markdown code block. This change corrects the rendering of ASCII diagrams and examples, improving the overall readability of the document.
Improves the readability of the `step_matrix` examples by replacing long sequences of repeated numbers with a more compact `value×count` notation. This change makes the underlying data patterns in the examples easier to understand at a glance.
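To illustrate the notation: a run of identical timesteps such as twenty 999s is displayed as `999×20` instead of being written out in full. The helper below is a hypothetical sketch for illustration only, not code from the PR.

```python
# Hypothetical sketch of the value×count notation described above; the helper name
# and the sample values are illustrative and not part of the PR.
from itertools import groupby

def compact(row):
    """Render e.g. [999, 999, 999, 875] as '999×3, 875×1'."""
    return ", ".join(f"{value}×{len(list(group))}" for value, group in groupby(row))

print(compact([999] * 20 + [875] * 10))  # -> "999×20, 875×10"
```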
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for improving the docs!
Key Pattern: Block i lags behind Block i-1 by exactly ar_step=5 timesteps, creating the
staggered "diffusion forcing" effect where later blocks condition on cleaner earlier blocks.
```text
Thanks for improving! I think the `text` code block should only be used for the graph and chart visuals.
I modified it accordingly. I used backticks for the row representations too; otherwise the columns don't seem as aligned. How is it now?
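To make the "Key Pattern" quoted earlier more concrete, here is a rough sketch of the staggered schedule; the block count, step count, and the construction itself are illustrative assumptions, not the pipeline's actual `step_matrix` code.

```python
# Rough sketch (illustrative values) of the staggered "diffusion forcing" schedule:
# block i starts denoising ar_step iterations after block i-1, so later blocks
# always condition on cleaner earlier blocks. Not the pipeline's real implementation.
num_blocks = 4   # assumed number of latent blocks
num_steps = 30   # assumed denoising steps per block
ar_step = 5      # lag between consecutive blocks, as described in the docs

num_rows = num_steps + ar_step * (num_blocks - 1)
step_matrix = [
    [min(max(row - ar_step * block, 0), num_steps) for block in range(num_blocks)]
    for row in range(num_rows)
]

for row in step_matrix[:8]:
    print(row)  # block 0 stays exactly ar_step=5 steps ahead of block 1, and so on
```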
## Notes

- SkyReels-V2 supports LoRAs with [`~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights`].
Why is the LoRA example being removed?
This part was copied from Wan's page. Since I didn't test anything with LoRAs, I removed the specific example. But SkyReels-V2 and Wan have almost the same architecture. I preserved the line `- SkyReels-V2 supports LoRAs with [~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights].`
Thanks, the docs LGTM!
Let's wait for @a-r-r-o-w to chime in on the other changes before we merge :)
Alright, thanks for your review!
Nice @tolgacangoz! Thanks for propagating the attention backend changes to SkyReels.
# Some acceleration helpers
# Be sure to install Flash Attention: https://github.com/Dao-AILab/flash-attention#installation-and-features
# Normally ~14 min.; ~12 min. with compile_repeated_blocks(fullgraph=True); even less with Flash Attention as well, on an A100.
# If you want to follow the original implementation in terms of attentions:
# for block in pipeline.transformer.blocks:
#     block.attn1.set_attention_backend("_native_cudnn")
#     block.attn2.set_attention_backend("flash_varlen")  # or "_flash_varlen_3"
# pipeline.transformer.compile_repeated_blocks(fullgraph=True)
This is nice! We can use this instead of iterating the blocks:
def set_attention_backend(self, backend: str) -> None:
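A minimal sketch of what that could look like, assuming an already-loaded SkyReels-V2 pipeline named `pipeline`; the backend string is an illustrative choice and must be supported by your installation.

```python
# Sketch: set the attention backend once on the whole transformer via the method
# referenced above, instead of looping over blocks. Assumes `pipeline` is an
# already-loaded SkyReels-V2 pipeline; "flash_varlen" is an illustrative choice.
pipeline.transformer.set_attention_backend("flash_varlen")
```

Going through this method also keeps the experimental-API warning and the backend requirement checks mentioned later in this thread.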
Can we set different attention backends for `attn1` and `attn2`?
Not sure why we need to showcase different backends though? Is there a specific reason?
Examples should showcase usage via the official expected APIs. There are usually multiple side-effects when not doing so. In this case, one instance could be the following warning not being raised:
logger.warning("Attention backends are an experimental feature and the API may be subject to change.")
Another problem is that the following check will be skipped. When skipped, not-so-technical users will be baffled by the traceback:
_check_attention_backend_requirements(backend)
Custom uses are okay, but they are usually best done by power users who have a better understanding of the library.
SkyReels-V2's self-attentions use `"_native_cudnn"` with a custom attention mask, and its cross-attentions use the varlen types. We cannot set flash attention for all of them because it doesn't support a custom mask, right?
I agree this should be done via the expected API. I opened a feature request for it, but then I thought it wasn't really necessary: #12210.
I can remove this `# Some acceleration helpers` comment section completely for now. Since this model is a bit slow, I just wanted to add several speedup helpers. I plan to profile and optimize this model.
Removed them.
Removed comments about acceleration helpers and Flash Attention installation.

- `"_native_cudnn"` for self-attentions and `"flash_varlen"` or `"_flash_varlen_3"` for cross-attentions.
- `pipeline.transformer.compile_repeated_blocks(fullgraph=True)`, and looking forward to torch.compile compatibility with varlen APIs #11970 being merged.

Timings:

- `main`: ~14 min. (main.mp4, Wan.s_RoPE.mp4)
- `compile_repeated_blocks(fullgraph=True)`: ~12 min.
- `compile_repeated_blocks(fullgraph=True)` + `"_native_cudnn"` for `attn1` and `"flash"` for `attn2`, FA=2.8.3: ~8 min. (Wan.s_RoPE+regional.mp4, Wan.s_RoPE+regional+FA.mp4)

Reproducer
Environment

@a-r-r-o-w @yiyixuxu @stevhliu
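As a rough guide (this is not the collapsed reproducer above), the fastest configuration in the list could be approximated as follows; the checkpoint id and backend names are assumptions and should be adapted to your setup.

```python
# Hedged sketch of the benchmarked setup: per-attention backends + regional compilation.
# The checkpoint id is illustrative; the per-block loop bypasses the checks performed by
# the official set_attention_backend method, so use it only if you understand the internals.
import torch
from diffusers import SkyReelsV2DiffusionForcingPipeline

pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

for block in pipeline.transformer.blocks:
    block.attn1.set_attention_backend("_native_cudnn")  # self-attention with a custom mask
    block.attn2.set_attention_backend("flash")          # cross-attention; requires Flash Attention

pipeline.transformer.compile_repeated_blocks(fullgraph=True)
```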