📹 New video generation pipelines
Wan VACE
Wan VACE supports various generation techniques for controllable video generation. It comes in two variants: a 1.3B model for fast iteration and prototyping, and a 14B model for high-quality generation. Some of the capabilities include:
- Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Bounding Box, etc.). The recommended library for preprocessing videos into control videos is huggingface/controlnet_aux.
- Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
- Inpainting and Outpainting
- Subject to Video (faces, object, characters, etc.)
- Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)
A minimal example is shown below; the code snippets in this pull request demonstrate more ways to generate videos with control signals.
Check out the docs to learn more.
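Below is a minimal text-to-video sketch. It assumes the `WanVACEPipeline` API and the `Wan-AI/Wan2.1-VACE-1.3B-diffusers` checkpoint name; see the docs for the control-, reference-, and mask-conditioned variants.

```python
import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline
from diffusers.utils import export_to_video

# Assumed checkpoint name for the 1.3B diffusers-format VACE model.
model_id = "Wan-AI/Wan2.1-VACE-1.3B-diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

output = pipe(
    prompt="A kitten chasing a butterfly through tall grass at golden hour",
    negative_prompt="blurry, low quality, distorted",
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
).frames[0]
export_to_video(output, "output.mp4", fps=16)
```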
Cosmos Predict2 Video2World
Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.
The Video2World model comes in 2B and 14B variants. Check out the docs to learn more.
LTX 0.9.7 and Distilled
LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.
Check out the docs to learn more.
Hunyuan Video Framepack and F1
Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.
FusionX
The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use `from_single_file()`:
transformer = AutoModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16
)
To load the LoRAs, use `load_lora_weights()`:
pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers",
    torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX", weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors"
)
AccVideo and CausVid (only LoRAs)
AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.
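As a rough sketch, loading one of these distillation LoRAs follows the same `load_lora_weights()` pattern shown for FusionX above; the repository and weight-file names below are placeholders, so check the respective model cards for the exact artifacts.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repo id and file name -- substitute the actual AccVideo or CausVid LoRA checkpoint.
pipe.load_lora_weights(
    "<accvideo-or-causvid-lora-repo>",
    weight_name="<lora-file>.safetensors",
    adapter_name="distill",
)
pipe.set_adapters(["distill"], adapter_weights=[1.0])  # optionally scale the LoRA strength
```

These distillation LoRAs are typically run with far fewer inference steps (and often reduced or no classifier-free guidance) than the base model; refer to the respective model cards for recommended settings.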
🌠 New image generation pipelines
Cosmos Predict2 Text2Image
Text-to-image models from the Cosmos-Predict2 release. The model comes in 2B and 14B variants. Check out the docs to learn more.
Chroma
Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.
Thanks to @Ednaordinary for contributing it in this PR!
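A minimal text-to-image sketch with the new `ChromaPipeline` is shown below; the checkpoint id is an assumption here, so refer to the Chroma docs for the exact repository to load.

```python
import torch
from diffusers import ChromaPipeline

# Checkpoint id is an assumption -- check the Chroma docs/model card for the released diffusers checkpoint.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16).to("cuda")

image = pipe(
    prompt="A photo of a cat wearing a tiny wizard hat, studio lighting",
    num_inference_steps=28,
    guidance_scale=4.0,
).images[0]
image.save("chroma.png")
```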
VisualCloze
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is a universal image generation framework built on visual in-context learning. It offers several key capabilities:
- Support for various in-domain tasks
- Generalization to unseen tasks through in-context learning
- Unification of multiple tasks into one step, generating both the target image and intermediate results
- Support for reverse-engineering a set of conditions from a target image
Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!
Better torch.compile support
We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test widely used models, such as Flux, for recompilation and graph-break issues that can get in the way of fully realizing torch.compile() benefits. Refer to the following links to learn more:
Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:
Code
import torch
from diffusers import DiffusionPipeline

torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]

print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
This is compatible with group offloading, too. Interested readers can check out the relevant PRs below:
You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:
Code
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

torch._dynamo.config.recompile_limit = 1000

torch_dtype = torch.bfloat16
quant_kwargs = {"load_in_4bit": True, "bnb_4bit_compute_dtype": torch_dtype, "bnb_4bit_quant_type": "nf4"}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]
Starting from bitsandbytes==0.46.0 onwards, bnb-quantized models should be fully compatible with torch.compile() without graph breaks. This means that when compiling a bnb-quantized model, users can do model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.
Note that for 4-bit bnb models, it is currently necessary to install a PyTorch nightly if fullgraph=True is specified during compilation.
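As a quick sketch (reusing the Flux.1-Dev transformer and the 4-bit settings from the example above), full-graph compilation of a bnb-quantized model looks like this:

```python
import torch
from diffusers import AutoModel, BitsAndBytesConfig

# Same 4-bit NF4 settings as the example above.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# With bitsandbytes>=0.46.0 there should be no graph breaks; for 4-bit models,
# fullgraph=True currently also requires a PyTorch nightly.
transformer.compile(fullgraph=True)
```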
Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.
PipelineQuantizationConfig
Users can now provide a quantization config while initializing a pipeline:
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
This lowers the barrier to entry for users who want to apply quantization without writing much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.
Group offloading with disk
In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.
However, offloading still requires a considerable amount of system RAM to work effectively, so environments with both low VRAM and low RAM were still left out.
Starting with this release, users additionally have the option to offload to disk instead of RAM, further lowering memory requirements. Set offload_to_disk_path to enable this feature.
pipeline.transformer.enable_group_offload(
    onload_device="cuda",
    offload_device="cpu",
    offload_type="leaf_level",
    offload_to_disk_path="path/to/disk"
)
Refer to these two tables to compare the speed and memory trade-offs.
LoRA metadata parsing
It is beneficial to include, in a LoRA state dict, the `LoraConfig` that was used to train the LoRA. In its absence, users were restricted to using a LoRA alpha equal to the LoRA rank. We have modified the most popular training scripts to allow passing a custom `lora_alpha` through the CLI. Refer to this thread for more updates and this comment for some extended clarifications.
New training scripts
- We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
- HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.
Updates on educational materials on quantization
We have worked on a two-part series discussing the support of quantization in Diffusers. Check them out:
All commits
- [LoRA] support musubi wan loras. by @sayakpaul in #11243
- fix test_vanilla_funetuning failure on XPU and A100 by @yao-matrix in #11263
- make test_stable_diffusion_inpaint_fp16 pass on XPU by @yao-matrix in #11264
- make test_dict_tuple_outputs_equivalent pass on XPU by @yao-matrix in #11265
- add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
- make test_instant_style_multiple_masks pass on XPU by @yao-matrix in #11266
- [BUG] Fix convert_vae_pt_to_diffusers bug by @lavinal712 in #11078
- Fix LTX 0.9.5 single file by @hlky in #11271
- [Tests] Cleanup lora tests utils by @sayakpaul in #11276
- [CI] relax tolerance for unclip further by @sayakpaul in #11268
- do not use `DIFFUSERS_REQUEST_TIMEOUT` for notification bot by @sayakpaul in #11273
- Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi by @kuantuna in #11294
- HiDream Image by @hlky in #11231
- flow matching lcm scheduler by @quickjkee in #11170
- Update autoencoderkl_allegro.md by @Forbu in #11303
- Hidream refactoring follow ups by @a-r-r-o-w in #11299
- Fix incorrect tile_latent_min_width calculations by @kuantuna in #11305
- [ControlNet] Adds controlnet for SanaTransformer by @ishan-modi in #11040
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU by @yao-matrix in #11308
- make test_stable_diffusion_karras_sigmas pass on XPU by @yao-matrix in #11310
- make `KolorsPipelineFastTests::test_inference_batch_single_identical` pass on XPU by @faaany in #11313
- [LoRA] support more SDXL loras. by @sayakpaul in #11292
- [HiDream] code example by @linoytsaban in #11317
- import for FlowMatchLCMScheduler by @asomoza in #11318
- Use float32 on mps or npu in transformer_hidream_image's rope by @hlky in #11316
- Add `skrample` section to `community_projects.md` by @Beinsezii in #11319
- [docs] Promote `AutoModel` usage by @sayakpaul in #11300
- [LoRA] Add LoRA support to AuraFlow by @hameerabbasi in #10216
- Fix vae.Decoder prev_output_channel by @hlky in #11280
- fix CPU offloading related fail cases on XPU by @yao-matrix in #11288
- [docs] fix hidream docstrings. by @sayakpaul in #11325
- Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible by @AstraliteHeart in #11297
- post release 0.33.0 by @sayakpaul in #11255
- another fix for FlowMatchLCMScheduler forgotten import by @asomoza in #11330
- Fix Hunyuan I2V for `transformers>4.47.1` by @DN6 in #11293
- unpin torch versions for onnx Dockerfile by @sayakpaul in #11290
- [single file] enable telemetry for single file loading when using GGUF. by @sayakpaul in #11284
- [docs] add a snippet for compilation in the auraflow docs. by @sayakpaul in #11327
- Hunyuan I2V fast tests fix by @DN6 in #11341
- [BUG] fixed _toctree.yml alphabetical ordering by @ishan-modi in #11277
- Fix wrong dtype argument name as torch_dtype by @nPeppon in #11346
- [chore] fix lora docs utils by @sayakpaul in #11338
- [docs] add note about use_duck_shape in auraflow docs. by @sayakpaul in #11348
- [LoRA] Propagate `hotswap` better by @sayakpaul in #11333
- [Hi Dream] follow-up by @yiyixuxu in #11296
- [bitsandbytes] improve dtype mismatch handling for bnb + lora. by @sayakpaul in #11270
- Update controlnet_flux.py by @haofanwang in #11350
- enable 2 test cases on XPU by @yao-matrix in #11332
- [BNB] Fix test_moving_to_cpu_throws_warning by @SunMarc in #11356
- support Wan-FLF2V by @yiyixuxu in #11353
- Fix: `StableDiffusionXLControlNetAdapterInpaintPipeline` incorrectly inherited `StableDiffusionLoraLoaderMixin` by @Kazuki-Yoda in #11357
- update output for Hidream transformer by @yiyixuxu in #11366
- [Wan2.1-FLF2V] update conversion script by @yiyixuxu in #11365
- [Flux LoRAs] fix lr scheduler bug in distributed scenarios by @linoytsaban in #11242
- [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11240
- fix issue that training flux controlnet was unstable and validation r… by @PromeAIpro in #11373
- Fix Wan I2V prepare_latents dtype by @a-r-r-o-w in #11371
- [BUG] fixes in kadinsky pipeline by @ishan-modi in #11080
- Add Serialized Type Name kwarg in Model Output by @anzr299 in #10502
- [cogview4][feat] Support attention mechanism with variable-length support and batch packing by @OleehyO in #11349
- Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma by @josephrocca in #11120
- [Refactor] Minor Improvement for import utils by @ishan-modi in #11161
- Add stochastic sampling to FlowMatchEulerDiscreteScheduler by @apolinario in #11369
- [LoRA] add LoRA support to HiDream and fine-tuning script by @linoytsaban in #11281
- Update modeling imports by @a-r-r-o-w in #11129
- [HiDream] move deprecation to 0.35.0 by @yiyixuxu in #11384
- Update README_hidream.md by @AMEERAZAM08 in #11386
- Fix group offloading with block_level and use_stream=True by @a-r-r-o-w in #11375
- [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing by @ishandutta0098 in #11395
- [Feature] Added Xlab Controlnet support by @ishan-modi in #11249
- Kolors additional pipelines, community contrib by @Teriks in #11372
- [HiDream LoRA] optimizations + small updates by @linoytsaban in #11381
- Fix Flux IP adapter argument in the pipeline example by @AeroDEmi in #11402
- [BUG] fixed WAN docstring by @ishan-modi in #11226
- Fix typos in strings and comments by @co63oc in #11407
- [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing by @merterbak in #11421
- [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() by @sayakpaul in #11085
- enable group_offload cases and quanto cases on XPU by @yao-matrix in #11405
- enable test_layerwise_casting_memory cases on XPU by @yao-matrix in #11406
- [tests] fix import. by @sayakpaul in #11434
- [train_text_to_image] Better image interpolation in training scripts follow up by @tongyu0924 in #11426
- [train_text_to_image_lora] Better image interpolation in training scripts follow up by @tongyu0924 in #11427
- enable 28 GGUF test cases on XPU by @yao-matrix in #11404
- [Hi-Dream LoRA] fix bug in validation by @linoytsaban in #11439
- Fixing missing provider options argument by @urpetkov-amd in #11397
- Set LANCZOS as the default interpolation for image resizing in ControlNet training by @YoulunPeng in #11449
- Raise warning instead of error for block offloading with streams by @a-r-r-o-w in #11425
- enable marigold_intrinsics cases on XPU by @yao-matrix in #11445
- `torch.compile` fullgraph compatibility for Hunyuan Video by @a-r-r-o-w in #11457
- enable consistency test cases on XPU, all passed by @yao-matrix in #11446
- enable unidiffuser test cases on xpu by @yao-matrix in #11444
- Add generic support for Intel Gaudi accelerator (hpu device) by @dsocek in #11328
- Add StableDiffusion3InstructPix2PixPipeline by @xduzhangjiayu in #11378
- make safe diffusion test cases pass on XPU and A100 by @yao-matrix in #11458
- [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models by @tongyu0924 in #11431
- Add LANCZOS as default interplotation mode. by @Va16hav07 in #11463
- make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu by @yao-matrix in #11461
- [WAN] fix recompilation issues by @sayakpaul in #11475
- Fix typos in docs and comments by @co63oc in #11416
- [tests] xfail recent pipeline tests for specific methods. by @sayakpaul in #11469
- cache packages_distributions by @vladmandic in #11453
- [docs] Memory optims by @stevhliu in #11385
- [docs] Adapters by @stevhliu in #11331
- [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing by @yuanjua in #11471
- [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing by @ysurs in #11472
- enable semantic diffusion and stable diffusion panorama cases on XPU by @yao-matrix in #11459
- [Feature] Implement tiled VAE encoding/decoding for Wan model. by @c8ef in #11414
- [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode for image resizing by @ParagEkbote in #11455
- [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) by @MinJu-Ha in #11490
- [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing by @cjfghk5697 in #11491
- [training] feat: enable quantization for hidream lora training. by @sayakpaul in #11494
- Set LANCZOS as the default interpolation method for image resizing. by @yijun-lee in #11492
- Update training script for txt to img sdxl with lora supp with new interpolation. by @RogerSinghChugh in #11496
- Fix torchao docs typo for fp8 granular quantization by @a-r-r-o-w in #11473
- Update setup.py to pin min version of `peft` by @sayakpaul in #11502
- update dep table. by @sayakpaul in #11504
- [LoRA] use `removeprefix` to preserve sanity. by @sayakpaul in #11493
- Hunyuan Video Framepack by @a-r-r-o-w in #11428
- enable lora cases on XPU by @yao-matrix in #11506
- [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility by @iamwavecut in #11441
- [docs] minor updates to bitsandbytes docs. by @sayakpaul in #11509
- Cosmos by @a-r-r-o-w in #10660
- clean up the Init for stable_diffusion by @yiyixuxu in #11500
- fix audioldm by @sayakpaul (direct commit on v0.34.0-release)
- Revert "fix audioldm" by @sayakpaul (direct commit on v0.34.0-release)
- [LoRA] make lora alpha and dropout configurable by @linoytsaban in #11467
- Add cross attention type for Sana-Sprint training in diffusers. by @scxue in #11514
- Conditionally import torchvision in Cosmos transformer by @a-r-r-o-w in #11524
- [tests] fix audioldm2 for transformers main. by @sayakpaul in #11522
- feat: pipeline-level quantization config by @sayakpaul in #11130
- [Tests] Enable more general testing for `torch.compile()` with LoRA hotswapping by @sayakpaul in #11322
- [LoRA] support non-diffusers hidream loras by @sayakpaul in #11532
- enable 7 cases on XPU by @yao-matrix in #11503
- [LTXPipeline] Update latents dtype to match VAE dtype by @james-p-xu in #11533
- enable dit integration cases on xpu by @yao-matrix in #11523
- enable print_env on xpu by @yao-matrix in #11507
- Change Framepack transformer layer initialization order by @a-r-r-o-w in #11535
- [tests] add tests for framepack transformer model. by @sayakpaul in #11520
- Hunyuan Video Framepack F1 by @a-r-r-o-w in #11534
- enable several pipeline integration tests on XPU by @yao-matrix in #11526
- [test_models_transformer_ltx.py] help us test torch.compile() for impactful models by @cjfghk5697 in #11512
- Add VisualCloze by @lzyhha in #11377
- Fix typo in train_diffusion_orpo_sdxl_lora_wds.py by @Meeex2 in #11541
- fix: remove `torch_dtype="auto"` option from docstrings by @johannaSommer in #11513
- [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11239
- [LoRA] small change to support Hunyuan LoRA Loading for FramePack by @linoytsaban in #11546
- LTX Video 0.9.7 by @a-r-r-o-w in #11516
- [tests] Enable testing for HiDream transformer by @sayakpaul in #11478
- Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. by @Meatfucker in #11545
- Fix deprecation warnings in test_ltx_image2video.py by @AChowdhury1211 in #11538
- [tests] Add torch.compile test for UNet2DConditionModel by @olccihyeon in #11537
- [Single File] GGUF/Single File Support for HiDream by @DN6 in #11550
- [gguf] Refactor torch_function to avoid unnecessary computation by @anijain2305 in #11551
- [tests] add tests for combining layerwise upcasting and groupoffloading. by @sayakpaul in #11558
- [docs] Regional compilation docs by @sayakpaul in #11556
- enhance value guard of _device_agnostic_dispatch by @yao-matrix in #11553
- Doc update by @Player256 in #11531
- Revert error to warning when loading LoRA from repo with multiple weights by @apolinario in #11568
- [docs] tip for group offloding + quantization by @sayakpaul in #11576
- [LoRA] support non-diffusers LTX-Video loras by @linoytsaban in #11572
- [WIP][LoRA] start supporting kijai wan lora. by @sayakpaul in #11579
- [Single File] Fix loading for LTX 0.9.7 transformer by @DN6 in #11578
- Use HF Papers by @qgallouedec in #11567
- LTX 0.9.7-distilled; documentation improvements by @a-r-r-o-w in #11571
- [LoRA] kijai wan lora support for I2V by @linoytsaban in #11588
- docs: fix invalid links by @osrm in #11505
- [docs] Remove fast diffusion tutorial by @stevhliu in #11583
- RegionalPrompting: Inherit from Stable Diffusion by @b-sai in #11525
- [chore] allow string device to be passed to randn_tensor. by @sayakpaul in #11559
- Type annotation fix by @DN6 in #11597
- [LoRA] minor fix for `load_lora_weights()` for Flux and a test by @sayakpaul in #11595
- Update Intel Gaudi doc by @regisss in #11479
- enable pipeline test cases on xpu by @yao-matrix in #11527
- [Feature] AutoModel can load components using model_index.json by @ishan-modi in #11401
- [docs] Pipeline-level quantization by @stevhliu in #11604
- Fix bug when `variant` and `safetensor` file does not match by @kaixuanliu in #11587
- [tests] Changes to the `torch.compile()` CI and tests by @sayakpaul in #11508
- Fix mixed variant downloading by @DN6 in #11611
- fix security issue in build docker ci by @sayakpaul in #11614
- Make group offloading compatible with torch.compile() by @sayakpaul in #11605
- [training docs] smol update to README files by @linoytsaban in #11616
- Adding NPU for get device function by @leisuzz in #11617
- [LoRA] improve LoRA fusion tests by @sayakpaul in #11274
- [Sana Sprint] add image-to-image pipeline by @linoytsaban in #11602
- [CI] fix the filename for displaying failures in lora ci. by @sayakpaul in #11600
- [docs] PyTorch 2.0 by @stevhliu in #11618
- [textual_inversion_sdxl.py] fix lr scheduler steps count by @yuanjua in #11557
- Fix wrong indent for examples of controlnet script by @Justin900429 in #11632
- removing unnecessary else statement by @YanivDorGalron in #11624
- enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed by @yao-matrix in #11620
- Bug: Fixed Image 2 Image example by @vltmedia in #11619
- typo fix in pipeline_flux.py by @YanivDorGalron in #11623
- Fix typos in strings and comments by @co63oc in #11476
- [docs] update torchao doc link by @sayakpaul in #11634
- Use float32 RoPE freqs in Wan with MPS backends by @hvaara in #11643
- [chore] misc changes in the bnb tests for consistency. by @sayakpaul in #11355
- [tests] chore: rename lora model-level tests. by @sayakpaul in #11481
- [docs] Caching methods by @stevhliu in #11625
- [docs] Model cards by @stevhliu in #11112
- [CI] Some improvements to Nightly reports summaries by @DN6 in #11166
- [chore] bring PipelineQuantizationConfig at the top of the import chain. by @sayakpaul in #11656
- [examples] flux-control: use num_training_steps_for_scheduler by @Markus-Pobitzer in #11662
- use deterministic to get stable result by @jiqing-feng in #11663
- [tests] add test for torch.compile + group offloading by @sayakpaul in #11670
- Wan VACE by @a-r-r-o-w in #11582
- fixed axes_dims_rope init (huggingface#11641) by @sofinvalery in #11678
- [tests] Fix how compiler mixin classes are used by @sayakpaul in #11680
- Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process by @DN6 in #11596
- Add community class StableDiffusionXL_T5Pipeline by @ppbrown in #11626
- Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area by @Meatfucker in #11658
- Allow remote code repo names to contain "." by @akasharidas in #11652
- [LoRA] support Flux Control LoRA with bnb 8bit. by @sayakpaul in #11655
- [
Wan
] Fix VAE sampling mode inWanVideoToVideoPipeline
by @tolgacangoz in #11639 - enable torchao test cases on XPU and switch to device agnostic APIs for test cases by @yao-matrix in #11654
- [tests] tests for compilation + quantization (bnb) by @sayakpaul in #11672
- [tests] model-level `device_map` clarifications by @sayakpaul in #11681
- Improve Wan docstrings by @a-r-r-o-w in #11689
- Set _torch_version to N/A if torch is disabled. by @rasmi in #11645
- Avoid DtoH sync from access of nonzero() item in scheduler by @jbschlosser in #11696
- Apply Occam's Razor in position embedding calculation by @tolgacangoz in #11562
- [docs] add compilation bits to the bitsandbytes docs. by @sayakpaul in #11693
- swap out token for style bot. by @sayakpaul in #11701
- [docs] mention fp8 benefits on supported hardware. by @sayakpaul in #11699
- Support Wan AccVideo lora by @a-r-r-o-w in #11704
- [LoRA] parse metadata from LoRA and save metadata by @sayakpaul in #11324
- Cosmos Predict2 by @a-r-r-o-w in #11695
- Chroma Pipeline by @Ednaordinary in #11698
- [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers by @sayakpaul in #11716
- [training] show how metadata stuff should be incorporated in training scripts. by @sayakpaul in #11707
- Fix misleading comment by @carlthome in #11722
- Add Pruna optimization framework documentation by @davidberenstein1957 in #11688
- Support more Wan loras (VACE) by @a-r-r-o-w in #11726
- [LoRA training] update metadata use for lora alpha + README by @linoytsaban in #11723
- ⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886% by @misrasaurabh1 in #11665
- [training] add ds support to lora hidream by @leisuzz in #11737
- [tests] device_map tests for all models. by @sayakpaul in #11708
- [chore] change to 2025 licensing for remaining by @sayakpaul in #11741
- Chroma Follow Up by @DN6 in #11725
- [Quantizers] add `is_compileable` property to quantizers. by @sayakpaul in #11736
- Update more licenses to 2025 by @a-r-r-o-w in #11746
- Add missing HiDream license by @a-r-r-o-w in #11747
- Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server by @dependabot[bot] in #11748
- [LoRA] refactor lora loading at the model-level by @sayakpaul in #11719
- [CI] Fix WAN VACE tests by @DN6 in #11757
- [CI] Fix SANA tests by @DN6 in #11756
- Fix HiDream pipeline test module by @DN6 in #11754
- make group offloading work with disk/nvme transfers by @sayakpaul in #11682
- Update Chroma Docs by @DN6 in #11753
- fix invalid component handling behaviour in `PipelineQuantizationConfig` by @sayakpaul in #11750
- Fix failing cpu offload test for LTX Latent Upscale by @DN6 in #11755
- [docs] Quantization + torch.compile + offloading by @stevhliu in #11703
- [docs] device_map by @stevhliu in #11711
- [docs] LoRA scale scheduling by @stevhliu in #11727
- Fix dimensionalities in `apply_rotary_emb` functions' comments by @tolgacangoz in #11717
- enable deterministic in bnb 4 bit tests by @jiqing-feng in #11738
- enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU by @yao-matrix in #11671
- [tests] properly skip tests instead of `return` by @sayakpaul in #11771
- [CI] Skip ONNX Upscale tests by @DN6 in #11774
- [Wan] Fix mask padding in Wan VACE pipeline. by @bennyguo in #11778
- Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py by @imbr92 in #11744
- [docs] minor cleanups in the lora docs. by @sayakpaul in #11770
- [lora] only remove hooks that we add back by @yiyixuxu in #11768
- [tests] Fix HunyuanVideo Framepack device tests by @a-r-r-o-w in #11789
- [chore] raise as early as possible in group offloading by @sayakpaul in #11792
- [tests] Fix group offloading and layerwise casting test interaction by @a-r-r-o-w in #11796
- guard omnigen processor. by @sayakpaul in #11799
- Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @yao-matrix
- fix test_vanilla_funetuning failure on XPU and A100 (#11263)
- make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
- make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
- make test_instant_style_multiple_masks pass on XPU (#11266)
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
- make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
- fix CPU offloading related fail cases on XPU (#11288)
- enable 2 test cases on XPU (#11332)
- enable group_offload cases and quanto cases on XPU (#11405)
- enable test_layerwise_casting_memory cases on XPU (#11406)
- enable 28 GGUF test cases on XPU (#11404)
- enable marigold_intrinsics cases on XPU (#11445)
- enable consistency test cases on XPU, all passed (#11446)
- enable unidiffuser test cases on xpu (#11444)
- make safe diffusion test cases pass on XPU and A100 (#11458)
- make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)
- enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
- enable lora cases on XPU (#11506)
- enable 7 cases on XPU (#11503)
- enable dit integration cases on xpu (#11523)
- enable print_env on xpu (#11507)
- enable several pipeline integration tests on XPU (#11526)
- enhance value guard of _device_agnostic_dispatch (#11553)
- enable pipeline test cases on xpu (#11527)
- enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
- enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
- enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
- @hlky
- @quickjkee
- flow matching lcm scheduler (#11170)
- @ishan-modi
- [ControlNet] Adds controlnet for SanaTransformer (#11040)
- [BUG] fixed _toctree.yml alphabetical ordering (#11277)
- [BUG] fixes in kadinsky pipeline (#11080)
- [Refactor] Minor Improvement for import utils (#11161)
- [Feature] Added Xlab Controlnet support (#11249)
- [BUG] fixed WAN docstring (#11226)
- [Feature] AutoModel can load components using model_index.json (#11401)
- @linoytsaban
- [HiDream] code example (#11317)
- [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
- [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
- [HiDream LoRA] optimizations + small updates (#11381)
- [Hi-Dream LoRA] fix bug in validation (#11439)
- [LoRA] make lora alpha and dropout configurable (#11467)
- [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
- [LoRA] support non-diffusers LTX-Video loras (#11572)
- [LoRA] kijai wan lora support for I2V (#11588)
- [training docs] smol update to README files (#11616)
- [Sana Sprint] add image-to-image pipeline (#11602)
- [LoRA training] update metadata use for lora alpha + README (#11723)
- @hameerabbasi
- [LoRA] Add LoRA support to AuraFlow (#10216)
- @DN6
- Fix Hunyuan I2V for `transformers>4.47.1` (#11293)
- Hunyuan I2V fast tests fix (#11341)
- [Single File] GGUF/Single File Support for HiDream (#11550)
- [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
- Type annotation fix (#11597)
- Fix mixed variant downloading (#11611)
- [CI] Some improvements to Nightly reports summaries (#11166)
- Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
- Chroma Follow Up (#11725)
- [CI] Fix WAN VACE tests (#11757)
- [CI] Fix SANA tests (#11756)
- Fix HiDream pipeline test module (#11754)
- Update Chroma Docs (#11753)
- Fix failing cpu offload test for LTX Latent Upscale (#11755)
- [CI] Skip ONNX Upscale tests (#11774)
- @yiyixuxu
- @Teriks
- Kolors additional pipelines, community contrib (#11372)
- @co63oc
- @xduzhangjiayu
- Add StableDiffusion3InstructPix2PixPipeline (#11378)
- @scxue
- Add cross attention type for Sana-Sprint training in diffusers. (#11514)
- @lzyhha
- Add VisualCloze (#11377)
- @b-sai
- RegionalPrompting: Inherit from Stable Diffusion (#11525)
- @Ednaordinary
- Chroma Pipeline (#11698)