
Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more

Released by @sayakpaul on 24 Jun 15:13

📹 New video generation pipelines

Wan VACE

Wan VACE supports various generation techniques for controllable video generation. It comes in two variants: a 1.3B model for fast iteration and prototyping, and a 14B model for high-quality generation. Some of its capabilities include:

  • Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Bounding Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
  • Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
  • Inpainting and Outpainting
  • Subject to Video (faces, objects, characters, etc.)
  • Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)

The code snippets in this pull request demonstrate how videos can be generated with different control signals.

Check out the docs to learn more.
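
For a quick text-to-video start, here is a minimal sketch; the checkpoint id (the 1.3B VACE variant), resolution, frame count, and step settings are assumptions, and the control-signal workflows are covered in the PR and docs linked above.

import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline
from diffusers.utils import export_to_video

# Checkpoint id is an assumption; pick the 1.3B or 14B VACE variant you want.
model_id = "Wan-AI/Wan2.1-VACE-1.3B-diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

output = pipe(
    prompt="A sleek cat lounging on a windowsill at golden hour",
    negative_prompt="blurry, low quality",
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
).frames[0]
export_to_video(output, "wan_vace.mp4", fps=16)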

Cosmos Predict2 Video2World

Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.

The Video2World model comes in a 2B and 14B variant. Check out the docs to learn more.

LTX 0.9.7 and Distilled

LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.

Check out the docs to learn more.

Hunyuan Video Framepack and F1

Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.

FusionX

The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use from_single_file():

transformer = AutoModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16
)

To load the LoRAs, use load_lora_weights():

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers",
    torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX", weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors"
)

AccVideo and CausVid (only LoRAs)

AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.
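
As a rough sketch with the Wan2.1 T2V 14B model (the LoRA repo id, weight name, step count, and guidance value below are illustrative; swap in the extracted AccVideo or CausVid LoRA you want to use):

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")
# Illustrative repo id and weight name for an extracted CausVid LoRA; adjust to the LoRA you use.
pipe.load_lora_weights(
    "Kijai/WanVideo_comfy", weight_name="Wan21_CausVid_14B_T2V_lora_rank32.safetensors"
)
# Distilled LoRAs typically allow far fewer inference steps and low guidance.
video = pipe(
    prompt="A koala bear playing the piano in a cozy living room",
    num_inference_steps=8,
    guidance_scale=1.0,
).frames[0]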

🌠 New image generation pipelines

Cosmos Predict2 Text2Image

Text-to-image models from the Cosmos-Predict2 release. The model comes in 2B and 14B variants. Check out the docs to learn more.

Chroma

Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.

Thanks to @Ednaordinary for contributing it in this PR!

VisualCloze

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is a universal image generation framework built on visual in-context learning that offers key capabilities:

  1. Support for various in-domain tasks
  2. Generalization to unseen tasks through in-context learning
  3. Unification of multiple tasks into one step, generating both the target image and intermediate results
  4. Support for reverse-engineering conditions from target images

Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!

Better torch.compile support

We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test widely used models such as Flux for recompilation and graph-break issues, which can get in the way of fully realizing the benefits of torch.compile(). Refer to the following links to learn more:

Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:

Code
import torch
from diffusers import DiffusionPipeline
# Raise the recompilation cache limit; offloading can trigger extra recompilations.
torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()
# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")

This is compatible with group offloading, too. Interested readers can check out the relevant PRs below:

You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:

Code
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import AutoModel, FluxPipeline
from transformers import T5EncoderModel

import torch
torch._dynamo.config.recompile_limit = 1000

# Compute dtype used for both quantization and model loading.
torch_dtype = torch.bfloat16
quant_kwargs = {"load_in_4bit": True, "bnb_4bit_compute_dtype": torch_dtype, "bnb_4bit_quant_type": "nf4"}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]

Starting from bitsandbytes==0.46.0 onwards, bnb-quantized models should be fully compatible with torch.compile() without graph-breaks. This means that when compiling a bnb-quantized model, users can do: model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.

[Figure: Flux.1-Dev speed comparison with bnb 4-bit quantization and torch.compile()]

Note that for 4-bit bnb models, you currently need to install a PyTorch nightly build if fullgraph=True is specified during compilation.
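
A minimal sketch of what that looks like for the Flux transformer (the checkpoint and dtype mirror the example above; with 4-bit models this additionally assumes a PyTorch nightly as noted):

import torch
from diffusers import AutoModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
# With bitsandbytes>=0.46.0, the bnb-quantized model can be compiled without graph breaks.
transformer.compile(fullgraph=True)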

Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.

PipelineQuantizationConfig

Users can now provide a quantization config while initializing a pipeline:

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["transformer", "text_encoder_2"],
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]

This lowers the barrier to entry for users who want to use quantization without having to write much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.

Group offloading with disk

In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.

However, you still need a considerable amount of system RAM for offloading to work effectively, so environments with both low VRAM and low RAM were still left out.

Starting with this release, users additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set offload_to_disk_path to enable this feature:

pipeline.transformer.enable_group_offload(
    onload_device="cuda", 
    offload_device="cpu", 
    offload_type="leaf_level", 
    offload_to_disk_path="path/to/disk"
)

Refer to these two tables to compare the speed and memory trade-offs.

LoRA metadata parsing

It is beneficial to include the LoraConfig that was used to train a LoRA in its state dict. In its absence, users are restricted to using a LoRA alpha equal to the LoRA rank. We have modified the most popular training scripts to allow passing a custom lora_alpha through the CLI. Refer to this thread for more updates and to this comment for some extended clarifications.
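
For example, with the Flux DreamBooth LoRA script, the alpha can now be decoupled from the rank on the command line (the dataset, prompt, and other flags below are illustrative):

accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --instance_data_dir="dog" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="flux-lora" \
  --rank=16 \
  --lora_alpha=32 \
  --mixed_precision="bf16"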

New training scripts

  • We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
  • HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.

Updates on educational materials on quantization

We have worked on a two-part series discussing quantization support in Diffusers. Check them out:

All commits

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yao-matrix
    • fix test_vanilla_funetuning failure on XPU and A100 (#11263)
    • make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
    • make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
    • make test_instant_style_multiple_masks pass on XPU (#11266)
    • make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
    • make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
    • fix CPU offloading related fail cases on XPU (#11288)
    • enable 2 test cases on XPU (#11332)
    • enable group_offload cases and quanto cases on XPU (#11405)
    • enable test_layerwise_casting_memory cases on XPU (#11406)
    • enable 28 GGUF test cases on XPU (#11404)
    • enable marigold_intrinsics cases on XPU (#11445)
    • enable consistency test cases on XPU, all passed (#11446)
    • enable unidiffuser test cases on xpu (#11444)
    • make safe diffusion test cases pass on XPU and A100 (#11458)
    • make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)
    • enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
    • enable lora cases on XPU (#11506)
    • enable 7 cases on XPU (#11503)
    • enable dit integration cases on xpu (#11523)
    • enable print_env on xpu (#11507)
    • enable several pipeline integration tests on XPU (#11526)
    • enhance value guard of _device_agnostic_dispatch (#11553)
    • enable pipeline test cases on xpu (#11527)
    • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
    • enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
    • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
  • @hlky
    • Fix LTX 0.9.5 single file (#11271)
    • HiDream Image (#11231)
    • Use float32 on mps or npu in transformer_hidream_image's rope (#11316)
    • Fix vae.Decoder prev_output_channel (#11280)
  • @quickjkee
    • flow matching lcm scheduler (#11170)
  • @ishan-modi
    • [ControlNet] Adds controlnet for SanaTransformer (#11040)
    • [BUG] fixed _toctree.yml alphabetical ordering (#11277)
    • [BUG] fixes in kadinsky pipeline (#11080)
    • [Refactor] Minor Improvement for import utils (#11161)
    • [Feature] Added Xlab Controlnet support (#11249)
    • [BUG] fixed WAN docstring (#11226)
    • [Feature] AutoModel can load components using model_index.json (#11401)
  • @linoytsaban
    • [HiDream] code example (#11317)
    • [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
    • [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
    • [HiDream LoRA] optimizations + small updates (#11381)
    • [Hi-Dream LoRA] fix bug in validation (#11439)
    • [LoRA] make lora alpha and dropout configurable (#11467)
    • [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
    • [LoRA] support non-diffusers LTX-Video loras (#11572)
    • [LoRA] kijai wan lora support for I2V (#11588)
    • [training docs] smol update to README files (#11616)
    • [Sana Sprint] add image-to-image pipeline (#11602)
    • [LoRA training] update metadata use for lora alpha + README (#11723)
  • @hameerabbasi
    • [LoRA] Add LoRA support to AuraFlow (#10216)
  • @DN6
    • Fix Hunyuan I2V for transformers>4.47.1 (#11293)
    • Hunyuan I2V fast tests fix (#11341)
    • [Single File] GGUF/Single File Support for HiDream (#11550)
    • [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
    • Type annotation fix (#11597)
    • Fix mixed variant downloading (#11611)
    • [CI] Some improvements to Nightly reports summaries (#11166)
    • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
    • Chroma Follow Up (#11725)
    • [CI] Fix WAN VACE tests (#11757)
    • [CI] Fix SANA tests (#11756)
    • Fix HiDream pipeline test module (#11754)
    • Update Chroma Docs (#11753)
    • Fix failing cpu offload test for LTX Latent Upscale (#11755)
    • [CI] Skip ONNX Upscale tests (#11774)
  • @yiyixuxu
    • [Hi Dream] follow-up (#11296)
    • support Wan-FLF2V (#11353)
    • update output for Hidream transformer (#11366)
    • [Wan2.1-FLF2V] update conversion script (#11365)
    • [HiDream] move deprecation to 0.35.0 (#11384)
    • clean up the Init for stable_diffusion (#11500)
    • [lora] only remove hooks that we add back (#11768)
  • @Teriks
    • Kolors additional pipelines, community contrib (#11372)
  • @co63oc
    • Fix typos in strings and comments (#11407)
    • Fix typos in docs and comments (#11416)
    • Fix typos in strings and comments (#11476)
  • @xduzhangjiayu
    • Add StableDiffusion3InstructPix2PixPipeline (#11378)
  • @scxue
    • Add cross attention type for Sana-Sprint training in diffusers. (#11514)
  • @lzyhha
  • @b-sai
    • RegionalPrompting: Inherit from Stable Diffusion (#11525)
  • @Ednaordinary