Diffusers current/future #11403
-
I will chime in for the LoRA bits. I think we're also fast in supporting non-diffusers LoRAs, and that is always a priority for us. The timeline of the PRs we have opened after receiving requests for them should speak to that dedication.

Regarding the Hub upload ask, that was my preference; none of my teammates ask for that, I think. I still think it's a fair ask and I will stick to my point. Even when a contributor doesn't provide a Hub URL for the LoRA, we will just upload it ourselves (we have done so multiple times and will continue to do so). Not providing a minimal reproducible snippet, however, is a no-no for me, and I won't argue that point further. If I were reluctant, I would not have tried figuring out a reproducible snippet myself (reference).

For consumer GPUs, we're on it. In the current release, we have shipped a ton of memory-optimization work specifically targeting consumer GPUs. We agree that the docs are currently not great, and #11385 is a step forward there. We are also thinking about "auto" offloading strategies based on a given accelerator specification. More on that in the coming days.
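
For context, a minimal reproducible snippet for this kind of LoRA report usually looks something like the sketch below; the base model, LoRA path, and prompt are placeholders, and the exact pipeline class depends on the model family.

```python
# Minimal repro sketch for a LoRA issue report (model id and paths are placeholders).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights accepts a local single-file checkpoint, e.g. one downloaded
# from CivitAI; no Hub upload is required to reproduce a loading failure.
pipe.load_lora_weights("path/to/lora.safetensors")

image = pipe("a prompt that exercises the LoRA", num_inference_steps=30).images[0]
image.save("lora_repro.png")
```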
-
Thank you for starting this discussion @vladmandic. I have a couple of queries about the diffusers community pipelines. Firstly, does it make sense to deprecate the older/broken pipelines that have not been updated since the changes introduced in #6984? Secondly, the newer community pipelines do not follow the API design of `pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", custom_pipeline="filename_in_the_community_folder")`. Could they be refactored as well?
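
For readers unfamiliar with that loading path, here is a short sketch of how a community pipeline is pulled in via `custom_pipeline`; `lpw_stable_diffusion` is just one long-standing community pipeline used as an illustration.

```python
# Sketch of loading a community pipeline by its filename in the community folder.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",  # resolved from diffusers' community folder
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
```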
-
Starting a brainstorming thread on ideas for longer-term priorities for diffusers...

Use of LoRA is very near the top of priorities for any user. Right now, diffusers is pretty fast at supporting LoRAs trained by diffusers for any new models. But 95%+ of LoRAs are NOT trained by diffusers, and tools like onetrainer, kohya, ai-toolkit, simpletrainer (in no particular order) are much more widely adopted.

Also, like it or not, CivitAI is the de-facto standard for LoRA distribution, so asking for a LoRA to be uploaded to the HF hub just so it can be tested is a no-no. Right now, out of the box, diffusers supports only a small fraction of LoRAs, and that is significantly hurting its user base. It should become a priority, not an afterthought.

There was a big initiative a year ago to make it happen, and it resulted in a lot of refactoring. But guess what? We're back in a position where a lot of new models are supported only via `from_pretrained` with an HF URL. Again, like it or not, a large majority of users prefer single-file safetensors, and I can't say I truly blame them, as the HF folder-style layout is non-portable. E.g. if I have a model, moving it somewhere is a nightmare (and frequently breaks due to the hf libs' usage of symlinks). Also, many users already have single-file checkpoints downloaded for use in other apps such as ComfyUI. But if they want to use them in diffusers, guess what? Download again.
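
For reference, where it is supported the single-file path goes through `from_single_file`; the point above is that coverage lags for newer models. A minimal sketch, with the checkpoint path and pipeline class as illustrative assumptions:

```python
# Sketch of reusing a single-file checkpoint already on disk (e.g. one shared
# with ComfyUI); the path is a placeholder.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "models/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a scenic mountain lake at sunrise").images[0]
```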

Almost all new models are large, and diffusers is fast to bring support for them. But it takes very deep know-how to make them work on normal consumer GPUs, as that is never part of the original dev/test cycle. E.g. even advanced users are left wondering how to quantize each component separately, and which quants are even allowed for each component. Make recommendations, so users are not left staring at dry docs which don't help them.
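
To illustrate the kind of recipe being asked for, here is a hedged sketch of per-component quantization using diffusers' bitsandbytes integration; the model id, the choice to quantize only the transformer, and the dtypes are assumptions for illustration, not an official recommendation.

```python
# Sketch: quantize only the transformer to 4-bit NF4 and offload idle components,
# so a large DiT-based model fits on a consumer GPU. Model id is illustrative.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Only the transformer is quantized; text encoders and VAE stay in bf16.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep idle components in CPU RAM

image = pipe("a prompt", num_inference_steps=28).images[0]
```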

Adding support for a new model is me-too; what is the value-add of diffusers other than it all being in the same codebase? Initiatives such as unified guiders (currently in draft) are very welcome, as they bring exactly that value-add. Remote VAE is another very positive initiative.

I'd even suggest going deeper. Recently I've implemented the Nunchaku engine with its SVDQuant, which has native 4-bit execution via custom CUDA kernels. The result? FLUX.1 runs 3-5x faster. Yes, that's an actual 300-500%! Although the approach is in general applicable to any DiT-based model, that team simply cannot do it themselves, as they do not have the resources; this is where the diffusers team could step in. Nitpick the best-of-the-best solutions and actually port them to the most popular pipelines.

Anyhow, this is not intended to be an exhaustive list; it's pretty much just a conversation starter. Any feedback and/or ideas are more than welcome.
cc: @a-r-r-o-w @yiyixuxu @sayakpaul