Diffusers current/future #11403
-
I will chime in for the LoRA bits. I think we're also fast in supporting non-diffusers LoRAs, and that is always a priority for us. The timeline of the PRs we have opened after receiving requests for them should speak to that dedication.

Regarding the Hub upload ask, that was my preference; none of my teammates ask for that, I think. I still think it's a fair ask and I will stick to my point. Even when a contributor doesn't provide a Hub URL for the LoRA, we will just upload it ourselves (we have done so multiple times and will continue to do so). Not providing a minimal reproducible snippet, however, is a no-no for me, and I won't argue that point further. If I were reluctant, I would not have tried figuring out a reproducible snippet myself (reference).

For consumer GPUs, we're on it. In the current release, we have shipped a ton of memory-optimization work specifically targeting consumer GPUs. We agree that the docs are currently not great, and #11385 is a step forward there. We are also thinking about "auto" offloading strategies based on a given accelerator specification. More on that in the coming days.
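
For context, a minimal reproducible snippet for this kind of LoRA report usually looks something like the sketch below; the base model, LoRA path, and prompt are placeholders, and the exact pipeline class depends on the model family.

```python
# Minimal repro sketch for a LoRA issue report (model id and paths are placeholders).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights accepts a local single-file checkpoint, e.g. one downloaded
# from CivitAI; no Hub upload is required to reproduce a loading failure.
pipe.load_lora_weights("path/to/lora.safetensors")

image = pipe("a prompt that exercises the LoRA", num_inference_steps=30).images[0]
image.save("lora_repro.png")
```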
-
Thank you for starting this discussion @vladmandic. I have a couple of queries about the diffusers community pipelines. Firstly, does it make sense to deprecate the older/broken pipelines that have not been updated since the changes introduced in #6984? Secondly, the newer community pipelines do not follow the API design of `pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", custom_pipeline="filename_in_the_community_folder")`. Could they be refactored as well?
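
For readers unfamiliar with that loading path, here is a short sketch of how a community pipeline is pulled in via `custom_pipeline`; `lpw_stable_diffusion` is just one long-standing community pipeline used as an illustration.

```python
# Sketch of loading a community pipeline by its filename in the community folder.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",  # resolved from diffusers' community folder
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
```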
-
Starting a brainstorming thread on ideas for longer-term priorities for diffusers...

Use of LoRA is very near the top of priorities for any user. Right now, diffusers is pretty fast at supporting LoRAs trained by diffusers for any new models. But 95%+ of LoRAs are NOT trained by diffusers, and tools like onetrainer, kohya, ai-toolkit, simpletrainer (in no particular order) are much more widely adopted.

Also, like it or not, CivitAI is the de-facto standard for LoRA distribution, so asking for a LoRA to be uploaded to the HF hub just so it can be tested is a no-no. Right now, out of the box, diffusers supports only a small fraction of LoRAs, and that is significantly hurting its user base. It should become a priority, not an afterthought.

There was a big initiative a year ago to make it happen, and it resulted in a lot of refactoring. But guess what? We're back in a position where a lot of new models are supported only via `from_pretrained` with an HF URL. Again, like it or not, a large majority of users prefer single-file safetensors, and I can't say I truly blame them, as the HF folder-style layout is non-portable. E.g. if I have a model, moving it somewhere is a nightmare (and frequently breaks due to the hf libs' usage of symlinks). Also, many users already have single-file checkpoints downloaded for use in other apps such as ComfyUI. But if they want to use them in diffusers, guess what? Download again.
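
For reference, where it is supported the single-file path goes through `from_single_file`; the point above is that coverage lags for newer models. A minimal sketch, with the checkpoint path and pipeline class as illustrative assumptions:

```python
# Sketch of reusing a single-file checkpoint already on disk (e.g. one shared
# with ComfyUI); the path is a placeholder.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "models/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a scenic mountain lake at sunrise").images[0]
```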

Almost all new models are large, and diffusers is fast to bring support for them. But it takes very deep know-how to make them work on normal consumer GPUs, as that is never part of the original dev/test cycle. E.g. even advanced users are left wondering how to quantize each component separately, and which quants are even allowed for each component. Make recommendations, so users are not left staring at dry docs which don't help them.
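
To illustrate the kind of recipe being asked for, here is a hedged sketch of per-component quantization using diffusers' bitsandbytes integration; the model id, the choice to quantize only the transformer, and the dtypes are assumptions for illustration, not an official recommendation.

```python
# Sketch: quantize only the transformer to 4-bit NF4 and offload idle components,
# so a large DiT-based model fits on a consumer GPU. Model id is illustrative.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Only the transformer is quantized; text encoders and VAE stay in bf16.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep idle components in CPU RAM

image = pipe("a prompt", num_inference_steps=28).images[0]
```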

Adding support for a new model is me-too; what is the value-add of diffusers other than it all being in the same codebase? Initiatives such as unified guiders (currently in draft) are very welcome, as they bring exactly that value-add. Remote VAE is another very positive initiative.

I'd even suggest going deeper. Recently I've implemented the Nunchaku engine with its SVDQuant, which has native 4-bit execution via custom CUDA kernels. The result? FLUX.1 runs 3-5x faster. Yes, that's an actual 300-500%! Although the approach is in general applicable to any DiT-based model, that team simply cannot do it themselves, as they do not have the resources; this is where the diffusers team could step in. Nitpick the best-of-the-best solutions and actually port them to the most popular pipelines.

Anyhow, this is not intended to be an exhaustive list; it's pretty much just a conversation starter. Any feedback and/or ideas are more than welcome.
cc: @a-r-r-o-w @yiyixuxu @sayakpaul