Add VisualCloze #11377
@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?
Cc: @asomoza as well for testing if possible.
Hello, here is some test code with its results: Model Card.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi, really nice and thank you for your work. Currently diffusers doesn't have
Okay, I will make the necessary modifications. Additionally, I noticed that the call method is not functioning properly in the documentation. Could you please help check the cause?
Co-authored-by: Álvaro Somoza <[email protected]>
Hello, we have removed einops from the code while ensuring the correctness of the results. @asomoza
Thanks for working on this. I just added a few minor comments.
I am unsure about self.denoise(). On one hand I see its value, but since it deviates from our usual pipeline implementations, I will defer the decision to the other reviewers.
Hello, we have made changes to the code based on your suggestions. @sayakpaul
Thank you! I left some further comments and I will let the other reviewers comment here.
@@ -89,6 +89,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [UniDiffuser](unidiffuser) | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| [Value-guided planning](value_guided_sampling) | value guided sampling |
| [Wuerstchen](wuerstchen) | text2image |
| [VisualCloze](visualcloze) | text2image, image2image, subject driven generation, inpainting, style transfer, image restoration, image editing, [depth,normal,edge,pose]2image, [depth,normal,edge,pose]-estimation, virtual try-on, image relighting |
cc: @stevhliu do you think it's alright?
Co-authored-by: Sayak Paul <[email protected]>
Thanks for the awesome PR, and congrats on the release of your work! Just a few minor changes are needed before we can proceed to merge.
... # in-context examples
... [
...     load_image(
...         "https://github.com/lzyhha/VisualCloze/tree/main/examples/examples/5bf755ed9dbb9b3e223e7ba35232b06e/5bf755ed9dbb9b3e223e7ba35232b06e_depth-anything-v2_Large.jpg"
Could you open a pull request to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/diffusers and create a folder named visualcloze containing all necessary assets?
Hello, I have created a PR at https://huggingface.co/datasets/huggingface/documentation-images/discussions/483.
After it is merged, I will update the file URLs in the documentation.
Thanks for the PR! I've merged it
Thanks, I have updated the image URLs for the examples.
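For readers following the asset move, here is a minimal sketch of how a direct-download URL for a file in that dataset repo could be built. It assumes the Hub's usual `resolve/main` URL pattern and the `diffusers/visualcloze` folder requested above; the helper name `asset_url` and the file name are hypothetical, and the actual file names merged in the assets PR are not shown in this thread.

```python
from urllib.parse import quote

# Base of the dataset repo named in the review comment, using the Hub's
# "resolve" route (serves the raw file) instead of "tree" (an HTML page).
HUB_BASE = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main"


def asset_url(filename: str, folder: str = "diffusers/visualcloze") -> str:
    """Build a raw-file URL for a documentation asset (hypothetical helper)."""
    # quote() percent-encodes characters that are unsafe in a URL path.
    return f"{HUB_BASE}/{folder}/{quote(filename)}"


# "example_depth.jpg" is a placeholder name, not a file confirmed by the thread.
print(asset_url("example_depth.jpg"))
```

A URL of this form can be passed to `load_image`, whereas a GitHub `tree/...` link like the one quoted earlier points at a web page rather than the image bytes.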
# Generate the target image latents by denoising the initial noise
# using the provided prompts and guidance scale
cloze_latents = self.denoise(
@yiyixuxu This is quite different from our usual pipeline design, but there is benefit to having it here to reduce duplicated code. Could you review this part as well?
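To make the trade-off concrete, here is a toy sketch of the pattern under discussion: one `denoise()` helper holding the sampling loop, called from more than one stage of `__call__`. This is not the code from the PR. The class `ToyTwoStagePipeline`, the second refinement stage, and the arithmetic standing in for a model are all assumptions made purely for illustration.

```python
import random
from typing import List


class ToyTwoStagePipeline:
    """Hypothetical toy pipeline; it only illustrates sharing one denoise loop."""

    def __init__(self) -> None:
        self.denoise_calls = 0  # counts how often the shared loop is reused

    def denoise(self, latents: List[float], guidance_scale: float, num_steps: int) -> List[float]:
        # Shared loop: a stand-in "noise prediction" is subtracted at each step.
        self.denoise_calls += 1
        for _ in range(num_steps):
            predicted_noise = [0.1 * guidance_scale * x for x in latents]
            latents = [x - n for x, n in zip(latents, predicted_noise)]
        return latents

    def __call__(self, seed: int = 0) -> List[float]:
        rng = random.Random(seed)
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        # Stage 1: generate the target latents from the initial noise.
        cloze_latents = self.denoise(noise, guidance_scale=3.0, num_steps=4)
        # Stage 2 (assumed): perturb slightly, then refine with the same loop.
        perturbed = [x + 0.1 * rng.gauss(0.0, 1.0) for x in cloze_latents]
        return self.denoise(perturbed, guidance_scale=3.0, num_steps=2)


pipe = ToyTwoStagePipeline()
result = pipe(seed=0)
print(len(result), pipe.denoise_calls)
```

Without the helper, each stage would carry its own copy of the loop inside `__call__`, which is closer to the usual single-loop pipeline layout but duplicates code once a second stage exists.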
@bot /style
Thank you, the PR looks good to merge! I'll run the example scripts to verify and try to merge by tomorrow
What does this PR do?
Add VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning, a universal image generation framework based on in-context learning, along with the corresponding tests and documentation.
Here is some test code with its results: Model Card.
Before submitting
Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.