Add VisualCloze #11377

Open
lzyhha wants to merge 33 commits into main

@lzyhha lzyhha commented Apr 21, 2025

What does this PR do?

Adds VisualCloze ("VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning"), a universal image generation framework based on in-context learning, along with the corresponding tests and documentation.

Test code and the corresponding results are available in the Model Card.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
Member

@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?

@sayakpaul sayakpaul requested a review from a-r-r-o-w April 21, 2025 12:17
@sayakpaul
Member

Cc: @asomoza as well for testing if possible.

@lzyhha
Author

lzyhha commented Apr 21, 2025

> @lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?

Hello, test code and the corresponding results are available in the Model Card.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza
Member

asomoza commented Apr 21, 2025

Hi, really nice and thank you for your work. Currently diffusers doesn't have einops as a dependency. Is it possible that you refactor all the rearrange calls to just use a torch equivalent without the need of external libraries?

@lzyhha
Author

lzyhha commented Apr 21, 2025

> Hi, really nice and thank you for your work. Currently diffusers doesn't have einops as a dependency. Is it possible that you refactor all the rearrange calls to just use a torch equivalent without the need of external libraries?

Okay, I will make the necessary modifications. Additionally, I noticed that the `__call__` method is not functioning properly in the documentation. Could you please help check the cause?

@lzyhha
Author

lzyhha commented Apr 22, 2025

@asomoza Hello, we have removed einops from the code and verified that the results remain correct.
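
For readers unfamiliar with this kind of refactor, the sketch below shows how one common `einops.rearrange` pattern can be written with plain `torch` calls. The pattern `b c (h ph) (w pw) -> b (h w) (c ph pw)`, the function names `pack_latents` and `unpack_latents`, and the patch size are illustrative assumptions; they are not taken from the VisualCloze code in this PR.

```python
import torch


def pack_latents(latents: torch.Tensor, patch_size: int = 2) -> torch.Tensor:
    """Torch-only version of (assumed pattern, for illustration):
    rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)
    """
    b, c, height, width = latents.shape
    h, w = height // patch_size, width // patch_size
    # Split each spatial axis into (number of patches, patch size).
    x = latents.reshape(b, c, h, patch_size, w, patch_size)
    # Reorder to (b, h, w, c, ph, pw) so patch contents end up contiguous.
    x = x.permute(0, 2, 4, 1, 3, 5)
    # Merge (h, w) into a token axis and (c, ph, pw) into a feature axis.
    return x.reshape(b, h * w, c * patch_size * patch_size)


def unpack_latents(tokens: torch.Tensor, h: int, w: int, patch_size: int = 2) -> torch.Tensor:
    """Inverse of pack_latents: (b, h*w, c*ph*pw) -> (b, c, h*ph, w*pw)."""
    b, _, dim = tokens.shape
    c = dim // (patch_size * patch_size)
    x = tokens.reshape(b, h, w, c, patch_size, patch_size)
    # Reorder to (b, c, h, ph, w, pw) before merging the spatial axes.
    x = x.permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, c, h * patch_size, w * patch_size)


if __name__ == "__main__":
    latents = torch.arange(2 * 4 * 4 * 6, dtype=torch.float32).reshape(2, 4, 4, 6)
    tokens = pack_latents(latents)
    print(tokens.shape)
    print(torch.equal(unpack_latents(tokens, h=2, w=3), latents))
```

A round trip through both functions is a cheap way to check that a hand-written `reshape`/`permute` chain matches the axis order of the pattern it replaces.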

Member

@sayakpaul sayakpaul left a comment

Thanks for working on this. I just added a few minor comments.

I am unsure about `self.denoise()`. On the one hand, I see its value, but since it deviates from our usual pipeline implementations, I will defer the decision to the other reviewers.

@lzyhha
Author

lzyhha commented Apr 23, 2025

Hello, we have made changes to the code based on your suggestions. @sayakpaul

Member

@sayakpaul sayakpaul left a comment

Thank you! I left some further comments and I will let the other reviewers comment here.

@@ -89,6 +89,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [UniDiffuser](unidiffuser) | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| [Value-guided planning](value_guided_sampling) | value guided sampling |
| [Wuerstchen](wuerstchen) | text2image |
| [VisualCloze](visualcloze) | text2image, image2image, subject driven generation, inpainting, style transfer, image restoration, image editing, [depth,normal,edge,pose]2image, [depth,normal,edge,pose]-estimation, virtual try-on, image relighting |
Member

cc: @stevhliu do you think it's alright?

Member

@a-r-r-o-w a-r-r-o-w left a comment

Thanks for the awesome PR, and congrats on the release of your work! Only a few minor changes are needed before we can proceed to merge.

... # in-context examples
... [
... load_image(
... "https://github.com/lzyhha/VisualCloze/tree/main/examples/examples/5bf755ed9dbb9b3e223e7ba35232b06e/5bf755ed9dbb9b3e223e7ba35232b06e_depth-anything-v2_Large.jpg"
Member

Could you open a pull request to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/diffusers and create a folder named visualcloze containing all necessary assets?

Author

Hello, I have created a PR at https://huggingface.co/datasets/huggingface/documentation-images/discussions/483.
After it is merged, I will update the file URLs in the documentation.

Member

Thanks for the PR! I've merged it

Author

Thanks, I have updated the image URLs for the examples.
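
A `github.com/.../tree/main/...` address typically serves a repository web page rather than the raw file, which is one reason documentation examples usually load assets from the Hub dataset instead. The sketch below builds a direct-download URL for a file in the `diffusers/visualcloze` folder requested in the review. The helper name `doc_image_url` and the file name in the usage line are hypothetical, and the `resolve/main` path segment is assumed from the usual Hub convention for direct file downloads; none of this is taken from the PR itself.

```python
from urllib.parse import quote

# Dataset repository named in the review comment above.
DOC_IMAGES_REPO = "huggingface/documentation-images"


def doc_image_url(filename: str, folder: str = "diffusers/visualcloze") -> str:
    """Build a direct-download URL for a documentation asset (hypothetical helper)."""
    # quote() keeps "/" intact and percent-encodes characters such as spaces.
    path = quote(f"{folder}/{filename}")
    return f"https://huggingface.co/datasets/{DOC_IMAGES_REPO}/resolve/main/{path}"


if __name__ == "__main__":
    # "example_depth.jpg" is a placeholder file name, not an actual asset.
    print(doc_image_url("example_depth.jpg"))
```

The resulting string can be passed to `load_image` from `diffusers.utils`, in the same way as the URL in the docstring excerpt above.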


# Generate the target image latents by denoising the initial noise
# using the provided prompts and guidance scale
cloze_latents = self.denoise(
Member

@yiyixuxu This is quite different from our usual pipeline design, but there is benefit to having it here to reduce duplicated code. Could you review this part as well?
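
To make the trade-off concrete, here is a toy sketch of the pattern under discussion: the denoising loop lives in one helper method, and `__call__` invokes it more than once instead of repeating the loop inline. `ToyPipeline`, `step_fn`, and the two-pass structure are assumptions made for illustration; this is not the VisualCloze pipeline code.

```python
from typing import Callable, List


class ToyPipeline:
    """Hypothetical stand-in for a diffusion pipeline, used only to show the design."""

    def __init__(self, step_fn: Callable[[float, float], float]):
        # step_fn(value, timestep) plays the role of one scheduler/model update.
        self.step_fn = step_fn

    def denoise(self, latents: List[float], timesteps: List[float]) -> List[float]:
        """Shared denoising loop; every stage of the pipeline reuses it."""
        for t in timesteps:
            latents = [self.step_fn(x, t) for x in latents]
        return latents

    def __call__(
        self,
        noise: List[float],
        first_pass_steps: List[float],
        second_pass_steps: List[float],
    ) -> List[float]:
        # Two call sites share one loop instead of duplicating it inline.
        intermediate = self.denoise(noise, first_pass_steps)
        return self.denoise(intermediate, second_pass_steps)


if __name__ == "__main__":
    pipe = ToyPipeline(lambda x, t: x * (1 - t))
    print(pipe([1.0, 2.0], [0.5], [0.5]))
```

The cost of this design is that the loop is no longer visible in `__call__`, which is where readers of other pipelines usually expect to find it.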

@a-r-r-o-w
Member

@bot /style

Member

@a-r-r-o-w a-r-r-o-w left a comment

Thank you, the PR looks good to merge! I'll run the example scripts to verify and try to merge by tomorrow.

Labels
None yet
Projects
Status: In Progress
5 participants