
Text-to-Image

Introduction

Literature

(StyleCLIP) Text-Driven Manipulation of StyleGAN Imagery
[arXiv 2021] (TAU) Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski

Click to expand

Summary

They combine the generative power of StyleGAN with the rich joint vision-language representation learned by CLIP. They leverage these two models to develop a text-based interface for manipulating generated and real images that requires no manual effort (e.g., manual annotation). Their method mainly operates on the latent spaces of StyleGAN.

Details

They explore three approaches to text-driven image manipulation:

  • latent optimization (a given latent code of an image is optimized by minimizing a loss computed in CLIP space; see the sketch after this list): $$ \underset{w \in \mathcal{W}+}{\arg \min }\; D_{\mathrm{CLIP}}(G(w), t) + \lambda_{\mathrm{L2}} \left\| w - w_{s} \right\|_{2} + \lambda_{\mathrm{ID}} \mathcal{L}_{\mathrm{ID}}(w) $$

  • latent mapper (a mapping network is trained to infer a manipulation step in latent space for a given input image).

  • global direction (transforms a given text prompt into an input-agnostic manipulation direction in StyleGAN's style space).
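
The latent-optimization variant can be sketched as a small PyTorch loop built on OpenAI's CLIP package. This is a minimal sketch, not the authors' implementation: `generator`, `w_init`, and `identity_loss` are placeholders standing in for a pretrained StyleGAN generator, the inverted latent code of the source image (w_s), and the identity-preservation term, and the loss weights are illustrative rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

# Placeholders (assumptions, not provided by the note):
#   generator(w)      -> RGB image in [-1, 1] synthesized from a W+ latent code
#   w_init            -> inverted latent code of the source image (w_s in the formula)
#   identity_loss(x)  -> identity-preservation term (ArcFace-style in the paper)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float().eval()  # keep everything in fp32 for simplicity
text_tokens = clip.tokenize(["a person with blond hair"]).to(device)

def clip_distance(image, text_tokens):
    """Cosine distance between CLIP embeddings of the generated image and the prompt."""
    # Resize to CLIP's input resolution and map [-1, 1] pixels to CLIP's normalization.
    image = F.interpolate(image, size=224, mode="bilinear", align_corners=False)
    image = (image + 1.0) / 2.0
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=image.device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=image.device).view(1, 3, 1, 1)
    image = (image - mean) / std
    img_emb = clip_model.encode_image(image)
    txt_emb = clip_model.encode_text(text_tokens)
    return 1.0 - F.cosine_similarity(img_emb, txt_emb).mean()

w = w_init.clone().requires_grad_(True)      # latent code in W+ being optimized
optimizer = torch.optim.Adam([w], lr=0.1)
lambda_l2, lambda_id = 0.008, 0.005          # illustrative weights

for _ in range(300):
    optimizer.zero_grad()
    image = generator(w)                               # G(w)
    loss = (clip_distance(image, text_tokens)          # D_CLIP(G(w), t)
            + lambda_l2 * (w - w_init).norm()          # keep w close to the source latent w_s
            + lambda_id * identity_loss(image))        # preserve the subject's identity
    loss.backward()
    optimizer.step()
```

Each term of the loop mirrors one term of the objective above: the CLIP distance drives the image toward the text prompt, the L2 term keeps the edit local to the source latent, and the identity term keeps the person recognizable.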