
Text-to-Image

Introduction

Literature

(StyleCLIP) Text-Driven Manipulation of StyleGAN Imagery
[arXiv 2021] (TAU) Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski

Click to expand

Summary

They combine the generative power of StyleGAN with the rich joint vision-language representation learned by CLIP. They leverage these two models to develop a text-based interface for manipulating generated and real images that requires no manual effort (e.g., manual annotation). Their method mainly operates on the latent spaces of StyleGAN.

Details

They explore three approaches to text-driven image manipulation:

  • latent optimization (a given latent code of an image is optimized by minimizing a loss computed in CLIP space; see the sketch after this list): $$ \underset{w \in \mathcal{W}+}{\arg \min }\; D_{\mathrm{CLIP}}(G(w), t) + \lambda_{\mathrm{L2}} \left\| w - w_{s} \right\|_{2} + \lambda_{\mathrm{ID}} \mathcal{L}_{\mathrm{ID}}(w) $$

  • latent mapper (a mapping network is trained to infer a manipulation step in latent space for a given input image).

  • global direction (transforms a given text prompt into an input-agnostic manipulation direction in StyleGAN's style space).
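
The latent-optimization variant can be sketched as a small PyTorch loop built on OpenAI's CLIP package. This is a minimal sketch, not the authors' implementation: `generator`, `w_init`, and `identity_loss` are placeholders standing in for a pretrained StyleGAN generator, the inverted latent code of the source image (w_s), and the identity-preservation term, and the loss weights are illustrative rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

# Placeholders (assumptions, not provided by the note):
#   generator(w)      -> RGB image in [-1, 1] synthesized from a W+ latent code
#   w_init            -> inverted latent code of the source image (w_s in the formula)
#   identity_loss(x)  -> identity-preservation term (ArcFace-style in the paper)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float().eval()  # keep everything in fp32 for simplicity
text_tokens = clip.tokenize(["a person with blond hair"]).to(device)

def clip_distance(image, text_tokens):
    """Cosine distance between CLIP embeddings of the generated image and the prompt."""
    # Resize to CLIP's input resolution and map [-1, 1] pixels to CLIP's normalization.
    image = F.interpolate(image, size=224, mode="bilinear", align_corners=False)
    image = (image + 1.0) / 2.0
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=image.device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=image.device).view(1, 3, 1, 1)
    image = (image - mean) / std
    img_emb = clip_model.encode_image(image)
    txt_emb = clip_model.encode_text(text_tokens)
    return 1.0 - F.cosine_similarity(img_emb, txt_emb).mean()

w = w_init.clone().requires_grad_(True)      # latent code in W+ being optimized
optimizer = torch.optim.Adam([w], lr=0.1)
lambda_l2, lambda_id = 0.008, 0.005          # illustrative weights

for _ in range(300):
    optimizer.zero_grad()
    image = generator(w)                               # G(w)
    loss = (clip_distance(image, text_tokens)          # D_CLIP(G(w), t)
            + lambda_l2 * (w - w_init).norm()          # keep w close to the source latent w_s
            + lambda_id * identity_loss(image))        # preserve the subject's identity
    loss.backward()
    optimizer.step()
```

Each term of the loop mirrors one term of the objective above: the CLIP distance drives the image toward the text prompt, the L2 term keeps the edit local to the source latent, and the identity term keeps the person recognizable.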