Skip to content

[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Notifications You must be signed in to change notification settings

suisui-koubou/StableVITON

This branch is 9 commits behind rlawjdghek/StableVITON:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

4728ed0 · Mar 1, 2024

History

4 Commits
Feb 29, 2024
Feb 29, 2024
Mar 1, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024
Feb 29, 2024

Repository files navigation

[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

This repository is the official implementation of StableVITON

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
Jeongho Kim, Gyojung Gu, Minho Park, Sunghyun Park, Jaegul Choo

[arXiv Paper]  [Project Page

teaser 

TODO List

  • Inference code
  • Release model weights
  • Training code

Environments

git clone https://github.com/rlawjdghek/StableVITON
cd StableVITON

conda create --name StableVITON python=3.10 -y
conda activate StableVITON

# install packages
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install pytorch-lightning==1.5.0
pip install einops
pip install opencv-python==4.7.0.72
pip install matplotlib
pip install omegaconf
pip install albumentations
pip install transformers==4.33.2
pip install xformers==0.0.19
pip install triton==2.0.0
pip install open-clip-torch==2.19.0
pip install diffusers==0.20.2
pip install scipy==1.10.1
conda install -c anaconda ipython -y

Weights and Data

Our checkpoint on VITONHD have been released!
You can download the VITON-HD dataset from here.
For both training and inference, the following dataset structure is required:

train
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
|-- gt_cloth_warped_mask (for ATV loss)

test
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask

Preprocessing

The VITON-HD dataset serves as a benchmark and provides an agnostic mask. However, you can attempt virtual try-on on arbitrary images using segmentation tools like SAM. Please note that for densepose, you should use the same densepose model as used in VITON-HD.

Inference

#### paired
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --save_dir <save directory>

#### unpaired
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --unpair \
 --save_dir <save directory>

#### paired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path>t \
 --repaint \
 --save_dir <save directory>

#### unpaired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --unpair \
 --repaint \
 --save_dir <save directory>

You can also preserve the unmasked region by '--repaint' option.

Training

For VITON training, we increased the first block of U-Net from 9 to 13 channels (add zero conv) based on the Paint-by-Example (PBE) model. Therefore, you should download the modified checkpoint (named as 'VITONHD_PBE_pose.ckpt') from the Link and place it in the './ckpts/' folder first.

Additionally, for more refined person texture, we utilized a VAE fine-tuned on the VITONHD dataset. You should also download the checkpoint (named as VITONHD_VAE_finetuning.ckpt') from the Link and place it in the './ckpts/' folder.

### Base model training
CUDA_VISIBLE_DEVICES=3,4 python train.py \
 --config_name VITONHD \
 --transform_size shiftscale3 hflip \
 --transform_color hsv bright_contrast \
 --save_name Base_test

### ATV loss finetuning
CUDA_VISIBLE_DEVICES=5,6 python train.py \
 --config_name VITONHD \
 --transform_size shiftscale3 hflip \
 --transform_color hsv bright_contrast \
 --use_atv_loss \
 --resume_path <first stage model path> \
 --save_name ATVloss_test

Citation

If you find our work useful for your research, please cite us:

@artical{kim2023stableviton,
    title={StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On},
    author={Kim, Jeongho and Gu, Gyojung and Park, Minho and Park, Sunghyun and Choo, Jaegul},
    booktitle={arXiv preprint arxiv:2312.01725},
    year={2023}
}

Acknowledgements Sunghyun Park is the corresponding author.

License

Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

About

[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.9%
  • Shell 0.1%