Frank Fundel · Johannes Schusterbauer · Vincent Tao Hu · Björn Ommer
CompVis @ LMU Munich, MCML
WACV 2025
We present DistillDIFT, a highly efficient approach to semantic correspondence that delivers state-of-the-art performance with significantly reduced computational cost. Unlike traditional methods that combine multiple large generative models, DistillDIFT uses a novel distillation technique to unify the strengths of two vision foundation models into a single, streamlined model. By integrating 3D data without requiring human annotations, DistillDIFT further improves accuracy.
Overall, our distilled model with 3D data augmentation outperforms current state-of-the-art methods while significantly reducing computational load, making it more practical for real-world applications such as semantic video correspondence.
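For intuition, semantic correspondence with dense features boils down to nearest-neighbour matching between the feature maps of two images. The sketch below shows only this matching step and assumes pre-extracted dense features; the interface is a placeholder, not the actual DistillDIFT API.

import torch
import torch.nn.functional as F

def match_keypoints(feat_src, feat_trg, kps_src):
    """Nearest-neighbour matching of source keypoints in a target feature map.

    feat_src, feat_trg: (C, H, W) dense feature maps of the source/target image.
    kps_src: (N, 2) long tensor of source keypoints as (x, y) grid coordinates.
    Returns an (N, 2) tensor of predicted target keypoints in grid coordinates.
    """
    C, H, W = feat_trg.shape
    src_vecs = feat_src[:, kps_src[:, 1], kps_src[:, 0]].T              # (N, C)
    trg_vecs = feat_trg.flatten(1).T                                    # (H*W, C)
    # Cosine similarity between each source keypoint and every target location.
    sim = F.normalize(src_vecs, dim=1) @ F.normalize(trg_vecs, dim=1).T # (N, H*W)
    idx = sim.argmax(dim=1)
    return torch.stack((idx % W, idx // W), dim=1)                      # (x, y)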
This setup was tested with Ubuntu 22.04.4 LTS, CUDA 12.2, and Python 3.9.20.
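To check that your local setup roughly matches (assuming an NVIDIA driver and Python are installed), you can run:

nvidia-smi        # reports the driver's CUDA version
python --version  # should print Python 3.9.x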
First, clone the GitHub repository:
git clone git@github.com:CompVis/distilldift.git
cd distilldift
Our evaluation pipeline for SPair-71K is based on Telling-Left-From-Right for better comparability.
Follow their environment setup and data preparation. Don't forget to first change into the evaluation directory:
cd eval
And then run the evaluation script via
bash eval_distilldift.sh
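For reference, SPair-71K evaluation reports PCK: the fraction of predicted keypoints that land within alpha times the larger side of the object bounding box of their ground-truth location (typically alpha = 0.1). A minimal sketch of this metric, independent of the Telling-Left-From-Right evaluation code:

import torch

def pck(pred_kps, gt_kps, bbox, alpha=0.1):
    """PCK@alpha_bbox: share of predictions within alpha * max(bbox side) of the ground truth.

    pred_kps, gt_kps: (N, 2) keypoints as (x, y) image coordinates.
    bbox: (x_min, y_min, x_max, y_max) of the target object.
    """
    threshold = alpha * max(bbox[2] - bbox[0], bbox[3] - bbox[1])
    dists = torch.linalg.norm(pred_kps.float() - gt_kps.float(), dim=1)
    return (dists <= threshold).float().mean().item()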
First, change into the training directory:
cd train
Then you can either set up a virtual environment and install all required packages with pip via
pip install -r requirements.txt
or, if you prefer conda, create the environment via
conda env create -f environment.yaml
Download the COCO dataset and embed the images (for unsupervised training) via
bash datasets/download_coco.sh
python embed.py --dataset_name COCO
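embed.py pre-computes features for all training images once and caches them to disk, so that later training runs with --use_cache can skip repeated feature extraction. Conceptually, the caching step looks like the sketch below; the extractor and cache layout are illustrative, not the repository's exact implementation.

import torch

@torch.no_grad()
def embed_dataset(extractor, dataloader, out_path, device="cuda"):
    """Run a frozen feature extractor over a dataset once and cache the results.

    extractor: frozen vision model mapping an image batch to dense features.
    dataloader: yields (image_batch, image_ids).
    out_path: file the cached features are written to.
    """
    cache = {}
    for images, image_ids in dataloader:
        feats = extractor(images.to(device)).cpu()    # (B, C, H, W) dense features
        for image_id, feat in zip(image_ids, feats):
            cache[image_id] = feat
    torch.save(cache, out_path)                       # loaded later when --use_cache is set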
And run the training in one of the following settings (a conceptual sketch of the shared distillation objective follows the list):
- Unsupervised Distillation
accelerate launch --multi_gpu --num_processes 4 train.py distilled_us --dataset_name COCO --use_cache
- Weakly Supervised Distillation
accelerate launch --multi_gpu --num_processes 4 train.py distilled_ws --dataset_name SPair-71k --use_cache
- Supervised Training
accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name SPair-71k --use_cache
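All three settings share the same core idea: a single student network is trained to reproduce the correspondence behaviour of two frozen foundation-model teachers (and, in the supervised setting, to refine it with keypoint annotations). The sketch below shows one common form of such a feature-distillation objective, matching the teachers' feature self-similarity; it is a conceptual example, not the repository's exact loss.

import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats_a, teacher_feats_b):
    """Align the student's dense features with those of two frozen teachers.

    All inputs are (B, C, H, W); channel counts may differ between models, so the
    sketch compares self-similarity matrices rather than raw feature vectors.
    """
    loss = 0.0
    for teacher_feats in (teacher_feats_a, teacher_feats_b):
        s = F.normalize(student_feats.flatten(2), dim=1)   # (B, C_s, H*W)
        t = F.normalize(teacher_feats.flatten(2), dim=1)   # (B, C_t, H*W)
        sim_s = s.transpose(1, 2) @ s                      # (B, H*W, H*W) student affinities
        sim_t = t.transpose(1, 2) @ t                      # (B, H*W, H*W) teacher affinities
        loss = loss + F.mse_loss(sim_s, sim_t)
    return loss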
Follow the official instructions to download the CO3D dataset, then prepare it via
python datasets/create_co3d.py
And run the training via
accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name CO3D --use_cache
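CO3D provides multi-view captures with camera poses, which is what makes annotation-free 3D supervision possible: a 3D surface point visible in two frames of the same sequence projects to a matched pixel pair. A minimal sketch of this projection with a pinhole camera model (illustrative only; create_co3d.py handles the actual CO3D format):

import numpy as np

def project(points_world, K, R, t):
    """Project 3D world points into a camera with intrinsics K and extrinsics (R, t).

    points_world: (N, 3) 3D points, K: (3, 3), R: (3, 3), t: (3,).
    Returns (N, 2) pixel coordinates.
    """
    points_cam = points_world @ R.T + t        # world -> camera coordinates
    pixels = points_cam @ K.T                  # camera -> image plane (homogeneous)
    return pixels[:, :2] / pixels[:, 2:3]      # perspective division

# The same 3D points seen from two views of one sequence yield a set of
# ground-truth correspondences: project(points, K_a, R_a, t_a) in frame A
# matches project(points, K_b, R_b, t_b) in frame B.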
Please cite our paper:
@inproceedings{fundel2025distilldift,
author = {Frank Fundel and Johannes Schusterbauer and Vincent Tao Hu and Björn Ommer},
title = {Distillation of Diffusion Features for Semantic Correspondence},
booktitle = {WACV},
year = {2025},
}