🎉 Paper accepted at CVPR 2025! 🎉
Yassir Bendou<sup>1</sup>, Amine Ouasfi<sup>2</sup>, Vincent Gripon<sup>1</sup>, Adnane Boukhayma<sup>2</sup>

<sup>1</sup>IMT Atlantique, <sup>2</sup>INRIA
Create a conda environment and install dependencies:
```bash
conda create -n h2b python=3.9
conda activate h2b
pip install -r requirements.txt
```
If you prefer to use uv:
```bash
uv venv --python 3.9
source .venv/bin/activate
uv pip install -r requirements.txt
```
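Optionally, verify the environment with a quick import check (this assumes PyTorch is among the pinned requirements, which CLIP-based code needs):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```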
Follow DATASET.md to set up ImageNet and the other datasets, using the dataset preparation instructions from CoOp.
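For orientation, a CoOp-style setup typically gathers all datasets under a single root; the authoritative structure is in DATASET.md, and the folder names below are only indicative:

```
$DATA/
├── imagenet/
├── caltech-101/
├── oxford_pets/
└── ...
```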
Running configurations can be modified in the `configs` directory.
For few-shot classification:
```bash
python main.py --method ProKeR --shots 1 2 4 8 16 --dataset caltech101 --augment-epoch 10
```
If GPU memory is saturated, consider reducing the number of data augmentation epochs via `--augment-epoch`, as in the example below.
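For example, the same run with a smaller augmentation budget:

```bash
python main.py --method ProKeR --shots 1 2 4 8 16 --dataset caltech101 --augment-epoch 1
```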
Multiple methods are implemented:
| Name | Details |
| --- | --- |
| ZeroShot | CLIP |
| TIP | Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling |
| GDA | A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation |
| CLAP | A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models |
| ProKeR | ProKeR (ours) |
| ProKeR_CLAP_joint | ProKeR (ours) + CLAP |
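Each `Name` is passed via `--method`, as in the command above. For intuition only, here is a minimal NumPy sketch of the training-free kernel view behind ProKeR: kernel ridge regression that corrects the CLIP zero-shot classifier using the cached few-shot features. This is an illustration under assumptions, not the repository's implementation; the Gaussian kernel, `lam`, `gamma`, and all variable names are hypothetical.

```python
# Illustrative sketch only (NOT the repo's exact code).
# Assumed shapes: F (N x d) L2-normalized few-shot features, Y (N x C) one-hot
# labels, W_clip (d x C) zero-shot text classifier, X_test (M x d) test features.
import numpy as np

def gaussian_kernel(A, B, gamma=10.0):
    # k(a, b) = exp(-gamma * ||a - b||^2) on L2-normalized features
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.clip(sq, 0.0, None))

def kernel_prior_predict(F, Y, W_clip, X_test, lam=1.0, gamma=10.0):
    # Kernel ridge regression toward the zero-shot prior: the regressor fits
    # the residual Y - F @ W_clip, so with no support data (or lam -> inf)
    # predictions fall back to zero-shot CLIP.
    K = gaussian_kernel(F, F, gamma)                              # (N, N) Gram matrix
    residual = Y - F @ W_clip                                     # what zero-shot gets wrong
    alpha = np.linalg.solve(K + lam * np.eye(len(F)), residual)   # closed form, no training loop
    return X_test @ W_clip + gaussian_kernel(X_test, F, gamma) @ alpha

# Toy usage with random stand-ins for cached CLIP features
rng = np.random.default_rng(0)
d, C, N = 512, 10, 40
F = rng.normal(size=(N, d)); F /= np.linalg.norm(F, axis=1, keepdims=True)
Y = np.eye(C)[rng.integers(0, C, N)]
W_clip = rng.normal(size=(d, C))
X_test = rng.normal(size=(5, d)); X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
print(kernel_prior_predict(F, Y, W_clip, X_test).shape)           # (5, C) class scores
```

Because the solution has a closed form, no gradient-based training is needed, which is what "training-free" refers to here.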
This repo benefits from Tip-Adapter, CoOp, and GDA.
```bibtex
@article{ProKeR,
  title   = {A Kernel Perspective on Training-Free Few-Shot Adaptation of Large Vision-Language Models},
  author  = {Bendou, Yassir and Ouasfi, Amine and Gripon, Vincent and Boukhayma, Adnane},
  journal = {arXiv preprint arXiv:2501.11175},
  year    = {2025},
  url     = {https://arxiv.org/abs/2501.11175}
}
```
If you have any questions, feel free to contact [email protected].