This repository contains the codebase of a series of projects on synthetic dataset generation with few-shot guidance.
- LoFT: LoRA-fused Dataset Generation with Few-shot Guidance, arXiv.
- DataDream: Few-shot Guided Dataset Generation, ECCV 2024.
We use Stable-Diffusion-2-1-base as the base diffusion model.
The few-shot real data must follow a fixed directory layout: each image file should be located at `PATH_TO_REAL_FEWSHOT/$DATASET/shot$N_SHOT_seed$FEWSHOT_SEED/$CLASS_NAME/$FILE`. The list of `$CLASS_NAME` values for each `$DATASET` can be found in `sd-finetune/util.py`. For instance, with a 16-shot setting and few-shot seed 0, files should be located as follows:

    📂 data
    |_📂 real_train_fewshot
      |_📂 imagenet
        |_📂 shot16_seed0
          |_📂 abacus
            |_📄 n02666196_17944.JPEG
            |_📄 n02666196_10754.JPEG
            |_📄 n02666196_10341.JPEG
            ...
            |_📄 n02666196_16649.JPEG
          |_📂 clothes iron
          |_📂 great white shark
          |_📂 goldfish
          |_📂 tench
          ...
      |_📂 eurosat
        |_📂 shot16_seed0
          |_📂 AnnualCrop
          |_📂 Forest
          ...

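
To sanity-check the layout before fine-tuning, a small script can count the files in each class folder. The sketch below is a hypothetical helper, not part of this repository: `count_fewshot_files` and `report_mismatches` are illustrative names, and the check assumes that an N-shot split should contain exactly N files per class. It does not compare folder names against the class lists in `sd-finetune/util.py`, since the interface of that file is not described here.

```python
from pathlib import Path


# Hypothetical helper (not part of this repository): counts the files in each
# class folder of the layout
# PATH_TO_REAL_FEWSHOT/$DATASET/shot$N_SHOT_seed$FEWSHOT_SEED/$CLASS_NAME/$FILE
def count_fewshot_files(root, dataset, n_shot, seed):
    split_dir = Path(root) / dataset / f"shot{n_shot}_seed{seed}"
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing few-shot split directory: {split_dir}")
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        # Class folder names may contain spaces (e.g. "clothes iron"),
        # so they are used verbatim as dictionary keys.
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


# Assumption: an N-shot split holds exactly n_shot files per class.
def report_mismatches(counts, n_shot):
    """Return the classes whose file count differs from n_shot."""
    return {name: n for name, n in counts.items() if n != n_shot}
```

For example, `report_mismatches(count_fewshot_files("data/real_train_fewshot", "imagenet", 16, 0), 16)` would return every ImageNet class folder that does not contain exactly 16 files, and an empty dict if the split is complete.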
To run the LoFT, DataDream-class, and DataDream-dataset methods, follow the steps below:
- Install the necessary dependencies listed in `requirements.txt`.
- Fine-tune the diffusion model: follow the instructions in the `sd-finetune` folder.
- Generate the dataset: follow the instructions in the `generation` folder.
- Train a classification model on the synthetic data: follow the instructions in the `classification` folder.
If you use this code in your research, please cite the following papers:

    @article{kim2025loft,
      TBD
    }

    @article{kim2024datadream,
      title={DataDream: Few-shot Guided Dataset Generation},
      author={Kim, Jae Myung and Bader, Jessica and Alaniz, Stephan and Schmid, Cordelia and Akata, Zeynep},
      journal={arXiv preprint arXiv:2407.10910},
      year={2024}
    }