We present TIDE, a unified underwater image-dense annotation generation model. Its core idea is to exploit the layout information shared across modalities and the natural complementarity between multimodal features. Derived from a pre-trained text-to-image model and fine-tuned on underwater data, TIDE generates highly consistent underwater images and dense annotations from text conditions alone.
- 2025-3-28: The training and inference code is now available!
- 2025-2-27: TIDE has been accepted to CVPR 2025!
- Python >= 3.9 (Anaconda or Miniconda is recommended)
- PyTorch >= 2.0.1+cu11.7
```bash
conda create -n TIDE python=3.9
conda activate TIDE
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/HongkLin/TIDE
cd TIDE
pip install -r requirements.txt
```
Download the pre-trained PixArt-α, MiniTransformer, and TIDE checkpoints, then update the model weights path accordingly.
```bash
python inference.py --model_weights_dir ./model_weights --text_prompt "A large school of fish swimming in a circle." --output ./outputs
```
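To generate samples for several prompts, one option is to call `inference.py` repeatedly with the same flags as the command above. The sketch below assumes only those three flags (`--model_weights_dir`, `--text_prompt`, `--output`); the helper names `build_inference_cmd` and `run_batch` and the per-prompt output subdirectories are hypothetical choices, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_cmd(prompt: str, weights_dir: str, output_dir: str) -> list[str]:
    """Hypothetical helper: assemble an inference.py call using only the documented flags."""
    return [
        sys.executable, "inference.py",
        "--model_weights_dir", weights_dir,
        "--text_prompt", prompt,
        "--output", output_dir,
    ]


def run_batch(prompts: list[str], weights_dir: str = "./model_weights",
              output_root: str = "./outputs") -> None:
    """Run inference once per prompt; must be called from the repository root."""
    for i, prompt in enumerate(prompts):
        # One output subdirectory per prompt (an assumption, to keep results separate).
        out_dir = str(Path(output_root) / f"prompt_{i:03d}")
        # check=True stops the batch as soon as one run fails.
        subprocess.run(build_inference_cmd(prompt, weights_dir, out_dir), check=True)
```

For example, `run_batch(["A large school of fish swimming in a circle."])` reproduces the single command above, writing to `./outputs/prompt_000`.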
- Download the SUIM, UIIS, and USIS10K datasets.
- The semantic segmentation annotations are obtained by merging instances that share the same semantic class.
- The depth annotations are generated with Depth Anything V2, and the resulting inverse depth maps are saved as `.npy` files.
- For the image captions and the JSON file that organizes the training data, follow Atlantis, as we do.
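The two conversion steps in the list above (instance-to-semantic merging and saving inverse depth) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' preprocessing script: the function names are hypothetical, it assumes instances arrive as binary masks with integer class ids, that label 0 means background, and that later instances overwrite earlier ones where masks overlap.

```python
import numpy as np


def merge_instances_to_semantic(instance_masks, class_ids, height, width, background=0):
    """Hypothetical helper: collapse per-instance binary masks into one semantic label map.

    Every pixel of an instance is written with that instance's class id, so all
    instances of the same class end up as a single semantic region. Where masks
    overlap, the later instance wins (an assumed tie-breaking rule).
    """
    semantic = np.full((height, width), background, dtype=np.uint8)
    for mask, cls in zip(instance_masks, class_ids):
        semantic[mask.astype(bool)] = cls
    return semantic


def save_inverse_depth(inverse_depth, path):
    """Hypothetical helper: store a float inverse-depth map (e.g. a Depth Anything V2 output) as .npy."""
    np.save(path, np.asarray(inverse_depth, dtype=np.float32))
```

Which integer encodes which class is dataset-specific, so the `class_ids` mapping has to be defined per dataset.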
The final dataset should be organized as follows:
```
datasets/
    UWD_triplets/
        images/
            train_05543.jpg
            ...
        semseg_annotations/
            train_05543.jpg
            ...
        depth_annotations/
            train_05543_raw_depth_meter.npy
            ...
        TrainTIDE_Caption.json
```
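Before training, it can help to check that every sample has all three files. The sketch below derives the expected paths from the example file names in the layout above (`train_05543.jpg`, `train_05543_raw_depth_meter.npy`); the helper names are hypothetical, and it assumes all samples follow that same naming pattern.

```python
from pathlib import Path


def triplet_paths(root, stem):
    """Hypothetical helper: expected image, semantic mask, and depth paths for one sample."""
    base = Path(root) / "UWD_triplets"
    return {
        "image": base / "images" / f"{stem}.jpg",
        "semseg": base / "semseg_annotations" / f"{stem}.jpg",
        "depth": base / "depth_annotations" / f"{stem}_raw_depth_meter.npy",
    }


def list_stems(root):
    """Sample ids taken from the image folder, e.g. 'train_05543'."""
    return sorted(p.stem for p in (Path(root) / "UWD_triplets" / "images").glob("*.jpg"))


def missing_files(root, stems):
    """Return every expected file that is not on disk, to catch incomplete triplets early."""
    missing = []
    for stem in stems:
        for path in triplet_paths(root, stem).values():
            if not path.is_file():
                missing.append(path)
    return missing
```

For example, `missing_files("datasets", list_stems("datasets"))` returns an empty list when every image has a matching mask and depth file.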
Once the training data and environment are prepared, run the following command to start training:
```bash
accelerate launch --num_processes=4 --main_process_port=36666 ./tide/train_tide_hf.py \
  --max_train_steps 200000 --learning_rate=1e-4 --train_batch_size=1 \
  --gradient_accumulation_steps=1 --seed=42 --dataloader_num_workers=4 --validation_steps 10000 \
  --wandb_name=tide_r32_64_b4_200k --output_dir=./outputs/tide_r32_64_b4_200k
```
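With data-parallel training under `accelerate`, each optimizer step aggregates gradients over `num_processes × train_batch_size × gradient_accumulation_steps` samples. The arithmetic below is a small sketch that assumes `--max_train_steps` and `--validation_steps` count optimizer steps. Under that assumption, the command above trains with an effective batch size of 4 (presumably the `b4` in the run name) and runs validation 20 times.

```python
def effective_batch_size(num_processes, train_batch_size, gradient_accumulation_steps):
    """Samples contributing to one optimizer step in data-parallel training."""
    return num_processes * train_batch_size * gradient_accumulation_steps


def num_validations(max_train_steps, validation_steps):
    """Validation runs, assuming validation fires every `validation_steps` optimizer steps."""
    return max_train_steps // validation_steps


# Values taken from the launch command above.
bs = effective_batch_size(num_processes=4, train_batch_size=1, gradient_accumulation_steps=1)
print(bs)                               # prints 4
print(num_validations(200_000, 10_000)) # prints 20
print(bs * 200_000)                     # prints 800000 (samples seen over the whole run)
```

When training on fewer GPUs, raising `--gradient_accumulation_steps` keeps the effective batch size unchanged; for example, 2 processes with 2 accumulation steps also give 4.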
- Thanks to Diffusers for their wonderful technical support and awesome collaboration!
- Thanks to Hugging Face for sponsoring the nice demo!
- Thanks to DiT for their wonderful work and codebase!
- Thanks to PixArt-α for their wonderful work and codebase!
```bibtex
@inproceedings{lin2025tide,
  title={A Unified Image-Dense Annotation Generation Model for Underwater Scenes},
  author={Lin, Hongkai and Liang, Dingkang and Qi, Zhenghao and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025},
}
```