[CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes



🌊 Introduction

We present TIDE, a unified model that generates underwater images together with their dense annotations. It builds on the layout information shared across modalities and on the natural complementarity between multimodal features. Derived from a pre-trained text-to-image model and fine-tuned on underwater data, TIDE generates highly consistent underwater image and dense-annotation pairs from text conditions alone.

[Figure: TIDE demo]

🐚 News

  • 2025-3-28: The training and inference code is now available!
  • 2025-2-27: TIDE has been accepted to CVPR 2025!

🪸 Dependencies and Installation

conda create -n TIDE python=3.9
conda activate TIDE
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

git clone https://github.com/HongkLin/TIDE
cd TIDE
pip install -r requirements.txt

🐬 Inference

Download the pre-trained PixArt-α, MiniTransformer, and TIDE checkpoints, then set the model weights path (--model_weights_dir in the command below) to the directory where you saved them.

python inference.py --model_weights_dir ./model_weights --text_prompt "A large school of fish swimming in a circle." --output ./outputs
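To generate results for several prompts, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: it uses only the three flags shown above, the helper names `build_inference_command` and `run_prompts` are hypothetical, and writing each prompt to its own subdirectory is an assumption about how you may want to keep outputs apart.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(prompt, weights_dir="./model_weights", output_dir="./outputs"):
    """Build the argument list for one inference.py call (flags taken from the README)."""
    return [
        sys.executable, "inference.py",
        "--model_weights_dir", str(weights_dir),
        "--text_prompt", prompt,
        "--output", str(output_dir),
    ]


def run_prompts(prompts, weights_dir="./model_weights", output_root="./outputs"):
    """Run inference.py once per prompt; must be called from the repository root."""
    for index, prompt in enumerate(prompts):
        # Assumption: one subdirectory per prompt so results are not mixed together.
        output_dir = Path(output_root) / f"prompt_{index:03d}"
        subprocess.run(build_inference_command(prompt, weights_dir, output_dir), check=True)


# Example call (commented out because it needs the downloaded checkpoints):
# run_prompts(["A large school of fish swimming in a circle."])
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or punctuation.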

🐢 Training

🏖️ Training Data Preparation

  • Download the SUIM, UIIS, and USIS10K datasets.
  • Obtain the semantic segmentation annotations by merging instances that share the same semantic class.
  • Generate the depth annotations with Depth Anything V2 and save the inverse depth results as .npy files.
  • For the image captions and the JSON file that organizes the training data, follow Atlantis, as we do.
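The instance-merging step in the second bullet can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing script: it assumes each instance arrives as a boolean mask with an integer class id, that 0 denotes background, and that later instances overwrite earlier ones where masks overlap. The function name `merge_instances_to_semantic` is hypothetical.

```python
import numpy as np


def merge_instances_to_semantic(instance_masks, class_ids, background_id=0):
    """Collapse per-instance masks into a single semantic label map.

    instance_masks: boolean array of shape (N, H, W), one mask per instance.
    class_ids: length-N sequence of integer class ids (assumed encoding).
    """
    instance_masks = np.asarray(instance_masks, dtype=bool)
    if instance_masks.ndim != 3:
        raise ValueError("instance_masks must have shape (N, H, W)")
    if len(class_ids) != instance_masks.shape[0]:
        raise ValueError("need exactly one class id per instance mask")

    semantic = np.full(instance_masks.shape[1:], background_id, dtype=np.uint8)
    for mask, class_id in zip(instance_masks, class_ids):
        # Instances of the same class end up with the same label value.
        semantic[mask] = class_id
    return semantic


# Toy example: two instances of class 1 and one instance of class 2.
masks = np.zeros((3, 2, 4), dtype=bool)
masks[0, 0, 0] = True
masks[1, 1, 3] = True
masks[2, 0, 2] = True
semantic_map = merge_instances_to_semantic(masks, class_ids=[1, 1, 2])
```

In the toy example the two class-1 instances become indistinguishable in `semantic_map`, which is the intended effect of merging.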

The final dataset should be organized as follows:

datasets/
    UWD_triplets/
        images/
            train_05543.jpg
            ...
        semseg_annotations/
            train_05543.jpg
            ...
        depth_annotations/
            train_05543_raw_depth_meter.npy
            ...
        TrainTIDE_Caption.json
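Before training, it may help to confirm that every image has both annotations. The check below is a minimal sketch that assumes the naming pattern shown in the tree above (a same-named .jpg under semseg_annotations/ and a `<stem>_raw_depth_meter.npy` under depth_annotations/); the function name `find_incomplete_triplets` is hypothetical and the script does not inspect TrainTIDE_Caption.json.

```python
from pathlib import Path


def find_incomplete_triplets(root):
    """Return {image_stem: [missing relative paths]} for a UWD_triplets-style folder."""
    root = Path(root)
    missing = {}
    for image_path in sorted((root / "images").glob("*.jpg")):
        stem = image_path.stem
        expected = [
            root / "semseg_annotations" / f"{stem}.jpg",
            root / "depth_annotations" / f"{stem}_raw_depth_meter.npy",
        ]
        absent = [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
        if absent:
            missing[stem] = absent
    return missing


if __name__ == "__main__":
    # Path taken from the layout above; adjust it if your data lives elsewhere.
    for stem, absent in find_incomplete_triplets("datasets/UWD_triplets").items():
        print(f"{stem}: missing {', '.join(absent)}")
```

An empty result means every image under images/ has a matching segmentation map and depth file.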

Once the training data and environment are ready, run the following command to start training:

accelerate launch --num_processes=4 --main_process_port=36666 ./tide/train_tide_hf.py \
--max_train_steps 200000 --learning_rate=1e-4 --train_batch_size=1 \
--gradient_accumulation_steps=1 --seed=42 --dataloader_num_workers=4 --validation_steps 10000 \
--wandb_name=tide_r32_64_b4_200k --output_dir=./outputs/tide_r32_64_b4_200k
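The flags above determine the effective batch size. The arithmetic below is a sketch under two assumptions that the source does not state: that --train_batch_size is a per-process value (the usual convention in Hugging Face training scripts) and that --max_train_steps counts optimizer steps. Under those assumptions the result matches the "b4" in the run name.

```python
def effective_batch_size(num_processes, train_batch_size, gradient_accumulation_steps):
    """Samples consumed per optimizer step in data-parallel training."""
    return num_processes * train_batch_size * gradient_accumulation_steps


# Values copied from the launch command above.
num_processes = 4
train_batch_size = 1
gradient_accumulation_steps = 1
max_train_steps = 200_000

batch = effective_batch_size(num_processes, train_batch_size, gradient_accumulation_steps)
samples_seen = batch * max_train_steps

print(batch)         # 4
print(samples_seen)  # 800000
```

If you train on fewer GPUs, raising --gradient_accumulation_steps by the same factor keeps the product, and therefore the effective batch size, unchanged.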

🤗 Acknowledgements

  • Thanks to Diffusers for their wonderful technical support and awesome collaboration!
  • Thanks to Hugging Face for sponsoring the nice demo!
  • Thanks to DiT for their wonderful work and codebase!
  • Thanks to PixArt-α for their wonderful work and codebase!

📖 BibTeX

@inproceedings{lin2025tide,
      title={A Unified Image-Dense Annotation Generation Model for Underwater Scenes}, 
      author={Lin, Hongkai and Liang, Dingkang and Qi, Zhenghao and Bai, Xiang},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2025},
}
