[CVPR 2025 highlight] v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang1, Jinhong Ni1, Yujie Zhong2, Kai Han1
1 The University of Hong Kong
2 Meituan Inc.
- [04/25] We release the HuggingFace Demo.
- [04/25] We release the code, datasets and weights of v-CLR.
- [04/25] v-CLR is selected as a highlight by CVPR 2025.
- [03/25] v-CLR is accepted by CVPR 2025.
| Setting | Box AR10 | Box AR100 | Mask AR10 | Mask AR100 | |
|---|---|---|---|---|---|
| VOC | 22.2 | 40.3 | 19.6 | 33.7 | Config & Weights |
| COCO | 20.3 | 45.8 | 16.1 | 34.6 | Config & Weights |
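To load one of the released configs and checkpoints directly in Python (outside of `train_net.py`), a minimal sketch using detectron2's LazyConfig utilities is shown below; the weight filename is a placeholder for the file downloaded via the table.

```python
# Minimal sketch: build the model from a released config and load a downloaded
# checkpoint with detectron2's LazyConfig utilities. "vclr_voc.pth" is a
# placeholder for the weights linked in the table above.
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load(
    "projects/vCLR_deformable_mask/configs/dino-resnet/deformable_train_voc_eval_nonvoc.py"
)
model = instantiate(cfg.model)                     # build the v-CLR model from the lazy config
DetectionCheckpointer(model).load("vclr_voc.pth")  # placeholder checkpoint path
model.eval()
```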
- This repository is built on the detrex framework.
- Python >= 3.7 and PyTorch >= 1.10 are required.
- First, clone the v-CLR repository:
```bash
git clone https://github.com/Visual-AI/vCLR.git
cd vCLR
```
- Second, install `detectron2` and `detrex`:
```bash
pip install -e detectron2
pip install -r requirements.txt
pip install -e .
```
- If you encounter a CUDA runtime compilation error, try setting `export CUDA_HOME=<your_cuda_path>`. A quick import check for the installation is sketched below.
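As a minimal sanity check (an optional sketch, not part of the repository), you can confirm that the editable installs above are importable and that PyTorch sees your GPU:

```python
# Optional sanity check: confirm the editable installs are importable
# and that PyTorch can see a GPU.
import torch
import detectron2
import detrex  # installed by `pip install -e .` above

print("torch:", torch.__version__)
print("detectron2:", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available())
```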
- You may download the images and annotations (other than COCO) from OWIS Datasets, and then organize the data as follows (a small path check is sketched after the tree):
```
datasets/
├── coco2017/
│   ├── annotations/
│   │   ├── instances_train2017.json
│   │   └── instances_val2017.json
│   ├── train2017/
│   │   └── ...
│   └── val2017/
│       └── ...
├── object365_val/
│   └── ...
├── lvis_val/
│   └── ...
├── openworld_lvis_noncoco_val_correct.json
├── openworld_objects365_noncoco_val.json
├── uvo_nonvoc_val_rle.json
├── vCLR_coco_train2017_top5.json
├── vCLR_voc_train2017_top10.json
├── style_coco_train2017/
│   └── ...
├── train2017_depth_cmap/
│   └── ...
└── uvo_videos_dense_frames/
    └── ...
```
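To catch path mistakes early, the following minimal sketch (assuming the `datasets/` root above; adjust it if your data lives elsewhere) checks that the expected files and folders exist:

```python
# Minimal sketch: verify the expected dataset layout under datasets/
# (paths follow the tree above).
from pathlib import Path

root = Path("datasets")
expected = [
    "coco2017/annotations/instances_train2017.json",
    "coco2017/annotations/instances_val2017.json",
    "coco2017/train2017",
    "coco2017/val2017",
    "object365_val",
    "lvis_val",
    "openworld_lvis_noncoco_val_correct.json",
    "openworld_objects365_noncoco_val.json",
    "uvo_nonvoc_val_rle.json",
    "vCLR_coco_train2017_top5.json",
    "vCLR_voc_train2017_top10.json",
    "style_coco_train2017",
    "train2017_depth_cmap",
    "uvo_videos_dense_frames",
]
missing = [p for p in expected if not (root / p).exists()]
print("Layout looks good." if not missing else f"Missing entries: {missing}")
```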
- Train on VOC classes (a programmatic sketch of the dotted overrides follows the command):
```bash
# Optional overrides:
#   model.num_queries=2000                          -> similar performance when using more than 1000 queries
#   train.amp.enabled=True                          -> mixed-precision training
#   model.transformer.encoder.use_checkpoint=True   -> gradient checkpointing: saves GPU memory at the cost of speed
#   train.init_checkpoint=...                       -> NOTE: training from scratch works better for the baseline model
python projects/train_net.py \
    --config-file projects/vCLR_deformable_mask/configs/dino-resnet/deformable_train_voc_eval_nonvoc.py \
    --num-gpus N \
    dataloader.train.total_batch_size=8 \
    train.output_dir=<output_dir> \
    model.num_queries=2000 \
    train.amp.enabled=True \
    model.transformer.encoder.use_checkpoint=True \
    train.init_checkpoint=detectron2/dino_RN50_pretrain_d2_format.pkl
```
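The dotted key-value options above are detectron2-style LazyConfig overrides. As a hedged illustration (not part of the released scripts), roughly the same overrides can be applied programmatically:

```python
# Sketch: apply the same dotted overrides programmatically with detectron2's
# LazyConfig API (illustrative only; the released workflow uses train_net.py).
from detectron2.config import LazyConfig

cfg = LazyConfig.load(
    "projects/vCLR_deformable_mask/configs/dino-resnet/deformable_train_voc_eval_nonvoc.py"
)
cfg = LazyConfig.apply_overrides(cfg, [
    "dataloader.train.total_batch_size=8",
    "model.num_queries=2000",
    "train.amp.enabled=True",
    "model.transformer.encoder.use_checkpoint=True",
])
print(cfg.model.num_queries)  # 2000
```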
- Train on COCO classes (the same optional overrides as above apply):
```bash
python projects/train_net.py \
    --config-file projects/vCLR_deformable_mask/configs/dino-resnet/deformable_train_coco_eval_lvis.py \
    --num-gpus N \
    dataloader.train.total_batch_size=8 \
    train.output_dir=<output_dir> \
    model.num_queries=2000 \
    train.amp.enabled=True \
    model.transformer.encoder.use_checkpoint=True \
    train.init_checkpoint=detectron2/dino_RN50_pretrain_d2_format.pkl
```
- Evaluate on Non-VOC or LVIS:
```bash
# Set dataloader.test.dataset.names to "openworld_nonvoc_classes_val2017" (Non-VOC)
# or "openworld_LVIS_noncoco_val2017" (LVIS).
python projects/train_net.py \
    --config-file <config_file> \
    --eval-only \
    --num-gpus=4 \
    train.init_checkpoint=<checkpoint_path> \
    train.model_ema.use_ema_weights_for_eval_only=True \
    dataloader.test.dataset.names="openworld_nonvoc_classes_val2017"
```
- Evaluate on UVO:
```bash
python projects/train_net.py \
    --config-file <config_file> \
    --eval-only \
    --num-gpus=4 \
    train.init_checkpoint=<checkpoint_path> \
    train.model_ema.use_ema_weights_for_eval_only=True \
    dataloader.test.dataset.names="openworld_uvo_nonvoc_val2017" \
    dataloader.train.mapper.instance_mask_format='bitmask'
```
- Evaluate on Objects365:
```bash
python projects/train_net.py \
    --config-file <config_file> \
    --eval-only \
    --num-gpus=4 \
    train.init_checkpoint=<checkpoint_path> \
    train.model_ema.use_ema_weights_for_eval_only=True \
    dataloader.test.dataset.names="openworld_objects365_noncoco_val2017" \
    dataloader.train.mapper.mask_on=False
```
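Open-world instance segmentation results such as AR@10/AR@100 are typically class-agnostic recall values. If you want to recompute them offline from COCO-format prediction files, a hedged sketch with pycocotools is below; the file names are placeholders and this is not the repository's evaluation code.

```python
# Hedged sketch: recompute class-agnostic AR@10 / AR@100 from COCO-format
# predictions with pycocotools (not the repository's evaluation code).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("datasets/openworld_lvis_noncoco_val_correct.json")  # ground truth from the tree above
dt = gt.loadRes("predictions.json")                            # placeholder: your exported predictions

ev = COCOeval(gt, dt, iouType="segm")  # use iouType="bbox" for box AR
ev.params.useCats = 0                  # class-agnostic matching, as in open-world evaluation
ev.params.maxDets = [1, 10, 100]       # summarize() then reports AR@1, AR@10, AR@100
ev.evaluate()
ev.accumulate()
ev.summarize()
```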
```bibtex
@inproceedings{zhang2024vclr,
  title={v-CLR: View-Consistent Learning for Open-World Instance Segmentation},
  author={Zhang, Chang-Bin and Ni, Jinhong and Zhong, Yujie and Han, Kai},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
```