Interactive segmentation has gained significant attention due to its applications in human-computer interaction and data annotation. To address the challenge of target scale variations in interactive segmentation, we propose a novel multi-scale token fusion algorithm. This algorithm selectively fuses only the most important tokens, enabling the model to better capture multi-scale characteristics in important regions. To further enhance the robustness of multi-scale token selection, we introduce a token learning algorithm based on contrastive loss. This algorithm fully utilizes the discriminative information between target and background multi-scale tokens, effectively improving the quality of selected tokens. Extensive benchmark testing demonstrates the effectiveness of our approach in addressing multi-scale issues.
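To make the two ideas above concrete, here is a minimal, self-contained PyTorch sketch of (a) selecting and fusing only the top-k most important tokens across two scales and (b) a contrastive loss that separates target from background tokens. This is our own illustration, not the released implementation: the module names, the linear score/fusion layers, and the prototype-based loss formulation are all assumptions made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKTokenFusion(nn.Module):
    """Fuse only the top-k most important fine-scale tokens with their
    coarse-scale counterparts; all other tokens pass through unchanged."""

    def __init__(self, dim: int, k: int = 64):
        super().__init__()
        self.k = k
        self.score = nn.Linear(dim, 1)       # per-token importance score
        self.fuse = nn.Linear(2 * dim, dim)  # fuses fine + coarse features

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine, coarse: (B, N, C) token sequences from two scales, assumed
        # spatially aligned upstream (e.g. by interpolating the coarse map).
        importance = self.score(fine).squeeze(-1)             # (B, N)
        idx = importance.topk(self.k, dim=1).indices          # (B, k)
        gather = idx.unsqueeze(-1).expand(-1, -1, fine.size(-1))
        fused = self.fuse(torch.cat(
            [fine.gather(1, gather), coarse.gather(1, gather)], dim=-1))
        # Write the fused tokens back into the fine-scale sequence.
        return fine.scatter(1, gather, fused)


def token_contrastive_loss(tokens, target_mask, temperature=0.1):
    """Pull each token toward the prototype of its own class (target vs.
    background) and push it away from the other class's prototype."""
    z = F.normalize(tokens, dim=-1)                           # (B, N, C)
    m = target_mask.unsqueeze(-1).float()                     # (B, N, 1)
    proto_t = F.normalize((z * m).sum(1) / m.sum(1).clamp(min=1), dim=-1)
    proto_b = F.normalize((z * (1 - m)).sum(1) / (1 - m).sum(1).clamp(min=1), dim=-1)
    sim_t = torch.einsum('bnc,bc->bn', z, proto_t) / temperature
    sim_b = torch.einsum('bnc,bc->bn', z, proto_b) / temperature
    logits = torch.stack([sim_t, sim_b], dim=-1)              # (B, N, 2)
    labels = (~target_mask).long()                            # target -> 0, bg -> 1
    return F.cross_entropy(logits.reshape(-1, 2), labels.reshape(-1))


if __name__ == '__main__':
    fine, coarse = torch.randn(2, 196, 768), torch.randn(2, 196, 768)
    out = TopKTokenFusion(dim=768, k=32)(fine, coarse)        # (2, 196, 768)
    mask = torch.rand(2, 196) > 0.5                           # stand-in target mask
    print(out.shape, token_contrastive_loss(out, mask).item())
```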
Training and evaluation environment: Python 3.9+, PyTorch > 1.0, CUDA. Run the following command to install the required packages:
```
pip install -r requirements.txt
```
You need to configure the paths to the datasets in config.yml before training or testing; an illustrative snippet is shown below. If you have any questions, please feel free to open an Issue.
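For orientation, the entries in config.yml typically map dataset names to local paths, in the style of the RITM/SimpleClick configs this project builds on. The key names below are assumptions; verify them against the released config.yml.

```yaml
# Illustrative only -- key names are assumptions; check the released config.yml.
INTERACTIVE_MODELS_PATH: ./weights
EXPS_PATH: ./experiments

# Evaluation datasets
GRABCUT_PATH: ./datasets/GrabCut
BERKELEY_PATH: ./datasets/Berkeley
DAVIS_PATH: ./datasets/DAVIS
COCO_MVAL_PATH: ./datasets/COCO_MVal
LOVEDA_MS_PATH: ./datasets/LoveDAMS

# Training datasets
SBD_PATH: ./datasets/SBD
LVIS_v1_PATH: ./datasets/LVIS
```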
A script, download_datasets.sh, is provided to download and organize the required datasets.
Dataset | Description | Download Link |
---|---|---|
MS COCO | 118k images with 1.2M instances (train) | official site |
LVIS v1.0 | 100k images with 1.2M instances (total) | official site |
COCO+LVIS* | 99k images with 1.5M instances (train) | original LVIS images + combined annotations |
SBD | 8498 images with 20172 instances (train); 2857 images with 6671 instances (test) | official site |
GrabCut | 50 images with one object each (test) | GrabCut.zip (11 MB) |
Berkeley | 96 images with 100 instances (test) | Berkeley.zip (7 MB) |
DAVIS | 345 images with one object each (test) | DAVIS.zip (43 MB) |
Pascal VOC | 1449 images with 3417 instances (test) | official site |
COCO_MVal | 800 images with 800 instances (test) | COCO_MVal.zip (127 MB) |
Multi-Scale LoveDA | 1666 images with 31,908 instances (test) | LoveDAMS.zip (1.4 GB) |
Online Demo. Run the demo with the following example command:
```
python demo.py --checkpoint=weights/mst-3s-slim-448-base.pth --gpu 0
```
Before evaluation, please download the datasets and models, and then configure their paths in config.yml.
Please download the zipped model files below and extract them before use.
Use the following command to evaluate the MST-3s model:
```
python evaluate_model.py NoBRS \
    --gpu=0 \
    --checkpoint=weights/mst-3s-slim-448-base.pth \
    --datasets=DAVIS,LoveDASMALL,LoveDAMEDIUM,LoveDALARGE,LoveDAHUGE \
    --cf-n=3 \
    --acf \
    --n-clicks=20 \
    --target-iou=0.9 \
    --inference_size=448
```
Before training, please download the MAE pretrained weights (click to download: ViT-Base, ViT-Large, ViT-Huge) and configure the downloaded paths in config.yml (illustrative entries are shown below).
Please also download the pretrained SimpleClick models from here.
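The pretrained weight paths are also set in config.yml. The entries below are a sketch following the SimpleClick-style configs; the key names are assumptions and should be aligned with the released config.yml.

```yaml
# Illustrative only -- align key names with the released config.yml.
IMAGENET_PRETRAINED_MODELS:
  MAE_BASE: ./pretrained/mae_pretrain_vit_base.pth
  MAE_LARGE: ./pretrained/mae_pretrain_vit_large.pth
  MAE_HUGE: ./pretrained/mae_pretrain_vit_huge.pth
```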
Use the following command to train a ViT-Base model on C+L:
```
python dis_train.py models/adaptivevit_base448_cclvs.py \
    --batch-size 60 \
    --model_base_name cclvs_adaptivevit_base448 \
    --exp-name 3s_mst \
    --gpus 0,1,2,3,4,5 \
    --dport 29500 \
    --workers 2 \
    --img_size 448 \
    --amp
```
Res | Train Data | Arch | Method | 3s (code) | 3s (weight) | 6s (code) | 6s (weight) |
---|---|---|---|---|---|---|---|
448×448 | COCO-LVIS | ViT-B | +CL | github | weight | - | - |
448×448 | COCO-LVIS | ViT-B | +MST | github | weight | - | - |
448×448 | COCO-LVIS | ViT-B | +MST+CL | github | weight | - | - |
224×224 | COCO-LVIS | ViT-B | +UpConv | github | weight | github | weight |
224×224 | COCO-LVIS | ViT-B | +AT | github | weight | github | weight |
224×224 | COCO-LVIS | ViT-B | +IT | github | weight | github | weight |
224×224 | COCO-LVIS | ViT-B | +MST | github | weight | github | weight |
224×224 | COCO-LVIS | ViT-B | +MST+CL | github | weight | github | weight |
224×224 | COCO-LVIS | ViT-L | +MST | github | weight | - | - |
224×224 | COCO-LVIS | ViT-L | +MST+CL | github | weight | - | - |
1024×1024 | COCO-LVIS+HQ | ViT-B | +MST-3s | github | weight | - | - |
```bibtex
@article{xu2024mst,
  title={MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation},
  author={Xu, Long and Li, Shanghong and Chen, Yongquan and Luo, Jun and Lai, Shiwu},
  journal={arXiv preprint arXiv:2401.04403},
  year={2024}
}
```
Our project is developed based on RITM and SimpleClick.