
EfficientViM

EfficientViM: Efficient Vision Mamba with Hidden State Mixer-based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim*


This repository is an official implementation of EfficientViM: Efficient Vision Mamba with Hidden State Mixer-based State Space Duality.

TODO

  • Release the code for dense predictions.

Main Results

Comparison of efficient networks on ImageNet-1K classification.

The family of EfficientViM, marked with red and blue stars, shows the best speed-accuracy trade-offs. (✝: with distillation)

Image classification on ImageNet-1K (pretrained models)

| model | resolution | epochs | acc (%) | #params | FLOPs | checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| EfficientViM-M1 | 224x224 | 300 | 72.9 | 6.7M | 239M | EfficientViM_M1_e300.pth |
| EfficientViM-M1 | 224x224 | 450 | 73.5 | 6.7M | 239M | EfficientViM_M1_e450.pth |
| EfficientViM-M2 | 224x224 | 300 | 75.4 | 13.9M | 355M | EfficientViM_M2_e300.pth |
| EfficientViM-M2 | 224x224 | 450 | 75.8 | 13.9M | 355M | EfficientViM_M2_e450.pth |
| EfficientViM-M3 | 224x224 | 300 | 77.6 | 16.6M | 656M | EfficientViM_M3_e300.pth |
| EfficientViM-M3 | 224x224 | 450 | 77.9 | 16.6M | 656M | EfficientViM_M3_e450.pth |
| EfficientViM-M4 | 256x256 | 300 | 79.4 | 19.6M | 1111M | EfficientViM_M4_e300.pth |
| EfficientViM-M4 | 256x256 | 450 | 79.6 | 19.6M | 1111M | EfficientViM_M4_e450.pth |
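As a quick way to navigate the trade-offs in the table above, the sketch below encodes the 450-epoch variants as plain Python data and picks the most accurate model that fits a FLOPs budget. The numbers are copied from the table; the helper itself is illustrative and not part of this repository:

```python
# Pretrained EfficientViM variants (450-epoch rows of the table above).
# Values: (top-1 acc %, params in M, FLOPs in M) -- copied from the table.
VARIANTS = {
    "EfficientViM-M1": (73.5, 6.7, 239),
    "EfficientViM-M2": (75.8, 13.9, 355),
    "EfficientViM-M3": (77.9, 16.6, 656),
    "EfficientViM-M4": (79.6, 19.6, 1111),
}

def best_under_flops(budget_mflops: int) -> str:
    """Return the most accurate variant whose FLOPs fit the budget (in MFLOPs)."""
    fitting = {name: v for name, v in VARIANTS.items() if v[2] <= budget_mflops}
    if not fitting:
        raise ValueError(f"no variant fits a {budget_mflops}M-FLOPs budget")
    return max(fitting, key=lambda name: fitting[name][0])

print(best_under_flops(700))  # -> EfficientViM-M3
```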

Image classification on ImageNet-1K with distillation

| model | resolution | epochs | acc (%) | checkpoint |
| --- | --- | --- | --- | --- |
| EfficientViM-M1 | 224x224 | 300 | 74.6 | EfficientViM_M1_dist.pth |
| EfficientViM-M2 | 224x224 | 300 | 76.7 | EfficientViM_M2_dist.pth |
| EfficientViM-M3 | 224x224 | 300 | 79.1 | EfficientViM_M3_dist.pth |
| EfficientViM-M4 | 256x256 | 300 | 80.7 | EfficientViM_M4_dist.pth |

Getting Started

Installation

# Clone this repository:
git clone https://github.com/mlvlab/EfficientViM.git
cd EfficientViM

# Create and activate the environment
conda create -n EfficientViM python=3.10
conda activate EfficientViM

# Install dependencies
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt

Training

To train EfficientViM for classification on ImageNet, run train.sh in the classification directory:

cd classification
sh train.sh <num-gpus> <batch-size-per-gpu> <epochs> <model-name> <imagenet-path> <output-path>

For example, to train EfficientViM-M1 for 450 epochs on 8 GPUs (with a total batch size of 2048, i.e., <num-gpus> × <batch-size-per-gpu>), run:

sh train.sh 8 256 450 EfficientViM_M1 <imagenet-path> <output-path>
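The effective batch size is just the product of the first two arguments, so a launch command can be sanity-checked before running it. A trivial helper (illustrative only, not part of train.sh):

```python
def total_batch_size(num_gpus: int, batch_per_gpu: int) -> int:
    """Effective batch size = <num-gpus> x <batch-size-per-gpu>."""
    return num_gpus * batch_per_gpu

# The example command above: 8 GPUs x 256 per GPU = 2048 total.
print(total_batch_size(8, 256))  # -> 2048
```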

Training with distillation

To train EfficientViM with the distillation objective of DeiT, run train_dist.sh in the classification directory:

sh train_dist.sh <num-gpus> <batch-size-per-gpu> <model-name> <imagenet-path> <output-path>

Evaluation

To evaluate a pretrained EfficientViM model, run test.sh in the classification directory:

sh test.sh <num-gpus> <model-name> <imagenet-path> <checkpoint-path>
# For evaluation with the model trained with distillation
# sh test_dist.sh <num-gpus> <model-name> <imagenet-path> <checkpoint-path>

Acknowledgements

This repo is built upon Swin, VSSD, SHViT, EfficientViT, and SwiftFormer.
Thanks to the authors for their inspiring works!

Citation

If this work is helpful for your research, please consider citing it.

@article{EfficientViM,
  title={EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality},
  author={Lee, Sanghyeok and Choi, Joonmyung and Kim, Hyunwoo J.},
  journal={arXiv preprint arXiv:2411.15241},
  year={2024}
}
