In large multimodal models (LMMs), the perception of non-language modalities (e.g., visual representations) is usually not on par with the large language models (LLMs)' powerful reasoning capabilities, limiting LMMs' performance on challenging downstream tasks. This weakness has recently been mitigated by replacing the vision encoder with a mixture-of-experts (MoE), which provides the rich, multi-granularity, and diverse representations required by different downstream tasks. The performance of a multimodal MoE largely depends on its router, which reweights and mixes the representations of different experts for each input. However, we find that the end-to-end trained router does not always produce optimal routing weights for every test sample. To bridge this gap, we propose a novel and efficient method, "Re-Routing in Test-Time (R2-T2)", that locally optimizes the vector of routing weights at test time by moving it toward the vectors of correctly predicted samples in a neighborhood of the test sample. We propose three R2-T2 strategies with different optimization objectives and neighbor-search spaces. R2-T2 consistently and significantly improves state-of-the-art LMMs' performance on challenging multimodal benchmarks covering diverse tasks, without training any parameters in the base model.
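To make the idea concrete, below is a minimal, self-contained PyTorch sketch of test-time re-routing for a single sample. It is illustrative only: the function name r2t2_reroute, the cosine-similarity neighbor search, the kernel-weighted squared-distance objective, and the linear step-size decay are assumptions made for exposition, not the exact objectives or neighbor-search spaces of the three R2-T2 strategies.

import torch
import torch.nn.functional as F


def r2t2_reroute(test_emb, ref_embs, ref_weights, init_weights,
                 num_neighbors=5, num_steps=10,
                 initial_lr=0.01, final_lr=1e-5):
    """Nudge a test sample's routing-weight vector toward the routing
    weights of its nearest correctly-predicted reference samples.

    Shapes: test_emb (d,), ref_embs (N, d), ref_weights (N, E),
    init_weights (E,), where E is the number of experts.
    """
    # 1) Find the k most similar reference samples (cosine similarity).
    sims = F.cosine_similarity(test_emb.unsqueeze(0), ref_embs, dim=-1)
    top_sim, top_idx = sims.topk(num_neighbors)
    neighbor_weights = ref_weights[top_idx]          # (k, E)
    kernel = torch.softmax(top_sim, dim=0)           # similarity kernel over neighbors

    # 2) Gradient descent in logit space so the weights stay a valid
    #    distribution over experts; step size decays from initial_lr
    #    to final_lr (linear decay is an assumption).
    logits = torch.log(init_weights.clamp_min(1e-8)).detach().clone().requires_grad_(True)
    for step in range(num_steps):
        t = step / max(num_steps - 1, 1)
        lr = initial_lr + t * (final_lr - initial_lr)
        weights = torch.softmax(logits, dim=-1)
        # Surrogate objective: kernel-weighted squared distance to the
        # neighbors' routing weights (the paper's objectives differ).
        loss = (kernel * ((weights.unsqueeze(0) - neighbor_weights) ** 2).sum(dim=-1)).sum()
        grad, = torch.autograd.grad(loss, logits)
        with torch.no_grad():
            logits -= lr * grad
    return torch.softmax(logits.detach(), dim=-1)


# Toy usage: 4 experts, 100 reference samples, 256-dim embeddings.
new_weights = r2t2_reroute(
    torch.randn(256), torch.randn(100, 256),
    torch.softmax(torch.randn(100, 4), dim=-1),
    torch.softmax(torch.randn(4), dim=-1))

Optimizing in logit space keeps the re-routed weights on the simplex throughout the update, so they can be plugged back into the MoE router without renormalization.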
First, set up the environment by running the following commands.
conda create -n R2T2 python=3.9
conda activate R2T2
conda clean -a && pip cache purge
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r assets/requirements/requirements.txt
pip install -r assets/requirements/requirements_custom.txt
pip install flash-attn --no-build-isolation
Create a 'checkpoints' directory under moai/sgg and place the Scene Graph Generation checkpoint there after downloading it; the checkpoint file should be named 'psgtr_r50_epoch_60.pth'.
Download the checkpoint labeled 'PSGTR' from Panoptic SGG.
In the init_detector function in mmdet/apis/inference.py, lines 95-110 should be commented out for compatibility:
# if palette != 'none':
#     model.dataset_meta['palette'] = palette
# else:
#     test_dataset_cfg = copy.deepcopy(config.test_dataloader.dataset)
#     # lazy init. We only need the metainfo.
#     test_dataset_cfg['lazy_init'] = True
#     metainfo = DATASETS.build(test_dataset_cfg).metainfo
#     cfg_palette = metainfo.get('palette', None)
#     if cfg_palette is not None:
#         model.dataset_meta['palette'] = cfg_palette
#     else:
#         if 'palette' not in model.dataset_meta:
#             warnings.warn(
#                 'palette does not exist, random is used by default. '
#                 'You can also set the palette to customize.')
#             model.dataset_meta['palette'] = 'random'
In the inference_detector function in mmdet/apis/inference.py, lines 179 onward should be replaced with the following lines:
# build the data pipeline
data_ = test_pipeline(data_)
data_['inputs'] = data_['inputs'].unsqueeze(0)
data_['data_samples'] = [data_['data_samples']]
# forward the model
with torch.no_grad():
    results = model.test_step(data_)[0]
In mmcv/transforms/processing.py, line 388 should be commented out for compatibility:
# results['img_shape'] = padded_img.shape[:2]
Then, download the benchmark and the reference dataset:
./download.sh
Finally, run evaluate.py:
python evaluate.py --reference reference.json --eval CV-Bench --num_neighbors 5 --num_steps 10 --initial_lr 0.01 --final_lr 1e-5
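Here --reference points to the reference set downloaded above and --eval selects the benchmark; --num_neighbors and --num_steps set the neighborhood size and the number of test-time re-routing steps, with the step size presumably decayed from --initial_lr to --final_lr over those steps (see evaluate.py for the exact schedule).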
Acknowledgement: This code is based on and developed from MoAI.