
# Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Ang Li, Dong Yu

## TL;DR

The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers."

## Introduction

Traditional transformer models allocate the same amount of computation to every input token, much of which is unnecessary. To address this inefficiency, Mixture of Depths (MoD) dynamically adjusts computational depth by skipping less important layers. While promising, current MoD approaches face two significant challenges:

  1. **High training costs:** existing methods train the entire model alongside the routers that decide which layers to skip, incurring substantial computational overhead.
  2. **Risk of performance degradation:** bypassing important layers can cause a drop in model performance.

To overcome these challenges, we introduce Router-Tuning, which fine-tunes only the router on a small dataset and thereby drastically reduces training costs. We further propose Mindskip (Attention with Dynamic Depths), which preserves model performance while significantly improving computational and memory efficiency.
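The sketch below illustrates the core idea under stated assumptions: a lightweight router gates each layer's attention block, and only the router's parameters receive gradients. It is a minimal illustration, not the repository's implementation; the names `MindskipAttention` and `hidden_dim` and the 0.5 threshold are assumptions for exposition.

```python
import torch
import torch.nn as nn

class MindskipAttention(nn.Module):
    """Hypothetical wrapper: a tiny router decides whether to run or skip attention."""

    def __init__(self, attn: nn.Module, hidden_dim: int):
        super().__init__()
        self.attn = attn                        # pretrained attention block (frozen)
        self.router = nn.Linear(hidden_dim, 1)  # the only trainable component

        # Router-Tuning: freeze everything except the router.
        for p in self.attn.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One gate per sequence: sigmoid score from mean-pooled hidden states.
        gate = torch.sigmoid(self.router(x.mean(dim=1)))  # (batch, 1)

        if self.training:
            # Soft gate keeps the skip decision differentiable while tuning.
            return x + gate.unsqueeze(1) * self.attn(x)

        # Hard decision at inference: run attention only where the gate fires.
        out = x.clone()
        keep = gate.squeeze(-1) > 0.5
        if keep.any():
            out[keep] = x[keep] + self.attn(x[keep])
        return out
```

Soft gating keeps the router trainable by backpropagation; at inference, the hard threshold lets gated-off inputs bypass the attention computation entirely, which is where the speedup comes from.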

Our approach delivers competitive results, achieving up to a 21% speedup with only a 0.2% performance drop, demonstrating its effectiveness in balancing efficiency and performance.

*Figure: Diagram of Mindskip.*

## News

- **Oct 2024:** Published the preprint on arXiv along with the accompanying codebase.

## Quick Start

### Installation

```bash
conda create -n router-tuning python=3.10
conda activate router-tuning

git clone https://github.com/CASE-Lab-UMD/Router-Tuning
cd ./Router-Tuning

pip install -r requirements.txt
```

### Train

```bash
sh ./scripts/finetune_mindskip.sh
```

### Evaluation

The evaluation code is based on EleutherAI/lm-evaluation-harness. To fully reproduce our results, please use this version: it draws few-shot examples based on each sample's index, so results do not vary with the number of processes used during data-parallel inference.
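As a hedged illustration (not the exact command from this repository), an evaluation through the harness's Python API might look like the sketch below. The checkpoint path and task list are placeholders, and argument names can differ slightly between harness versions.

```python
from lm_eval import evaluator

# Placeholder model path and tasks; substitute the checkpoint produced by
# Router-Tuning and the benchmarks you want to reproduce.
results = evaluator.simple_evaluate(
    model="hf",                                        # HuggingFace causal-LM backend
    model_args="pretrained=path/to/router-tuned-model",
    tasks=["arc_easy", "hellaswag"],
    num_fewshot=5,    # few-shot examples are drawn by sample index in this version
    batch_size=8,
)
print(results["results"])
```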

## Citation

```bibtex
@misc{he2024routertuningsimpleeffectiveapproach,
      title={Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers},
      author={Shwai He and Tao Ge and Guoheng Sun and Bowei Tian and Xiaoyang Wang and Ang Li and Dong Yu},
      year={2024},
      eprint={2410.13184},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.13184},
}
```

## Contact Us

If you have any questions, please contact: