- [1904.03392]Efficient and Effective Dropout for Deep Convolutional Neural Networks
- knowledge distillation
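The knowledge-distillation note above refers to the usual soft-target formulation (Hinton et al.); a minimal PyTorch sketch, where the temperature `T` is an arbitrary illustrative choice:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as in the original formulation.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
loss = distillation_loss(student, teacher)
```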
- [2002.03342]Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
- early exit from both the frame (temporal) and network-depth perspectives
- [ICLR2021]AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
- somewhat similar to TSM (Temporal Shift Module)
- Gumbel-Softmax Estimator
- https://mengyuest.github.io/AdaFuse/
- the conditional computation is not well suited to GPU acceleration
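AdaFuse (and ConvNet-AIG below) make discrete keep/drop decisions differentiable with the Gumbel-Softmax estimator noted above; a minimal straight-through sketch, written out explicitly rather than via `F.gumbel_softmax`:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=True):
    # Sample Gumbel(0, 1) noise: -log(Exp(1)) is Gumbel-distributed.
    gumbels = -torch.empty_like(logits).exponential_().log()
    # Temperature-scaled softmax over the perturbed logits.
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
    if hard:
        # Straight-through: one-hot in the forward pass,
        # soft gradients in the backward pass.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(logits).scatter_(-1, index, 1.0)
        return y_hard - y_soft.detach() + y_soft
    return y_soft

# e.g. per-channel keep/drop decisions (4 channels, 2 choices each)
logits = torch.randn(4, 2, requires_grad=True)
mask = gumbel_softmax_sample(logits, tau=0.67)
```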
- [ICLR2021]VA-RED2: Video Adaptive Redundancy Reduction
- the conditional computation is not well suited to GPU acceleration
- identifies redundant feature maps along the temporal and channel dimensions
- adds a loss term on the redundancy factor
- [ICCV2021]MGSampler: An Explainable Sampling Strategy for Video Action Recognition
- [ICCV2021]Adaptive Focus for Efficient Video Recognition
- reinforcement learning
- spatial and temporal
- [CVPR2021]Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
- [BMVC2021]Conditional Model Selection for Efficient Video Understanding
- [ECCV2020]AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
- [NIPS2020]Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
- reinforcement learning
- [CVPR2019]Efficient Video Classification Using Fewer Frames
- [CVPR2019]AdaFrame: Adaptive Frame Selection for Fast Video Recognition
- reinforcement learning
- [IJCAI2018]Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification
- reinforcement learning
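Several of the frame-selection papers above (AdaptiveFocus, AdaFrame, the IJCAI 2018 work) train their samplers with policy gradients; a toy REINFORCE sketch for selecting one of several candidate frames, where the reward of 1.0 is a placeholder for a task signal such as classification correctness:

```python
import torch

# Policy logits scoring 8 hypothetical candidate frames.
logits = torch.randn(1, 8, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)

# Sample a frame index and weight its log-probability by the reward
# (REINFORCE: grad = -reward * grad log pi(action)).
action = dist.sample()
reward = torch.tensor(1.0)  # placeholder reward
loss = -(dist.log_prob(action) * reward).mean()
loss.backward()
```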
- [CVPR2018]Learning Strict Identity Mappings in Deep Residual Networks
- [ECCV2018]Convolutional Networks with Adaptive Inference Graphs
- Gumbel sampling
- https://github.com/andreasveit/convnet-aig
- [CVPR2017]Spatially Adaptive Computation Time for Residual Networks
- mask is determined by a probability predicted by an extra head
- [NIPS2016]PerforatedCNNs: Acceleration through elimination of redundant convolutions
- mask is determined by gradient
- [2204.07143]Neighborhood Attention Transformer
- [ICLR2022]MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
- Combine the strengths of CNNs and ViTs to build a light-weight and low latency network for mobile vision tasks
- [NIPS2021]Revisiting ResNets: Improved Training and Scaling Strategies
- Training and scaling strategies may matter more than architectural changes; the resulting ResNets match recent state-of-the-art models
- We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended
- [CVPR2021]Rethinking Channel Dimensions for Efficient Model Design
- [CVPR2020]Designing Network Design Spaces
- [2201.09792]Patches Are All You Need?
- [NIPS2021]Early Convolutions Help Transformers See Better
- [CVPR2021]Fast and Accurate Model Scaling
- [ICCV2021]Incorporating Convolution Designs into Visual Transformers
- [ICCV2021]Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
- [2105.02723]Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet
- [ICCV2019 Workshop]Non-Discriminative Data or Weak Model? On the Relative Importance of Data and Model Resolution
- [2105.03404]ResMLP: Feedforward networks for image classification with data-efficient training
- awesome-vit
- PyTorch: How does pin_memory work in Dataloader
- How to Optimize Data Transfers in CUDA C/C++
- Page-Locked Host Memory for Data Transfer
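The three links above cover the same mechanism: pinned (page-locked) host memory allows asynchronous host-to-device DMA copies. A minimal PyTorch sketch of how `pin_memory` and `non_blocking` fit together (dataset shapes are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset. pin_memory=True makes the DataLoader collate batches into
# page-locked host memory instead of ordinary pageable memory.
dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
for images, labels in loader:
    # non_blocking=True overlaps the copy with compute only when the
    # source tensor is pinned; otherwise it falls back to a sync copy.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
```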
- Image-Compression-using-MATLAB-Project-Report
- JPEG encoding
- JPEG Image Compression Systems
- JPEG: Image compression algorithm
- PIL convert('ycbcr') gives different result from formula
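The first step of JPEG encoding discussed above is the RGB → YCbCr color transform; a NumPy sketch of the full-range JFIF (BT.601 coefficient) formula. PIL's `convert('YCbCr')` rounds and clamps to 8-bit integers, which is presumably why the last link observes a mismatch with the float formula:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    # Full-range JFIF RGB -> YCbCr with BT.601 luma coefficients;
    # float output, no rounding or clamping (unlike PIL's uint8 result).
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure red
ycbcr = rgb_to_ycbcr(pixel)  # Cr = 255.5 here, so uint8 clamping loses info
```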