- [1904.03392]Efficient and Effective Dropout for Deep Convolutional Neural Networks
- knowledge distillation
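The knowledge-distillation note above refers to the usual soft-target formulation (Hinton et al.); a minimal PyTorch sketch, where the temperature `T` is an arbitrary illustrative choice:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as in the original formulation.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
loss = distillation_loss(student, teacher)
```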
- [2002.03342]Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
- early exit from both the frame (temporal) and network-depth perspectives
- [ICLR2021]AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
- somewhat similar to TSM (Temporal Shift Module)
- Gumbel-Softmax Estimator
- https://mengyuest.github.io/AdaFuse/
- the conditional computation is not well suited to GPU acceleration
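AdaFuse (and ConvNet-AIG below) make discrete keep/drop decisions differentiable with the Gumbel-Softmax estimator noted above; a minimal straight-through sketch, written out explicitly rather than via `F.gumbel_softmax`:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=True):
    # Sample Gumbel(0, 1) noise: -log(Exp(1)) is Gumbel-distributed.
    gumbels = -torch.empty_like(logits).exponential_().log()
    # Temperature-scaled softmax over the perturbed logits.
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
    if hard:
        # Straight-through: one-hot in the forward pass,
        # soft gradients in the backward pass.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(logits).scatter_(-1, index, 1.0)
        return y_hard - y_soft.detach() + y_soft
    return y_soft

# e.g. per-channel keep/drop decisions (4 channels, 2 choices each)
logits = torch.randn(4, 2, requires_grad=True)
mask = gumbel_softmax_sample(logits, tau=0.67)
```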
- [ICLR2021]VA-RED2: Video Adaptive Redundancy Reduction
- the conditional computation is not well suited to GPU acceleration
- identifies redundant feature maps along the temporal and channel dimensions
- adds a loss term on the redundancy factor
- [ICCV2021]MGSampler: An Explainable Sampling Strategy for Video Action Recognition
- [ICCV2021]Adaptive Focus for Efficient Video Recognition
- reinforcement learning
- spatial and temporal
- [CVPR2021]Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
- [BMVC2021]Conditional Model Selection for Efficient Video Understanding
- [ECCV2020]AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
- [NIPS2020]Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
- reinforcement learning
- [CVPR2019]Efficient Video Classification Using Fewer Frames
- [CVPR2019]AdaFrame: Adaptive Frame Selection for Fast Video Recognition
- reinforcement learning
- [IJCAI2018]Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification
- reinforcement learning
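Several of the frame-selection papers above (AdaptiveFocus, AdaFrame, the IJCAI 2018 work) train their samplers with policy gradients; a toy REINFORCE sketch for selecting one of several candidate frames, where the reward of 1.0 is a placeholder for a task signal such as classification correctness:

```python
import torch

# Policy logits scoring 8 hypothetical candidate frames.
logits = torch.randn(1, 8, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)

# Sample a frame index and weight its log-probability by the reward
# (REINFORCE: grad = -reward * grad log pi(action)).
action = dist.sample()
reward = torch.tensor(1.0)  # placeholder reward
loss = -(dist.log_prob(action) * reward).mean()
loss.backward()
```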
- [CVPR2018]Learning Strict Identity Mappings in Deep Residual Networks
- [ECCV2018]Convolutional Networks with Adaptive Inference Graphs
- Gumbel sampling
- https://github.com/andreasveit/convnet-aig
- [CVPR2017]Spatially Adaptive Computation Time for Residual Networks
- mask is determined by a probability predicted by an extra head
- [NIPS2016]PerforatedCNNs: Acceleration through elimination of redundant convolutions
- mask is determined by gradient
- [2204.07143]Neighborhood Attention Transformer
- [ICLR2022]MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
- Combine the strengths of CNNs and ViTs to build a light-weight and low latency network for mobile vision tasks
- [NIPS2021]Revisiting ResNets: Improved Training and Scaling Strategies
- Training and scaling strategies may matter more than architectural changes; the resulting ResNets match recent state-of-the-art models
- We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended
- [CVPR2021]Rethinking Channel Dimensions for Efficient Model Design
- [CVPR2020]Designing Network Design Spaces
- [2201.09792]Patches Are All You Need?
- [NIPS2021]Early Convolutions Help Transformers See Better
- [CVPR2021]Fast and Accurate Model Scaling
- [ICCV2021]Incorporating Convolution Designs into Visual Transformers
- [ICCV2021]Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
- [2105.02723]Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet
- [ICCV2019 Workshop]Non-Discriminative Data or Weak Model? On the Relative Importance of Data and Model Resolution
- [2105.03404]ResMLP: Feedforward networks for image classification with data-efficient training
- awesome-vit
- PyTorch: How does pin_memory work in Dataloader
- How to Optimize Data Transfers in CUDA C/C++
- Page-Locked Host Memory for Data Transfer
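The three links above cover the same mechanism: pinned (page-locked) host memory allows asynchronous host-to-device DMA copies. A minimal PyTorch sketch of how `pin_memory` and `non_blocking` fit together (dataset shapes are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset. pin_memory=True makes the DataLoader collate batches into
# page-locked host memory instead of ordinary pageable memory.
dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
for images, labels in loader:
    # non_blocking=True overlaps the copy with compute only when the
    # source tensor is pinned; otherwise it falls back to a sync copy.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
```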
- Image-Compression-using-MATLAB-Project-Report
- JPEG encoding
- JPEG Image Compression Systems
- JPEG: Image compression algorithm
- PIL convert('ycbcr') gives different result from formula
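The first step of JPEG encoding discussed above is the RGB → YCbCr color transform; a NumPy sketch of the full-range JFIF (BT.601 coefficient) formula. PIL's `convert('YCbCr')` rounds and clamps to 8-bit integers, which is presumably why the last link observes a mismatch with the float formula:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    # Full-range JFIF RGB -> YCbCr with BT.601 luma coefficients;
    # float output, no rounding or clamping (unlike PIL's uint8 result).
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure red
ycbcr = rgb_to_ycbcr(pixel)  # Cr = 255.5 here, so uint8 clamping loses info
```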