DNNAC

All about acceleration and compression of Deep Neural Networks


Quantization

General

  • XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

    A classic binary neural network paper in which all weights and activations are binarized.

    Implementation: MXNet, Pytorch, Torch (origin)
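
    A minimal sketch of the core idea, assuming PyTorch (not the original code): weights are binarized with the sign function and rescaled by the per-filter mean absolute value, with a straight-through estimator for the backward pass.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, clipped straight-through gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (straight-through estimator).
        return grad_output * (x.abs() <= 1).float()

def xnor_binarize_weights(w):
    """Binarize a conv weight (out_ch, in_ch, kH, kW) as alpha * sign(w),
    with alpha the per-output-channel mean absolute value."""
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)
    return alpha * BinarizeSTE.apply(w)

w = torch.randn(16, 3, 3, 3, requires_grad=True)
w_bin = xnor_binarize_weights(w)  # use w_bin in F.conv2d during training/inference
```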

  • DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

    Full-stack quantization for weights, activations, and gradients.

    Implementation: Tensorpack
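
    A minimal sketch of the DoReFa-style k-bit uniform quantizer, assuming PyTorch; the paper's weight, activation, and gradient quantizers all build on quantize_k(x) = round((2^k - 1) x) / (2^k - 1) with a straight-through gradient.

```python
import torch

def quantize_k(x, k):
    """k-bit uniform quantization of x in [0, 1]; the rounding step uses a
    straight-through estimator (identity gradient)."""
    n = float(2 ** k - 1)
    x_q = torch.round(x * n) / n
    return x + (x_q - x).detach()  # forward: x_q, backward: d/dx = 1

def quantize_weights(w, k):
    """DoReFa-style weight quantization: squash weights into [0, 1] with tanh,
    quantize to k bits, then map back to [-1, 1]."""
    t = torch.tanh(w)
    t = t / (2 * t.abs().max()) + 0.5  # now in [0, 1]
    return 2 * quantize_k(t, k) - 1

w = torch.randn(64, 32, 3, 3, requires_grad=True)
w_q = quantize_weights(w, k=2)  # 2-bit weights in {-1, -1/3, 1/3, 1}
```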

  • Deep Learning with Low Precision by Half-wave Gaussian Quantization

    Tries to improve the expressiveness of the quantized activation function.

    Implementation: Caffe (origin)

  • Quantizing deep convolutional networks for efficient inference: A whitepaper

    Unofficial technical report on quantization from Google. You can find a lot of technical details about quantization in this paper.
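
    A minimal sketch of the affine (asymmetric) quantization scheme the whitepaper is built around, assuming PyTorch: q = round(x / scale) + zero_point, dequantized as x_hat = scale * (q - zero_point).

```python
import torch

def affine_quantize(x, num_bits=8):
    """Asymmetric (affine) quantization with a scale and zero-point:
    q = round(x / scale) + zero_point, x_hat = scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x_min / scale).clamp(qmin, qmax)
    q = torch.round(x / scale + zero_point).clamp(qmin, qmax)
    x_hat = scale * (q - zero_point)  # dequantized approximation of x
    return q.to(torch.uint8), scale, zero_point, x_hat

x = torch.randn(4, 8)
q, scale, zero_point, x_hat = affine_quantize(x)
print((x - x_hat).abs().max())  # worst-case error is about scale / 2
```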

  • Data-Free Quantization through Weight Equalization and Bias Correction

    Implementation: Pytorch

  • Additive Noise Annealing and Approximation Properties of Quantized Neural Networks

    Implementation: Pytorch

  • Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks

    Finds optimal filter-level bit-widths with NAS.

    Implementation: Pytorch

  • Progressive Stochastic Binarization of Deep Networks

    Uses power-of-two quantization.

    Implementation: TF
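
    A minimal sketch of generic power-of-two quantization (rounding magnitudes to the nearest power of two so multiplications become bit shifts), assuming PyTorch; this illustrates the general idea only, not the paper's progressive stochastic scheme.

```python
import torch

def power_of_two_quantize(x, eps=1e-12):
    """Round each element's magnitude to the nearest power of two:
    x -> sign(x) * 2^round(log2 |x|), so multiplications become bit shifts."""
    exponent = torch.round(torch.log2(x.abs().clamp(min=eps)))
    return torch.sign(x) * torch.pow(2.0, exponent)

x = torch.randn(5)
print(x)
print(power_of_two_quantize(x))
```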

  • Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks

    How to find the optimal quantization thresholds.

    Implementation: TF
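
    A minimal sketch of a trainable clipping threshold, assuming PyTorch: a learnable parameter t bounds the quantization range and receives gradients through a straight-through estimator. This is an illustration of the idea, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TrainedThresholdQuant(nn.Module):
    """Quantize non-negative activations to num_bits with a learnable clipping
    threshold t; rounding is bypassed with a straight-through estimator, so
    gradients reach both the input and t."""

    def __init__(self, num_bits=8, init_t=1.0):
        super().__init__()
        self.num_bits = num_bits
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x):
        n = 2 ** self.num_bits - 1
        scale = self.t / n
        x_clipped = torch.min(torch.relu(x), self.t)  # clip to [0, t]
        q = torch.round(x_clipped / scale) * scale    # uniform quantization
        return x_clipped + (q - x_clipped).detach()   # straight-through estimator

quant = TrainedThresholdQuant(num_bits=4)
y = quant(torch.randn(100))
y.sum().backward()
print(quant.t.grad)  # the threshold itself receives a gradient and can be trained
```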

  • FAT: Fast Adjustable Threshold for Uniform Neural Network Quantization (Winning Solution on LPIRC-II)

    Implementation: TF

  • Proximal Mean-field for Neural Network Quantization

    Implementation: Pytorch

  • A Survey on Methods and Theories of Quantized Neural Networks

    Nice survey on quantization (up to Dec. 2018)

Binary
  • Balanced Binary Neural Networks with Gated Residual
  • IR-Net: Forward and Backward Information Retention for Highly Accurate Binary Neural Networks

Application-oriented

NLP
  • Differentiable Product Quantization for Embedding Compression

    Compresses the embedding table with end-to-end learned KD codes via differentiable product quantization (DPQ).

    Implementation: TF
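
    A minimal sketch of plain product quantization of an embedding table, assuming PyTorch (split each embedding into groups and replace each sub-vector by a codebook index); the paper's contribution, making the code assignment differentiable and learned end-to-end, is not reproduced here.

```python
import torch

def product_quantize(emb, num_groups=4, num_codes=256):
    """Compress a (V, d) embedding table into per-group codes plus codebooks.
    Each row is split into num_groups sub-vectors; each sub-vector is replaced
    by the index of its nearest codebook entry. Toy codebooks are built by
    random sampling; real systems use k-means or, as in DPQ, end-to-end learning."""
    V, d = emb.shape
    gd = d // num_groups
    emb_g = emb.view(V, num_groups, gd)
    idx = torch.randperm(V)[:num_codes]
    codebooks = emb_g[idx].permute(1, 0, 2)                 # (num_groups, num_codes, gd)
    dists = torch.cdist(emb_g.transpose(0, 1), codebooks)   # (num_groups, V, num_codes)
    codes = dists.argmin(dim=-1).transpose(0, 1)            # (V, num_groups) integer codes
    return codes, codebooks

def reconstruct(codes, codebooks):
    """Rebuild approximate embeddings from codes and codebooks."""
    groups = [codebooks[g][codes[:, g]] for g in range(codebooks.shape[0])]
    return torch.cat(groups, dim=-1)

emb = torch.randn(10000, 64)
codes, books = product_quantize(emb)
approx = reconstruct(codes, books)
print(codes.shape, approx.shape)  # (10000, 4) codes instead of (10000, 64) floats
```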

Adversarial
  • Model Compression with Adversarial Robustness: A Unified Optimization Framework

    This paper studies model compression through a different lens: could we compress models without hurting their robustness to adversarial attacks, in addition to maintaining accuracy?

    Implementation: Pytorch

Pruning

  • Learning both Weights and Connections for Efficient Neural Networks

    A very simple way to introduce arbitrary (unstructured) sparsity.
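
    A minimal sketch of magnitude pruning, assuming PyTorch: zero out the smallest-magnitude weights and keep the binary mask for fine-tuning.

```python
import torch

def magnitude_prune(w, sparsity=0.9):
    """Return a pruned copy of w plus the binary mask that keeps only the
    largest-magnitude (1 - sparsity) fraction of entries."""
    k = int(sparsity * w.numel())
    if k == 0:
        return w.clone(), torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > threshold).float()
    return w * mask, mask

w = torch.randn(256, 256)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(1.0 - mask.mean().item())  # achieved sparsity, roughly 0.9
# During fine-tuning, reapply the mask after every optimizer step: w.data *= mask
```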

  • Learning Structured Sparsity in Deep Neural Networks

    A unified way to introduce structured sparsity.

    Implementation: Caffe
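
    A minimal sketch of a group-Lasso style regularizer grouped by output filter, assuming PyTorch; minimizing it drives whole filters to zero, i.e., structured sparsity. Illustrative only, not the paper's Caffe implementation.

```python
import torch

def filter_group_lasso(conv_weight):
    """Group-Lasso penalty over the output filters of a conv weight
    (out_ch, in_ch, kH, kW): the sum of the L2 norms of the filters.
    Minimizing it pushes entire filters to zero (filter-level sparsity)."""
    return conv_weight.flatten(start_dim=1).norm(dim=1).sum()

# Example: add the penalty to the task loss during training.
w = torch.randn(64, 32, 3, 3, requires_grad=True)
task_loss = torch.tensor(0.0)  # placeholder for, e.g., cross-entropy
loss = task_loss + 1e-4 * filter_group_lasso(w)
loss.backward()
```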

Neural Architecture Search (NAS)

  • Resource
    1. automl.org
  • Partial Channel Connections for Memory-Efficient Differentiable Architecture Search

    A memory-efficient differentiable architecture search method: (i) the batch size can be increased to further accelerate the search on CIFAR-10, and (ii) it can search directly on ImageNet. Searched on ImageNet, it reports one of the best results under the mobile setting (24.2%/7.3% error). The search on CIFAR-10 requires only 0.1 GPU-days, i.e., about 3 hours on one Nvidia 1080 Ti (1.5 hours on one Tesla V100).

    Implementation: PyTorch (origin)

Others

  • Benchmark Analysis of Representative Deep Neural Network Architectures [IEEE Access, University of Milano-Bicocca]

    This work presents an in-depth analysis of the majority of the deep neural networks (DNNs) proposed in the state of the art for image recognition, in terms of GFLOPs, #weights, Top-1 accuracy, and so on.

  • Net2Net : Accelerating Learning via Knowledge Transfer

    An interesting way to change the architecture of a model while keeping its output function the same.

    Implementation: TF, Pytorch
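
    A minimal sketch of the Net2WiderNet idea for two consecutive fully connected layers, assuming PyTorch: duplicate random units in the first layer and split the corresponding outgoing weights so the composed function is unchanged. Illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

def net2wider(fc1, fc2, new_width):
    """Widen fc1 to new_width while keeping fc2(relu(fc1(x))) functionally identical
    (ReLU is elementwise, so duplicated units keep duplicated activations)."""
    old_width = fc1.out_features
    extra = new_width - old_width
    idx = torch.randint(0, old_width, (extra,))            # units to duplicate
    mapping = torch.cat([torch.arange(old_width), idx])

    wider_fc1 = nn.Linear(fc1.in_features, new_width)
    wider_fc2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        wider_fc1.weight.copy_(fc1.weight[mapping])
        wider_fc1.bias.copy_(fc1.bias[mapping])
        # Split fc2's incoming weights by each unit's replication count
        # so the output stays the same.
        counts = torch.bincount(mapping, minlength=old_width).float()
        wider_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        wider_fc2.bias.copy_(fc2.bias)
    return wider_fc1, wider_fc2

fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 4)
w1, w2 = net2wider(fc1, fc2, new_width=24)
x = torch.randn(2, 8)
print(torch.allclose(fc2(torch.relu(fc1(x))), w2(torch.relu(w1(x))), atol=1e-6))  # True
```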

Embedded System

  • EMDL: Embedded and mobile deep learning research notes

    Embedded and mobile deep learning research notes on GitHub.

Tools

Research

  • slimmable_networks

    An open source framework for slimmable training on tasks of ImageNet classification and COCO detection, which has enabled numerous projects.

  • distiller

    A Python package for neural network compression research.

  • QPyTorch

    QPyTorch is a low-precision arithmetic simulation package in PyTorch. It is designed to support research on low-precision machine learning, especially low-precision training.

  • Graffitist

    Graffitist is a flexible and scalable framework built on top of TensorFlow to process low-level graph descriptions of deep neural networks (DNNs) for accurate and efficient inference on fixed-point hardware. It comprises a (growing) library of transforms that apply various neural network compression techniques such as quantization, pruning, and compression. Each transform consists of unique pattern-matching and manipulation algorithms that, when run sequentially, produce an optimized output graph.

Industry

  • dabnn

    dabnn is an accelerated binary neural network inference framework for mobile platforms.
