Training wide residual networks for deployment using a single bit for each weight

Preprint

Abstract

For fast and energy-efficient deployment of trained deep neural networks on resource-constrained embedded hardware, each learned weight parameter should ideally be represented and stored using a single bit. Error rates usually increase when this requirement is imposed. Here, we report large improvements in error rates on multiple datasets, for deep convolutional neural networks deployed with 1-bit-per-weight. Using wide residual networks as our main baseline, our approach simplifies existing methods that binarize weights by applying the sign function in training; we apply scaling factors for each layer with constant unlearned values equal to the layer-specific standard deviations used for initialization. For CIFAR-10, CIFAR-100 and ImageNet, and models with 1-bit-per-weight requiring less than 10 MB of parameter memory, we achieve error rates of 3.9%, 18.5% and 26.0% / 8.5% (Top-1 / Top-5) respectively. We also considered MNIST, SVHN and ImageNet32, achieving 1-bit-per-weight test results of 0.27%, 1.9%, and 41.3% / 19.1% respectively. For CIFAR, our error rates halve previously reported values, and are within about 1% of our error rates for the same network with full-precision weights. For networks that overfit, we also show significant improvements in error rate by not learning batch normalization scale and offset parameters. This applies to both full-precision and 1-bit-per-weight networks. Using a warm-restart learning-rate schedule, we found that training for 1-bit-per-weight is just as fast as for full-precision networks, with better accuracy than standard schedules, and achieved about 98%-99% of peak performance in just 62 training epochs for CIFAR-10/100. For full training code and trained models in MATLAB, Keras and PyTorch, see https://github.com/McDonnell-Lab/1-bit-per-weight/.
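
The weight-binarization idea summarized above can be sketched compactly. The following PyTorch snippet is an illustrative sketch, not the authors' code (their MATLAB, Keras and PyTorch implementations are in the linked repository, and the class and variable names here are hypothetical): in the forward pass each convolution uses sign(w) scaled by a fixed, unlearned per-layer constant set to the standard deviation used at initialization, while gradients update hidden full-precision weights. The He-initialization formula for that standard deviation and the straight-through gradient handling are assumptions drawn from the wider binarized-weights literature rather than from the abstract itself.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignBinaryConv2d(nn.Conv2d):
    """Convolution whose forward pass uses 1-bit weights.

    Each weight is replaced by sign(w) * delta, where delta is a constant,
    unlearned per-layer scale equal to the standard deviation used to
    initialize the layer (He initialization is assumed here). Gradients flow
    to the underlying full-precision weights via a straight-through estimator.
    """

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, bias=False, **kwargs)
        fan_in = in_channels * self.kernel_size[0] * self.kernel_size[1]
        delta = math.sqrt(2.0 / fan_in)            # He-init std (assumption)
        nn.init.normal_(self.weight, mean=0.0, std=delta)
        self.register_buffer("delta", torch.tensor(delta))  # constant, never learned

    def forward(self, x):
        w = self.weight
        # 1-bit weights used in the forward pass (weights exactly at zero map
        # to 0 here; a real implementation would map them to +1 or -1).
        w_bin = self.delta * torch.sign(w)
        # Straight-through estimator: forward value is w_bin, gradient goes to w.
        w_ste = w + (w_bin - w).detach()
        return F.conv2d(x, w_ste, None, self.stride, self.padding,
                        self.dilation, self.groups)

At deployment only the sign bits (one per weight) and a single scalar per layer would need to be stored, which is what makes the sub-10 MB parameter memory quoted in the abstract possible.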

Author and article information

Journal
23 February 2018
arXiv: 1802.08530
Record ID: 09dfe79a-de92-4c7e-b56c-21135d09e9a7
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata
ICLR 2018 - International Conference on Learning Representations, Apr 2018, Vancouver, Canada
Published as a conference paper at ICLR 2018
Subjects: cs.LG, cs.CV, cs.NE
