      Accelerating Inference of Convolutional Neural Networks Using In-memory Computing


          Abstract

          In-memory computing (IMC) is a non-von Neumann paradigm that has recently established itself as a promising approach for energy-efficient, high-throughput hardware for deep learning applications. One prominent application of IMC is performing matrix-vector multiplication in O(1) time complexity by mapping the synaptic weights of a neural-network layer onto the devices of an IMC core. However, because of the significantly different pattern of execution compared to previous computational paradigms, IMC requires rethinking the architectural design choices made for deep-learning hardware. In this work, we focus on application-specific IMC hardware for inference of Convolutional Neural Networks (CNNs), and provide methodologies for implementing the various architectural components of the IMC core. Specifically, we present methods for mapping synaptic weights and activations onto the memory structures and give evidence of the various trade-offs therein, such as the one between on-chip memory requirements and execution latency. Lastly, we show how to employ these methods to implement a pipelined dataflow that offers throughput and latency beyond the state of the art for image classification tasks.
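
          To make the mapping described above concrete, the sketch below shows, in plain NumPy, how the kernels of a convolutional layer can be unrolled into a single weight matrix held by a crossbar-like IMC core, so that each output pixel is produced by one matrix-vector multiplication (MVM); on analog IMC hardware that MVM would execute in O(1) time, whereas here it is simulated as an ordinary dot product. This is a minimal illustration, not the paper's implementation; the names Crossbar, map_conv_weights, and im2col_patches are assumptions introduced only for this example.

          import numpy as np

          class Crossbar:
              """Toy stand-in for an IMC core that stores a weight matrix."""

              def __init__(self, weights: np.ndarray):
                  # weights: (n_outputs, n_inputs); on hardware these would be device conductances
                  self.weights = weights

              def mvm(self, x: np.ndarray) -> np.ndarray:
                  # A single analog matrix-vector multiplication on real hardware,
                  # simulated here as an ordinary dot product.
                  return self.weights @ x

          def map_conv_weights(kernels: np.ndarray) -> Crossbar:
              """Unroll a (C_out, C_in, K, K) kernel tensor into a (C_out, C_in*K*K) matrix."""
              return Crossbar(kernels.reshape(kernels.shape[0], -1))

          def im2col_patches(x: np.ndarray, k: int):
              """Yield flattened K-by-K patches of a (C_in, H, W) input (stride 1, no padding)."""
              _, h, w = x.shape
              for i in range(h - k + 1):
                  for j in range(w - k + 1):
                      yield (i, j), x[:, i:i + k, j:j + k].reshape(-1)

          # Example: a 3x3 convolution with 4 input and 8 output channels on an 8x8 input.
          rng = np.random.default_rng(0)
          kernels = rng.standard_normal((8, 4, 3, 3))
          x = rng.standard_normal((4, 8, 8))

          core = map_conv_weights(kernels)          # "program" the weights once
          out = np.zeros((8, 6, 6))
          for (i, j), patch in im2col_patches(x, 3):
              out[:, i, j] = core.mvm(patch)        # one MVM per output pixel

          # Sanity check against a direct convolution computed with explicit loops.
          ref = np.zeros_like(out)
          for co in range(8):
              for i in range(6):
                  for j in range(6):
                      ref[co, i, j] = np.sum(kernels[co] * x[:, i:i + 3, j:j + 3])
          assert np.allclose(out, ref)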


                Author and article information

                Contributors
                Journal
                Frontiers in Computational Neuroscience (Front. Comput. Neurosci.)
                Frontiers Media S.A.
                ISSN: 1662-5188
                Published: 03 August 2021
                Volume: 15
                Article number: 674154
                Affiliations
                [1] IBM Research Europe, Rüschlikon, Zurich, Switzerland
                [2] Eidgenössische Technische Hochschule Zürich, Zurich, Switzerland
                Author notes

                Edited by: Oliver Rhodes, The University of Manchester, United Kingdom

                Reviewed by: Shimeng Yu, Georgia Institute of Technology, United States; Rishad Shafik, Newcastle University, United Kingdom

                *Correspondence: Martino Dazzi daz@zurich.ibm.com
                Article
                DOI: 10.3389/fncom.2021.674154
                PMCID: 8369825
                PMID: 34413731
                Copyright © 2021 Dazzi, Sebastian, Benini and Eleftheriou.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 28 February 2021
                Accepted: 23 June 2021
                Page count
                Figures: 11, Tables: 6, Equations: 7, References: 38, Pages: 19, Words: 15273
                Categories
                Neuroscience
                Original Research

                Keywords
                convolutional neural network, in-memory computing, computational memory, AI hardware, neural network acceleration
