      Accelerating Inference of Convolutional Neural Networks Using In-memory Computing


          Abstract

          In-memory computing (IMC) is a non-von Neumann paradigm that has recently established itself as a promising approach for energy-efficient, high-throughput hardware for deep learning applications. One prominent application of IMC is performing matrix-vector multiplication in O(1) time complexity by mapping the synaptic weights of a neural-network layer onto the devices of an IMC core. However, because of the significantly different pattern of execution compared to previous computational paradigms, IMC requires rethinking the architectural design choices made for deep-learning hardware. In this work, we focus on application-specific IMC hardware for inference of Convolutional Neural Networks (CNNs), and provide methodologies for implementing the various architectural components of the IMC core. Specifically, we present methods for mapping synaptic weights and activations onto the memory structures and give evidence of the various trade-offs therein, such as the one between on-chip memory requirements and execution latency. Lastly, we show how to employ these methods to implement a pipelined dataflow that offers throughput and latency beyond the state of the art for image classification tasks.
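
          To make the mapping described above concrete, the sketch below shows, in plain NumPy, how the kernels of a convolutional layer can be unrolled into a single weight matrix held by a crossbar-like IMC core, so that each output pixel is produced by one matrix-vector multiplication (MVM); on analog IMC hardware that MVM would execute in O(1) time, whereas here it is simulated as an ordinary dot product. This is a minimal illustration, not the paper's implementation; the names Crossbar, map_conv_weights, and im2col_patches are assumptions introduced only for this example.

          import numpy as np

          class Crossbar:
              """Toy stand-in for an IMC core that stores a weight matrix."""

              def __init__(self, weights: np.ndarray):
                  # weights: (n_outputs, n_inputs); on hardware these would be device conductances
                  self.weights = weights

              def mvm(self, x: np.ndarray) -> np.ndarray:
                  # A single analog matrix-vector multiplication on real hardware,
                  # simulated here as an ordinary dot product.
                  return self.weights @ x

          def map_conv_weights(kernels: np.ndarray) -> Crossbar:
              """Unroll a (C_out, C_in, K, K) kernel tensor into a (C_out, C_in*K*K) matrix."""
              return Crossbar(kernels.reshape(kernels.shape[0], -1))

          def im2col_patches(x: np.ndarray, k: int):
              """Yield flattened K-by-K patches of a (C_in, H, W) input (stride 1, no padding)."""
              _, h, w = x.shape
              for i in range(h - k + 1):
                  for j in range(w - k + 1):
                      yield (i, j), x[:, i:i + k, j:j + k].reshape(-1)

          # Example: a 3x3 convolution with 4 input and 8 output channels on an 8x8 input.
          rng = np.random.default_rng(0)
          kernels = rng.standard_normal((8, 4, 3, 3))
          x = rng.standard_normal((4, 8, 8))

          core = map_conv_weights(kernels)          # "program" the weights once
          out = np.zeros((8, 6, 6))
          for (i, j), patch in im2col_patches(x, 3):
              out[:, i, j] = core.mvm(patch)        # one MVM per output pixel

          # Sanity check against a direct convolution computed with explicit loops.
          ref = np.zeros_like(out)
          for co in range(8):
              for i in range(6):
                  for j in range(6):
                      ref[co, i, j] = np.sum(kernels[co] * x[:, i:i + 3, j:j + 3])
          assert np.allclose(out, ref)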


                Author and article information

                Contributors
                Journal
                Frontiers in Computational Neuroscience (Front. Comput. Neurosci.)
                Frontiers Media S.A.
                ISSN: 1662-5188
                Published: 03 August 2021
                Volume: 15
                Article number: 674154
                Affiliations
                [1] IBM Research Europe, Rüschlikon, Zurich, Switzerland
                [2] Eidgenössische Technische Hochschule Zürich, Zurich, Switzerland
                Author notes

                Edited by: Oliver Rhodes, The University of Manchester, United Kingdom

                Reviewed by: Shimeng Yu, Georgia Institute of Technology, United States; Rishad Shafik, Newcastle University, United Kingdom

                *Correspondence: Martino Dazzi daz@zurich.ibm.com
                Article
                DOI: 10.3389/fncom.2021.674154
                PMCID: 8369825
                PMID: 34413731
                Copyright © 2021 Dazzi, Sebastian, Benini and Eleftheriou.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 28 February 2021
                Accepted: 23 June 2021
                Page count
                Figures: 11, Tables: 6, Equations: 7, References: 38, Pages: 19, Words: 15273
                Categories
                Neuroscience
                Original Research

                Keywords
                convolutional neural network, in-memory computing, computational memory, AI hardware, neural network acceleration
