      Scalable Deep Learning on Distributed Infrastructures : Challenges, Techniques, and Tools

      ACM Computing Surveys
      Association for Computing Machinery (ACM)

          Abstract

          Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-art results in various domains, such as image recognition and natural language processing. One of the reasons for this success is the increasing size of DL models and the growing availability of vast amounts of training data. To keep improving the performance of DL, the scalability of DL systems must increase. In this survey, we perform a broad and thorough investigation of challenges, techniques, and tools for scalable DL on distributed infrastructures. This covers infrastructures for DL, methods for parallel DL training, multi-tenant resource scheduling, and the management of training and model data. Further, we analyze and compare 11 current open-source DL frameworks and tools and investigate which of the techniques are commonly implemented in practice. Finally, we highlight trends in DL systems that deserve further research.

                Author and article information

                Journal
                ACM Computing Surveys (ACM Comput. Surv.)
                Association for Computing Machinery (ACM)
                ISSN: 0360-0300 (print), 1557-7341 (online)
                Published: May 29, 2020
                Volume: 53
                Issue: 1
                Pages: 1-37
                Affiliations
                [1] Technical University of Munich, Boltzmannstrasse, Garching, Germany
                Article
                DOI: 10.1145/3363554
                © 2020