
      Attention Is All You Need

      journal-article


          Abstract

          The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
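
          The core operation the abstract refers to is scaled dot-product attention, which the paper defines as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula; it is an illustration, not the authors' implementation, and the function and variable names are our own.

          import numpy as np

          def scaled_dot_product_attention(q, k, v):
              # q: (n_q, d_k) queries, k: (n_k, d_k) keys, v: (n_k, d_v) values.
              d_k = q.shape[-1]
              scores = q @ k.T / np.sqrt(d_k)  # scaled pairwise similarities
              # Numerically stable row-wise softmax over the keys.
              weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
              weights /= weights.sum(axis=-1, keepdims=True)
              return weights @ v  # attention-weighted combination of the values

          # Example: 2 queries attending over 5 key/value pairs.
          rng = np.random.default_rng(0)
          out = scaled_dot_product_attention(rng.normal(size=(2, 8)),
                                             rng.normal(size=(5, 8)),
                                             rng.normal(size=(5, 16)))
          print(out.shape)  # (2, 16)

          The division by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with extremely small gradients.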

          Comments

          15 pages, 5 figures


          Author and article information

          Journal
          arXiv
          June 2017
          Article
          DOI: 10.48550/ARXIV.1706.03762
          35895330
          3f4233f3-765b-4222-bba8-e00a7c457ede

          arXiv.org perpetual, non-exclusive license

          History
          12 June 2017
          13 June 2017
          19 June 2017
          20 June 2017
          20 June 2017
          21 June 2017
          30 June 2017
          03 July 2017
          06 December 2017
          07 December 2017

          Categories
          Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences
