      Learning To Split and Rephrase From Wikipedia Edit History

      Preprint

          Abstract

          Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.

              Author and article information

              Date: 28 August 2018
              arXiv: 1808.09468
              License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
              Venue: Proc. of EMNLP 2018
              Subject: cs.CL

              Theoretical computer science
