      Is Open Access

      Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

      Preprint


          Abstract

          Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on corpora of 25 to 500 sentences, our method outperforms fast-align. We also show that our fine-tuning objective consistently improves a CLWE-only baseline.
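
          To make the idea of aligning words with CLWEs concrete, here is a minimal sketch of a similarity-based aligner: each source token is linked to the target token whose embedding is closest by cosine similarity. This is an illustration under stated assumptions, not the authors' method. The function names, the greedy argmax linking rule, and the toy 2-dimensional vectors are all invented here, and the sketch does not include the fine-tuning objective, which the abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_by_similarity(src_tokens, tgt_tokens, src_emb, tgt_emb):
    """Link each source token to its most similar target token.

    Hypothetical helper: returns (source_index, target_index) pairs.
    Assumes both embedding tables live in one shared cross-lingual space.
    """
    links = []
    for i, s in enumerate(src_tokens):
        scores = [cosine(src_emb[s], tgt_emb[t]) for t in tgt_tokens]
        j = max(range(len(tgt_tokens)), key=scores.__getitem__)
        links.append((i, j))
    return links


# Toy vectors invented for illustration; real CLWEs have hundreds of dimensions.
SRC_EMB = {"the": [1.0, 0.1], "cat": [0.1, 1.0]}
TGT_EMB = {"die": [0.9, 0.2], "Katze": [0.2, 0.9]}

links = align_by_similarity(["the", "cat"], ["die", "Katze"], SRC_EMB, TGT_EMB)
print(links)
```

          Because the similarity scores come from embeddings rather than co-occurrence counts in the parallel text, a scheme like this does not depend on how many sentence pairs are available, which is the property the abstract relies on for very small corpora.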


          Author and article information

          Journal
          31 October 2018
          Article
          1811.00066

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          cs.CL

          Theoretical computer science
