1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Multilingual Coreference Resolution in Multiparty Dialogue

      Preprint
      , , ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Existing multiparty dialogue datasets for coreference resolution are nascent, and many challenges are still unaddressed. We create a large-scale dataset, Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Due to the availability of gold-quality subtitles in multiple languages, we propose reusing the annotations to create silver coreference data in other languages (Chinese and Farsi) via annotation projection. On the gold (English) data, off-the-shelf models perform relatively poorly on MMC, suggesting that MMC has broader coverage of multiparty coreference than prior datasets. On the silver data, we find success both using it for data augmentation and training from scratch, which effectively simulates the zero-shot cross-lingual setting.

          Related collections

          Author and article information

          Journal
          02 August 2022
          Article
          2208.01307
          e914ac97-0a69-43e5-ac8f-d634dc5235b1

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          cs.CL cs.AI

          Theoretical computer science,Artificial intelligence
          Theoretical computer science, Artificial intelligence

          Comments

          Comment on this article