      Is Open Access

      Automatic Identification of Closely-related Indian Languages: Resources and Experiments

      Preprint


          Abstract

          In this paper, we discuss an attempt to develop an automatic language identification system for five closely related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi. We have compiled comparable corpora of varying length for these languages from various resources, and we discuss the method used to create these corpora in detail. Using these corpora, we developed a language identification system that currently achieves state-of-the-art accuracy of 96.48%. We also used these corpora to study the similarity between the five languages at the lexical level, which is the first data-based study of the extent of closeness of these languages.


                Author and article information

                Journal
                26 March 2018
                Article
                1803.09405

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                Paper accepted at the 4th Workshop in Indian Languages Data and Resources (WILDRE-4), held at the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), 7-12 May 2018, Miyazaki, Japan
                cs.CL
