      Is Open Access

      SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings

      Complexity
      Hindawi Limited


          Abstract

          Automatic synonym extraction plays an important role in many natural language processing systems, such as those for information retrieval and question answering. Recent research has focused on extracting semantic relations from word embeddings, since they capture relatedness and similarity between words. However, word embeddings alone are problematic for synonym extraction because they cannot determine whether the relation between two words is synonymy or some other semantic relation. In this paper, we address this problem by proposing the SynoExtractor pipeline, which filters similar word embeddings to retain synonyms based on specified linguistic rules. Our experiments used embeddings trained on the King Saud University Corpus of Classical Arabic (KSUCCA) and the Gigaword corpus with the continuous bag-of-words (CBOW) and skip-gram (SG) models. We evaluated the automatically extracted synonyms by comparing them with the Alma’any Arabic synonym thesauri, and we also arranged for a manual evaluation by two Arabic linguists. The results show that the SynoExtractor pipeline improves the precision of synonym extraction compared with using the cosine similarity measure alone. SynoExtractor obtained a mean average precision (MAP) of 0.605 for KSUCCA, a 21% improvement over the baseline, and a MAP of 0.748 for the Gigaword corpus, a 25% improvement. SynoExtractor also outperformed the Sketch Engine thesaurus for synonym extraction by 32% in terms of MAP. These promising results suggest that our method could also be used with other languages.
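
The two measures named in the abstract, the cosine-similarity baseline and mean average precision, can be illustrated with a minimal sketch. This is not the SynoExtractor pipeline itself: the linguistic filtering rules are not given in the abstract and are not reproduced here. The function names, the toy two-dimensional vectors, and the English stand-in words are all hypothetical. The average-precision variant used (averaging precision over the gold synonyms that actually appear in the ranked list) is an assumption, since the abstract does not state which variant the authors used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(word, vectors, k):
    """Rank all other words by cosine similarity to `word`; keep the top k."""
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]

def average_precision(ranked, gold):
    """Mean of precision@rank over the ranks where a gold synonym appears.

    Assumption: the denominator is the number of gold synonyms found in
    `ranked`, not the total size of `gold`.
    """
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings, gold_sets):
    """Average the per-word average precision over all query words."""
    scores = [average_precision(rankings[w], gold_sets[w]) for w in rankings]
    return sum(scores) / len(scores) if scores else 0.0

# Toy embeddings (hypothetical values, English words for readability).
vectors = {
    "big": [1.0, 0.0],
    "large": [0.9, 0.1],
    "huge": [0.8, 0.3],
    "small": [-1.0, 0.0],
}
candidates = nearest("big", vectors, k=3)
ap = average_precision(candidates, gold={"large", "huge"})
```

In this toy setting a filtering stage would sit between `nearest` and `average_precision`, removing candidates that are related but not synonymous (such as an antonym), which is the step where the paper reports its precision gains over cosine similarity alone.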


                Author and article information

                Journal: Complexity
                Publisher: Hindawi Limited
                ISSN: 1099-0526, 1076-2787
                Published: February 16 2021
                Volume: 2021
                Pages: 1-13
                Affiliations
                [1] Department of Information Technology, College of Computer and Information Sciences, King Saud University, P.O. Box 12371, Riyadh, Saudi Arabia
                Article
                10.1155/2021/6627434
                © 2021

                https://creativecommons.org/licenses/by/4.0/
