
      Matching Natural Language Sentences with Hierarchical Sentence Factorization

Preprint (Open Access)


          Abstract

Semantic matching of natural language sentences, or identifying the relationship between two sentences, is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization, a technique that factorizes a sentence into a hierarchical representation, with the components at each scale reordered into a "predicate-argument" form. The proposed sentence factorization technique yields: 1) a new unsupervised distance metric that calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training, based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks on multiple datasets, including the STSbenchmark, Microsoft Research Paraphrase (MSRP), and SICK datasets. Extensive experiments show that the proposed hierarchical sentence factorization significantly improves the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on the convolutional neural network (CNN) and long short-term memory (LSTM).
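The unsupervised metric described in the abstract computes a penalized optimal transport distance between the reordered word sequences of two sentences. Below is a minimal sketch of that idea, assuming pretrained word embeddings are available and using a standard entropy-regularized (Sinkhorn) solver; the function names, the position-penalty weight lam, and the toy random vectors are illustrative assumptions, not the authors' implementation.

import numpy as np

def sinkhorn_distance(cost, a, b, reg, n_iters=200):
    # Entropy-regularized optimal transport via Sinkhorn iterations.
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = np.outer(u, v) * K          # transport plan diag(u) K diag(v)
    return np.sum(plan * cost)

def penalized_sentence_distance(emb_a, emb_b, lam=0.5):
    # emb_a: (n, d) embeddings of sentence A in predicate-argument order.
    # emb_b: (m, d) embeddings of sentence B in predicate-argument order.
    n, m = len(emb_a), len(emb_b)
    # Semantic cost: Euclidean distance between word embeddings.
    diff = emb_a[:, None, :] - emb_b[None, :, :]
    semantic = np.linalg.norm(diff, axis=-1)
    # Order penalty: discourage matching words at very different relative positions.
    pos_a = np.arange(n) / max(n - 1, 1)
    pos_b = np.arange(m) / max(m - 1, 1)
    order = np.abs(pos_a[:, None] - pos_b[None, :])
    cost = semantic + lam * order
    # Uniform word weights (idf weights could be substituted here).
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Scale the regularizer to the cost magnitude to keep the kernel well-conditioned.
    return sinkhorn_distance(cost, a, b, reg=0.1 * cost.mean())

# Toy usage with random vectors standing in for pretrained embeddings.
rng = np.random.default_rng(0)
sent_a = rng.normal(size=(5, 50))   # 5 tokens, 50-dim embeddings
sent_b = rng.normal(size=(7, 50))   # 7 tokens, 50-dim embeddings
print(penalized_sentence_distance(sent_a, sent_b))

The order penalty is what distinguishes this sketch from a plain word mover's distance: after factorization reorders both sentences into predicate-argument form, words that play analogous roles sit at similar relative positions, so penalizing position mismatch encodes the logical structure in the transport cost.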


Most cited references

• The Berkeley FrameNet Project
• Interpreting TF-IDF term weights as making relevance decisions
• Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval

                Author and article information

Published: 28 February 2018
DOI: 10.1145/3178876.3186022
arXiv: 1803.00179
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Comments: Accepted by WWW 2018, 10 pages
Subject: cs.CL
