
      Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network

      research-article


          Abstract

          Automatic speech emotion recognition is a challenging task because of the gap between acoustic features and human emotions, and its performance relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture that extracts informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that degrades recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn a discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% and outperforming existing state-of-the-art approaches.
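          The pipeline the abstract describes (unifying heterogeneous feature groups into a common representation, fusing them, then classifying with an SVM) can be sketched roughly in Python. This is an illustrative sketch only, not the authors' implementation: the feature-group names, dimensions, and random linear maps below are placeholders for the trained subnetworks described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heterogeneous acoustic feature groups for a batch of
# utterances (names and dimensions are illustrative, e.g. MFCCs,
# prosodic features, spectral statistics).
n_utts = 8
groups = {
    "mfcc": rng.normal(size=(n_utts, 39)),
    "prosodic": rng.normal(size=(n_utts, 16)),
    "spectral": rng.normal(size=(n_utts, 24)),
}

def unify(x, out_dim, rng):
    """Stand-in for a per-group subnetwork: random linear map + ReLU."""
    w = rng.normal(size=(x.shape[1], out_dim)) / np.sqrt(x.shape[1])
    return np.maximum(x @ w, 0.0)

# Map each group into a common 32-dim space, then fuse by concatenation.
unified = [unify(x, 32, rng) for x in groups.values()]
fused = np.concatenate(unified, axis=1)

print(fused.shape)  # (8, 96)
```

          In the paper's setup, a fusion network would further transform this concatenated representation before an SVM is fit on the result; here the fused array simply stands in for that classifier input.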

          Related collections

          Most cited references: 49


          IEMOCAP: interactive emotional dyadic motion capture database


            Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge


              Extraction of visual features for lipreading


                Author and article information

                Journal
                Sensors (Basel, Switzerland)
                Publisher: MDPI
                ISSN: 1424-8220
                Published: 18 June 2019 (June 2019 issue)
                Volume: 19
                Issue: 12
                Article: 2730
                Affiliations
                [1] College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; jiangweitju@163.com (W.J.); jinsheng@tju.edu.cn (J.S.J.); hanxianf@163.com (X.H.)
                [2] School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China; licg@czu.cn
                Author notes
                [*] Correspondence: wzheng@tju.edu.cn; Tel.: +86-186-2201-2862
                [†] These authors contributed equally to this work.

                Article
                sensors-19-02730
                DOI: 10.3390/s19122730
                PMCID: 6630663
                PMID: 31216650
                © 2019 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 16 April 2019
                Accepted: 16 June 2019
                Categories
                Article

                Biomedical engineering
                human–computer interaction (HCI), speech emotion recognition, deep neural architecture, heterogeneous feature unification, fusion network
