
      Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network

      research-article


          Abstract

          Automatic speech emotion recognition is a challenging task because of the gap between acoustic features and human emotions, and its performance relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture that extracts informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that degrades recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn a discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% and outperforming existing state-of-the-art approaches.
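          The pipeline the abstract describes (unifying heterogeneous feature groups into a common representation, fusing them, then classifying with an SVM) can be sketched roughly in Python. This is an illustrative sketch only, not the authors' implementation: the feature-group names, dimensions, and random linear maps below are placeholders for the trained subnetworks described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heterogeneous acoustic feature groups for a batch of
# utterances (names and dimensions are illustrative, e.g. MFCCs,
# prosodic features, spectral statistics).
n_utts = 8
groups = {
    "mfcc": rng.normal(size=(n_utts, 39)),
    "prosodic": rng.normal(size=(n_utts, 16)),
    "spectral": rng.normal(size=(n_utts, 24)),
}

def unify(x, out_dim, rng):
    """Stand-in for a per-group subnetwork: random linear map + ReLU."""
    w = rng.normal(size=(x.shape[1], out_dim)) / np.sqrt(x.shape[1])
    return np.maximum(x @ w, 0.0)

# Map each group into a common 32-dim space, then fuse by concatenation.
unified = [unify(x, 32, rng) for x in groups.values()]
fused = np.concatenate(unified, axis=1)

print(fused.shape)  # (8, 96)
```

          In the paper's setup, a fusion network would further transform this concatenated representation before an SVM is fit on the result; here the fused array simply stands in for that classifier input.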

          Related collections

          Most cited references: 49


          IEMOCAP: interactive emotional dyadic motion capture database


            Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge


              Extraction of visual features for lipreading


                Author and article information

                Journal
                Sensors (Basel, Switzerland)
                Publisher: MDPI
                ISSN: 1424-8220
                Published: 18 June 2019 (June 2019 issue)
                Volume: 19
                Issue: 12
                Article: 2730
                Affiliations
                [1] College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; jiangweitju@163.com (W.J.); jinsheng@tju.edu.cn (J.S.J.); hanxianf@163.com (X.H.)
                [2] School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China; licg@czu.cn
                Author notes
                [*] Correspondence: wzheng@tju.edu.cn; Tel.: +86-186-2201-2862
                [†] These authors contributed equally to this work.

                Article
                sensors-19-02730
                DOI: 10.3390/s19122730
                PMCID: 6630663
                PMID: 31216650
                © 2019 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 16 April 2019
                Accepted: 16 June 2019
                Categories
                Article

                Biomedical engineering
                human–computer interaction (HCI), speech emotion recognition, deep neural architecture, heterogeneous feature unification, fusion network
