
      Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network

Research article
      PLOS ONE
      Public Library of Science


          Abstract

Long short-term memory (LSTM) networks have been used effectively to represent sequential data in recent years. However, LSTM still struggles to capture long-term temporal dependencies. In this paper, we propose an hourglass-shaped LSTM that captures long-term temporal correlations by reducing the feature resolution without data loss. We use skip connections between non-adjacent layers to avoid gradient decay, and an attention mechanism is incorporated into the skip connections to emphasize the essential spectral features and spectral regions. The proposed LSTM model is applied to speech enhancement and recognition. Because it uses no future information, the model is a causal system suitable for real-time processing. Combined spectral feature sets are used to train the LSTM model for improved performance, and the ideal ratio mask (IRM) is estimated as the training objective. Experimental evaluations using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) demonstrate that the proposed model, with its robust feature representation, achieves higher speech intelligibility and perceptual quality. On the TIMIT, LibriSpeech, and VoiceBank datasets, the proposed model improved STOI by 16.21%, 16.41%, and 18.33% over noisy speech, respectively, while PESQ improved by 31.1%, 32.9%, and 32%. In both seen and unseen noisy conditions, the proposed model outperformed existing deep neural networks (DNNs), including a baseline LSTM, a feedforward neural network (FDNN), a convolutional neural network (CNN), and a generative adversarial network (GAN). With the Kaldi toolkit for automatic speech recognition (ASR), the proposed model significantly reduced word error rates (WERs), reaching an average WER of 15.13% in noisy backgrounds.
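The abstract names the training objective (IRM estimation) and the causal constraint, but the paper's code is not reproduced here. Below is a minimal PyTorch sketch, under stated assumptions, of those two ingredients: an ideal ratio mask (IRM) target and a causal (unidirectional) LSTM mask estimator. The feature size, hidden size, β exponent, and all identifiers (ideal_ratio_mask, CausalLSTMMasker) are illustrative assumptions, not the authors' implementation; the hourglass resolution reduction and attention-gated skip connections described in the paper are omitted.

```python
import torch
import torch.nn as nn

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5):
    """IRM target per time-frequency unit: (S^2 / (S^2 + N^2))^beta.
    beta = 0.5 is the common choice in the IRM literature; the paper's
    exact exponent is not stated in the abstract (assumption)."""
    s2 = clean_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-8)) ** beta

class CausalLSTMMasker(nn.Module):
    """Unidirectional LSTM mapping noisy spectral features to a mask in
    [0, 1]. Leaving bidirectional=False (the default) keeps the model
    causal: the output at frame t depends only on frames <= t."""
    def __init__(self, n_feats=257, hidden=512, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=layers,
                            batch_first=True)
        self.out = nn.Linear(hidden, n_feats)

    def forward(self, noisy_feats):           # (batch, frames, n_feats)
        h, _ = self.lstm(noisy_feats)
        return torch.sigmoid(self.out(h))     # estimated mask in [0, 1]

# Training step sketch: regress the estimated mask onto the IRM target.
model = CausalLSTMMasker()
noisy = torch.rand(4, 100, 257)   # dummy noisy magnitude features
clean = torch.rand(4, 100, 257)   # dummy clean magnitudes
noise = torch.rand(4, 100, 257)   # dummy noise magnitudes
loss = nn.functional.mse_loss(model(noisy),
                              ideal_ratio_mask(clean, noise))
loss.backward()
```

At inference time the estimated mask is applied pointwise to the noisy magnitude spectrogram and the waveform is resynthesized with the noisy phase; because each frame's mask depends only on past and present frames, the system can run in real time, which is the causality property the abstract emphasizes.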


                Author and article information

                Contributors
Roles: Conceptualization, Data curation, Investigation, Supervision
Roles: Funding acquisition, Investigation, Methodology, Software, Validation, Writing – original draft
Roles: Methodology, Project administration, Software, Writing – review & editing
Roles: Data curation, Formal analysis, Validation
Role: Editor
                Journal
PLOS ONE
Public Library of Science (San Francisco, CA, USA)
ISSN: 1932-6203
Published: 3 January 2024
Volume 19, Issue 1: e0291240
                Affiliations
                [1 ] School of Information Science and Engineering, NingboTech University, Ningbo, China
                [2 ] Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan
Menoufia University, Egypt
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0003-2864-2809
                https://orcid.org/0000-0002-5532-9175
                Article
Manuscript ID: PONE-D-23-08666
DOI: 10.1371/journal.pone.0291240
PMCID: PMC10763955
PMID: 38170703
© 2024 Li et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
Received: 23 March 2023
Accepted: 25 August 2023
                Page count
                Figures: 6, Tables: 10, Pages: 19
                Funding
Funded by: Natural Science Foundation of Ningbo (funder ID: http://dx.doi.org/10.13039/100007834)
Award ID: Talent Introduction Fund Project of Ningbo Tech University, grant no. 20211009
This work is supported by the "Talent Introduction Fund Project of Ningbo Tech University" under grant no. 20211009. The funder, Dr. Li, is the main author of this work and contributed fully through simulations and original paper writing. He is the project leader of the "Talent Introduction Fund Project of Ningbo Tech University under grant no. 20211009."
                Categories
                Research Article
Social Sciences > Linguistics > Speech
Engineering and Technology > Signal Processing > Speech Signal Processing
Computer and Information Sciences > Neural Networks > Recurrent Neural Networks
Biology and Life Sciences > Neuroscience > Neural Networks > Recurrent Neural Networks
Computer and Information Sciences > Information Theory > Background Signal Noise
Engineering and Technology > Signal Processing > Background Signal Noise
Physical Sciences > Mathematics > Statistics > Statistical Noise
Physical Sciences > Mathematics > Applied Mathematics > Algorithms
Research and Analysis Methods > Simulation and Modeling > Algorithms
Biology and Life Sciences > Cell Biology > Cellular Types > Animal Cells > Neurons
Biology and Life Sciences > Neuroscience > Cellular Neuroscience > Neurons
Computer and Information Sciences > Artificial Intelligence > Machine Learning > Deep Learning
                Custom metadata
                All relevant data are within the paper.

