
      Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

      Preprint


          Abstract

          In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) that learns primarily from unpaired audio data and produces sequences of representations that closely match the phoneme sequences of speech utterances. This is achieved through proper temporal segmentation, which makes the representations phoneme-synchronized, and proper phonetic clustering, which keeps the total number of distinct representations close to the number of phonemes. The mapping between the distinct representations and phonemes is learned from a small amount of annotated paired data. Preliminary experiments on LJSpeech showed that the learned vowel representations occupy relative positions in the latent space that closely parallel those in the IPA vowel chart defined by linguistics experts. With less than 20 minutes of annotated speech, our method outperformed existing methods on phoneme recognition and synthesized intelligible speech that surpassed our baseline model.
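The abstract names three ingredients: quantizing frame-level representations onto a small set of distinct codes, segmenting them so that each code spans roughly one phoneme, and mapping codes to phonemes using a little paired data. The toy Python sketch below illustrates only that inference-time data flow in its simplest form; it is not the SeqRQ-AE model. Nearest-codeword assignment, run-length collapsing as a stand-in for the paper's temporal segmentation, majority voting over position-aligned pairs, and every function name (`quantize`, `collapse_repeats`, `learn_code_to_phoneme`, `recognize`) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Assign each frame to the index of its nearest codeword (squared L2 distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in frames]


def collapse_repeats(codes: List[int]) -> List[int]:
    """Merge each run of identical consecutive codes into a single segment.

    Hypothetical stand-in for phoneme-synchronized segmentation."""
    return [code for code, _ in groupby(codes)]


def learn_code_to_phoneme(
    pairs: List[Tuple[List[int], List[str]]]
) -> Dict[int, str]:
    """Map each code to the phoneme it co-occurs with most often.

    Assumes every paired example gives a collapsed code sequence and a
    phoneme sequence aligned position by position (hypothetical setup)."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for codes, phones in pairs:
        if len(codes) != len(phones):
            continue  # skip examples that are not one-to-one aligned
        for c, p in zip(codes, phones):
            votes[c][p] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}


def recognize(frames: List[Vector], codebook: List[Vector],
              mapping: Dict[int, str], unknown: str = "<unk>") -> List[str]:
    """Quantize, collapse repeated codes, then look up each segment's phoneme."""
    segments = collapse_repeats(quantize(frames, codebook))
    return [mapping.get(c, unknown) for c in segments]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    frames = [[0.1, 0.0], [0.0, 0.1], [0.9, 0.1], [1.0, -0.1], [0.1, 0.9]]
    pairs = [([0, 1, 2], ["s", "iy", "t"]), ([2, 0], ["t", "s"]), ([1], ["iy"])]
    mapping = learn_code_to_phoneme(pairs)
    print(recognize(frames, codebook, mapping))  # ['s', 'iy', 't']
```

Here the codebook is fixed by hand; in an autoencoder of the kind the abstract describes, the codes would instead be learned from unpaired audio, which this sketch does not attempt. Only the final lookup table depends on paired data, which is consistent with the abstract's point that a small annotated set suffices once the representations are already close to phonemes.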


                Author and article information

                28 October 2019
                arXiv:1910.12729

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Under review at ICASSP 2020
                cs.CL cs.SD eess.AS

                Theoretical computer science, Electrical engineering, Graphics & Multimedia design
