ScienceOpen: research and publishing network

Is Open Access

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

Preprint

Author(s): Alexander H. Liu, Tao Tu, Hung-yi Lee, Lin-shan Lee

Publication date Created: 28 October 2019

Abstract

In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) that learns from primarily unpaired audio data and produces sequences of representations very close to the phoneme sequences of speech utterances. This is achieved by proper temporal segmentation, which makes the representations phoneme-synchronized, and by proper phonetic clustering, which keeps the total number of distinct representations close to the number of phonemes. The mapping between the distinct representations and phonemes is learned from a small amount of annotated paired data. Preliminary experiments on LJSpeech demonstrated that the learned representations for vowels have relative locations in latent space that closely parallel those shown in the IPA vowel chart defined by linguistics experts. With less than 20 minutes of annotated speech, our method outperformed existing methods on phoneme recognition and is able to synthesize intelligible speech that beats our baseline model.
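The abstract does not give implementation details, so the following is only a minimal sketch of the three ideas it names: quantizing frame-level representations against a small codebook, merging consecutive frames that share a code into segments, and mapping codes to phonemes through a lookup. The function names, the Euclidean nearest-neighbor rule, the run-merging rule, the toy codebook, and the `code_to_phoneme` table are all illustrative assumptions, not the authors' SeqRQ-AE implementation.

```python
from itertools import groupby


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame`.

    Assumption: squared Euclidean distance is used for the lookup.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def quantize_and_segment(frames, codebook):
    """Quantize every frame, then merge runs of identical codes.

    Returns a list of (code_index, run_length) pairs, one per segment.
    Assumption: a segment is a maximal run of consecutive equal codes.
    """
    codes = [nearest_code(frame, codebook) for frame in frames]
    return [(code, len(list(run))) for code, run in groupby(codes)]


def codes_to_phonemes(segments, code_to_phoneme):
    """Map each segment's code to a phoneme label via a lookup table.

    The table stands in for a mapping learned from a small paired set.
    """
    return [code_to_phoneme.get(code, "<unk>") for code, _ in segments]


# Toy example: 2-D "encoder outputs" and a 3-entry codebook (hypothetical).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.0, 0.1], [0.9, 0.1], [1.1, -0.1], [0.1, 0.8]]
code_to_phoneme = {0: "AH", 1: "IY", 2: "S"}  # hypothetical labels

segments = quantize_and_segment(frames, codebook)
phonemes = codes_to_phonemes(segments, code_to_phoneme)
print(segments)
print(phonemes)
```

In this toy setting the five frames collapse into three segments, one per distinct run of codes, which is the sense in which the quantized sequence becomes roughly phoneme-synchronized. Keeping the codebook small plays the role of the clustering constraint described in the abstract.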

Author and article information

Article

arXiv ID: 1910.12729

SO-VID: 48b41f12-4e2d-42d4-9807-c6f136535115

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: under review, ICASSP 2020

Categories: cs.CL, cs.SD, eess.AS

ScienceOpen disciplines: Theoretical computer science, Electrical engineering, Graphics & Multimedia design
