Abstract
<p class="first" id="d7097841e140">As the primary means of communication, voice plays
an important role in daily life.
Voice also conveys personal information such as social status, personal traits, and
the emotional state of the speaker. Mechanically, voice production involves complex
fluid-structure interaction within the glottis and its control by laryngeal muscle
activation. An important goal of voice research is to establish a causal theory linking
voice physiology and biomechanics to how speakers use and control voice to communicate
meaning and personal information. Establishing such a causal theory has important
implications for clinical voice management, voice training, and many speech technology
applications. This paper provides a review of voice physiology and biomechanics, the
physics of vocal fold vibration and sound production, and laryngeal muscular control
of the fundamental frequency of voice, vocal intensity, and voice quality. Current
efforts to develop mechanical and computational models of voice production are also
critically reviewed. Finally, issues and future challenges in developing a causal
theory of voice production and perception are discussed.
</p>
The automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices. Progress in this area has been made possible by advances in linguistic theory, acoustic-phonetic characterization of English sound patterns, perceptual psychology, mathematical modeling of speech production, structured programming, and computer hardware design. This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis. Examples of rules are used liberally to illustrate the state of the art. Many of the examples are taken from Klattalk, a text-to-speech system developed by the author. A number of scientific problems are identified that prevent current systems from achieving the goal of completely human-sounding speech. While the emphasis is on rule programs that drive a formant synthesizer, alternatives such as articulatory synthesis and waveform concatenation are also reviewed. An extensive bibliography has been assembled to show both the breadth of synthesis activity and the wealth of phenomena covered by rules in the best of these programs. A recording of selected examples of the historical development of synthetic speech, enclosed as a 33 1/3-rpm record, is described in the Appendix.