Spectro-temporal acoustical markers differentiate speech from song across cultures

Abstract

Humans produce two primary forms of vocal communication: speaking and singing. What is the basis for these two categories? Is the distinction between them based primarily on culturally specific, learned features, or do consistent acoustical cues exist that reliably distinguish speech and song worldwide? Some studies have suggested that important aspects of music can be distinguished from speech based on spectro-temporal modulation patterns, but this conclusion rests on Western music, leaving open the question of whether such a principle applies more globally. Here, we studied the spectro-temporal modulation patterns of vocalizations produced by 369 people living in 21 urban, rural, and small-scale societies distributed across six continents. We show that specific ranges of spectral and temporal modulations differentiate speech from song in a consistent fashion, and that those ranges overlap within categories and across societies. Machine-learning analyses confirmed that this effect was cross-culturally robust, with vocalizations reliably classified solely from their spectro-temporal modulation patterns across all 21 societies. Listeners unfamiliar with most of the cultures could also classify the vocalizations, with accuracy patterns similar to those of the machine-learning algorithm, indicating that the spectro-temporal cues used by the classifier are similar to those used by human listeners. Thus, the two most basic forms of human vocalization appear to exploit opposite extremes of the spectro-temporal continuum in a consistent fashion across societies. The findings support the idea that the human nervous system is specialized to produce and perceive two distinct ranges of spectro-temporal modulation in the service of the two distinct modes of human vocal communication.
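
The abstract does not describe how the spectro-temporal modulation patterns were computed. As a rough illustration only, one common way to obtain such a pattern is to take the two-dimensional Fourier transform of a log-magnitude spectrogram, which yields power as a function of temporal modulation (Hz) and spectral modulation (cycles/kHz). The sketch below assumes that generic approach; the function name `modulation_power_spectrum`, the window length, and the hop size are hypothetical choices, not the authors' pipeline.

```python
import numpy as np

def modulation_power_spectrum(signal, sample_rate, win_len=512, hop=128):
    """Sketch: 2D FFT of a log-magnitude spectrogram (assumed generic method).

    Returns (mps, temporal_mod_hz, spectral_mod_cyc_per_khz), where mps has
    shape (n_spectral_mod, n_temporal_mod).
    """
    # Short-time Fourier transform with a Hann window.
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1)).T  # (frequency, time)

    # Log-compress and remove the mean so the zero-modulation term vanishes.
    log_spec = np.log(spec + 1e-10)
    log_spec -= log_spec.mean()

    # Power of the 2D Fourier transform, with zero modulation at the center.
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_spec))) ** 2

    # Axis labels: time axis -> Hz, frequency axis -> cycles per kHz.
    frame_rate = sample_rate / hop
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(n_frames, d=1.0 / frame_rate))
    freq_step_khz = sample_rate / win_len / 1000.0
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=freq_step_khz))
    return mps, temporal_mod, spectral_mod

# Illustrative input: noise whose amplitude is modulated at 4 Hz.
sr = 16000
t = np.arange(2 * sr) / sr
rng = np.random.default_rng(0)
demo = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * rng.standard_normal(t.size)
mps, temporal_mod, spectral_mod = modulation_power_spectrum(demo, sr)
```

For the synthetic input, most of the modulation power should sit near a temporal modulation of 4 Hz. A classifier of the kind the abstract mentions could take a flattened or binned version of `mps` as its feature vector, though the features actually used in the study are not specified here.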