
      Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited

      Multisensory Research
      Brill


          Abstract

Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This calls into question the suitability of the illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
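As a concrete illustration of how the illusion serves as a proxy measure, the sketch below computes a McGurk susceptibility score as the proportion of fused (illusory) percepts reported on incongruent trials. The response labels, fusion categories, and trial counts are illustrative assumptions, not taken from the article.

```python
from collections import Counter

def mcgurk_susceptibility(responses):
    """Proportion of fused (illusory) percepts on incongruent McGurk trials.

    `responses` holds the percept labels a listener reported on trials such as
    auditory /ba/ dubbed onto visual /ga/. Scoring 'da' / 'tha' as fused
    percepts is a common convention, assumed here purely for illustration.
    """
    fused_labels = {"da", "tha"}
    counts = Counter(responses)
    n_fused = sum(counts[label] for label in fused_labels)
    return n_fused / len(responses) if responses else 0.0

# One hypothetical participant's reports over 20 incongruent trials
reports = ["da"] * 12 + ["ba"] * 5 + ["ga"] * 2 + ["tha"]
print(f"McGurk susceptibility: {mcgurk_susceptibility(reports):.2f}")  # 0.65
```

Scores of this kind are what vary so widely across studies; differences in stimuli, response alternatives, and scoring rules can shift the measured magnitude substantially.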


Most cited references (116)


          Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments.

Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. It has been claimed that this gain is most pronounced when auditory input is weakest, an effect that has been related to a well-known principle of multisensory integration, "inverse effectiveness." In keeping with the predictions of this principle, the present study showed substantial gain in multisensory speech enhancement at even the lowest signal-to-noise ratios (SNRs) used (-24 dB), but it was also evident that there was a "special zone" at a more intermediate SNR of -12 dB where multisensory integration was additionally enhanced beyond the predictions of this principle. As such, we show that inverse effectiveness does not strictly apply to the multisensory enhancements seen during audiovisual speech perception. Rather, the gain from viewing visual articulations is maximal at intermediate SNRs, well above the lowest auditory SNR where the recognition of whole words is significantly different from zero. We contend that the multisensory speech system is maximally tuned for SNRs between extremes, where the system relies on either the visual (speech-reading) or the auditory modality alone, forming a window of maximal integration at intermediate SNR levels. At these intermediate levels, the extent of multisensory enhancement of speech recognition is considerable, amounting to more than a 3-fold performance improvement relative to an auditory-alone condition.
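To make the gain computation concrete, the following sketch evaluates the relative audiovisual gain, (AV - A) / A, at several SNRs. The accuracy values are invented for illustration; only the formula and the qualitative pattern (gain peaking at an intermediate SNR rather than at the lowest) follow the abstract.

```python
# Relative multisensory gain at each SNR: (AV - A) / A, the proportional
# improvement of audiovisual (AV) over auditory-alone (A) word recognition.
snrs_db = [-24, -18, -12, -6, 0]              # illustrative SNR levels
acc_a   = [0.05, 0.15, 0.20, 0.55, 0.85]      # invented auditory-alone accuracies
acc_av  = [0.15, 0.45, 0.75, 0.85, 0.95]      # invented audiovisual accuracies

gains = [(av - a) / a for a, av in zip(acc_a, acc_av)]
for snr, a, av, g in zip(snrs_db, acc_a, acc_av, gains):
    print(f"SNR {snr:>4} dB: A = {a:.2f}  AV = {av:.2f}  gain = {g:.2f}")

# Strict inverse effectiveness would place the largest gain at the lowest SNR;
# here, as in the study's qualitative result, it peaks at an intermediate SNR.
best_snr = snrs_db[gains.index(max(gains))]
print(f"Gain is maximal at {best_snr} dB, not at {min(snrs_db)} dB")
```

With real data the accuracies would come from word-recognition performance at each SNR; the roughly 3-fold improvement reported in the abstract corresponds to the kind of peak this ratio shows at intermediate noise levels.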

            Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex.

Integrating information from the different senses markedly enhances the detection and identification of external stimuli. Compared with unimodal inputs, semantically and/or spatially congruent multisensory cues speed discrimination and improve reaction times. Discordant inputs have the opposite effect, reducing performance and slowing responses. These behavioural features of crossmodal processing appear to have parallels in the response properties of multisensory cells in the superior colliculi and cerebral cortex of non-human mammals. Although spatially concordant multisensory inputs can produce a dramatic, often multiplicative, increase in cellular activity, spatially disparate cues tend to induce a profound response depression. Using functional magnetic resonance imaging (fMRI), we investigated whether similar indices of crossmodal integration are detectable in human cerebral cortex, and for the synthesis of complex inputs relating to stimulus identity. Ten human subjects were exposed to varying epochs of semantically congruent and incongruent audio-visual speech and to each modality in isolation. Brain activations to matched and mismatched audio-visual inputs were contrasted with the combined response to both unimodal conditions. This strategy identified an area of heteromodal cortex in the left superior temporal sulcus that exhibited significant supra-additive response enhancement to matched audio-visual inputs and a corresponding sub-additive response to mismatched inputs. The data provide fMRI evidence of crossmodal binding by convergence in the human heteromodal cortex. They further suggest that response enhancement and depression may be a general property of multisensory integration operating at different levels of the neuraxis and irrespective of the purpose for which sensory inputs are combined.
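The supra-/sub-additive contrast described here reduces to comparing the multisensory response against the sum of the unimodal responses. The sketch below applies that criterion voxel-wise to simulated response estimates; the numbers are invented, and a bare sign test stands in for the statistical thresholding an actual fMRI analysis would require.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-voxel response estimates (e.g. GLM betas) for a small ROI.
# The values are invented and serve only to illustrate the criterion.
n_voxels = 6
resp_a  = rng.uniform(0.2, 1.0, n_voxels)                    # auditory alone
resp_v  = rng.uniform(0.2, 1.0, n_voxels)                     # visual alone
resp_av = resp_a + resp_v + rng.normal(0.0, 0.4, n_voxels)    # audiovisual

# Additivity criterion: compare the multisensory response with the sum of the
# unimodal responses (real analyses threshold this comparison statistically).
additivity = resp_av - (resp_a + resp_v)
for i, d in enumerate(additivity):
    label = "supra-additive (AV > A + V)" if d > 0 else "sub-additive (AV < A + V)"
    print(f"voxel {i}: AV - (A + V) = {d:+.2f}  ->  {label}")
```

In the study, matched audio-visual speech drove the supra-additive pattern in the left superior temporal sulcus and mismatched speech the sub-additive one, mirroring the enhancement and depression seen in animal single-unit work.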

              Polysensory interactions along lateral temporal regions evoked by audiovisual speech.

Many socially significant biological stimuli are polymodal, and information processing is enhanced for polymodal over unimodal stimuli. The human superior temporal sulcus (STS) region has been implicated in processing socially relevant stimuli, particularly those derived from biological motion such as mouth movements. Single unit studies in monkeys have demonstrated that regions of STS are polysensory, responding to visual, auditory and somatosensory stimuli, and human neuroimaging studies have shown that lip-reading activates auditory regions of the lateral temporal lobe. We evaluated whether concurrent speech sounds and mouth movements were more potent activators of STS than either speech sounds or mouth movements alone. In an event-related fMRI study, subjects observed an animated character that produced audiovisual speech and the audio and visual components of speech alone. Strong activation of the STS region was evoked in all three conditions, with greatest levels of activity elicited by audiovisual speech. Subsets of activated voxels within the STS region demonstrated overadditivity (audiovisual > audio + visual) and underadditivity (audiovisual < audio + visual). These results confirm the polysensory nature of the STS region and demonstrate for the first time that polymodal interactions may both potentiate and inhibit activation.

                Author and article information

Journal: Multisensory Research (Brill)
ISSN: 2213-4794, 2213-4808
Publication year: 2018
Volume: 31, Issue: 1-2, Pages: 111-144
DOI: 10.1163/22134808-00002565
PMID: 31264597
© 2018
