
The McGurk Effect: When What You See Changes What You Hear

We trust our ears to tell us the truth. Sometimes, it’s our eyes that decide what we think we’ve heard.



Imagine you’re standing across the road from a friend. Cars are passing, engines are loud, and the noise swallows most of what she’s saying. You see her mouth open wide, her face tense, her body leaning forward. Even before you fully hear her voice, your brain already assumes she’s shouting. In fact, you almost hear it, just from the way she looks.

But what if your eyes could actually change what your ears hear?

Welcome to the McGurk effect, where sight and sound collide and perception isn’t always what it seems.




Chapter 1 — When Seeing Changes What We Hear


We like to think our senses work independently: our eyes see, our ears hear, and our brain simply reports what’s happening. In reality, perception is far more active. The brain constantly combines information from different senses to make sense of the world. [1]


Speech is a good example. When someone talks, your brain doesn’t rely on sound alone. It also uses visual cues like lip movement and facial expression. These signals arrive together and are merged automatically, often without us noticing. [2],[3]


Most of the time, this integration helps. In noisy environments, seeing a speaker’s face can make their words clearer. [4],[5] But when visual and auditory information conflict, the brain doesn’t ignore one of them. [6] Instead, it creates a new perception that feels most plausible. This process reveals something important: what we hear is not just sound entering the ears. It is a construction shaped by both hearing and vision, and sometimes, what we see can change what we hear. [7],[8]




Chapter 2 — The McGurk Effect Explained


The McGurk Effect is a perceptual illusion that demonstrates how tightly connected our senses are, especially when it comes to speech. It shows that what we hear is not determined by sound alone, but by a combination of auditory and visual information processed together by the brain. [1]


💡 The effect was first described by psychologists Harry McGurk and John MacDonald in their landmark paper “Hearing Lips and Seeing Voices,” published in Nature on 23 December 1976. In their study, participants watched videos of a person speaking while the audio track played a different speech sound. When the visual and auditory information conflicted, participants consistently reported hearing a sound that was not actually present in the audio recording. [7],[9]

The best-known example works like this:


A person hears the sound “ba” while watching a video of a speaker whose lips are clearly forming the sound “ga.” Instead of hearing either “ba” or “ga,” many people report hearing “da” or a similar intermediate sound. The brain, faced with conflicting information, does not choose one sense over the other. Instead, it blends the two into a single percept that feels natural and convincing.


McGurk Effect Scenario [10]

What makes the McGurk Effect so striking is that it happens automatically. Even when people are told in advance that the sound and video do not match, the illusion often persists. This shows that audiovisual integration in speech perception operates at a level below conscious awareness. We cannot simply “decide” to hear the correct sound once the brain has combined the inputs. [11]


The McGurk Effect also highlights that speech perception is not a passive process. The brain actively interprets incoming signals, using prior knowledge, expectations, and sensory cues to construct meaning. Visual information about mouth movements provides strong predictions about what sound is likely to be heard. [4],[5] When those predictions conflict with the auditory signal, the brain resolves the mismatch by creating a compromise rather than rejecting one source of information entirely. [10],[11]
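One rough way to picture this compromise (purely as an illustration, not a description of the actual neural computation) is to imagine each candidate syllable receiving a degree of support from the auditory signal and from the lip movements. A percept like “da” can win simply because it is reasonably compatible with both, while “ba” and “ga” are each strongly contradicted by one modality. The short Python sketch below uses invented support values to make that concrete.

```python
# Toy illustration of the audiovisual "compromise" behind the McGurk effect.
# The support values are invented for illustration; this is not a model of
# the brain's actual computation.

# How well each candidate syllable matches the auditory signal (audio says "ba")
auditory_support = {"ba": 0.90, "da": 0.50, "ga": 0.05}

# How well each candidate matches the visible lip movements (video shows "ga")
visual_support = {"ba": 0.05, "da": 0.50, "ga": 0.90}

# Combine the two sources multiplicatively: a candidate strongly contradicted
# by either modality ends up with low combined support.
combined = {s: auditory_support[s] * visual_support[s] for s in auditory_support}

# Normalise so the numbers read as relative plausibility.
total = sum(combined.values())
plausibility = {s: round(v / total, 2) for s, v in combined.items()}

print(plausibility)  # {'ba': 0.13, 'da': 0.74, 'ga': 0.13} -> the compromise "da" wins
```

In this toy setup, neither of the two "true" inputs survives intact: the intermediate syllable ends up as the most plausible overall interpretation, which mirrors what listeners report in the illusion.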




Chapter 3 — Why the Brain Does This


Speech perception does not rely on a single brain area working in isolation. Instead, it emerges from the interaction of multiple regions that process different types of information and rapidly combine them into a unified percept.


Audiovisual Integration in Speech Perception [12]

When someone speaks, sound and visual information initially enter the brain through separate pathways: [13],[14]

Speech sounds → Auditory regions of the temporal lobe (including primary auditory cortex)

Lip and facial movements → Visual cortex (occipital lobe)

These parallel streams do not remain separate for long.


A key site where auditory and visual speech information converge is the superior temporal sulcus (STS):

💡 Auditory input + visual input → STS (audiovisual integration)

The STS plays a central role in audiovisual speech perception. When auditory and visual cues are congruent, activity in this region supports a stable and effortless perception of speech. The brain’s predictions are confirmed, and communication feels seamless. [14]


In the McGurk Effect, the inputs conflict: What the ears hear ≠ what the eyes see

In this situation, the STS does not simply discard one signal in favor of the other. Instead, it integrates both sources of information and generates a percept that best resolves the mismatch. The resulting sound feels real to the listener, even though it does not correspond exactly to either the auditory or visual input alone.


This integrative process is fast and largely unconscious. Evidence from neuroimaging and brain stimulation studies shows that the STS contributes to this process within a narrow time window around the onset of the speech sound, indicating that audiovisual integration occurs early, before conscious awareness of what is heard. [13]



The superior temporal sulcus (STS) dynamically shifts its connectivity depending on whether auditory or visual speech cues are more reliable, supporting flexible multisensory integration. [15]

Other brain regions help shape this process. Frontal cortical areas contribute top-down influences, using visual cues from the speaker’s face to generate predictions about upcoming speech sounds and to selectively enhance relevant sensory information. At the same time, auditory and visual regions can modulate one another, reflecting a dynamic and bidirectional exchange rather than a one-way flow of information. [15],[16]


From a functional perspective, this strategy is highly adaptive. In everyday environments, speech signals are often degraded by noise, distance, or competing voices. Visual cues from the speaker’s mouth are frequently more reliable under these conditions. By flexibly weighting sensory inputs based on their reliability, the brain increases the likelihood of successful communication, even if this occasionally leads to perceptual illusions. [10]
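One common, simplified way to describe this weighting (a statistical sketch, not the brain’s literal algorithm) is inverse-variance weighting: the noisier a cue, the less it counts toward the final percept. The Python example below uses invented noise levels for a hypothetical acoustic cue and lip-reading cue to show how the balance shifts between a quiet room and a loud café.

```python
# Simplified sketch of reliability-weighted cue combination.
# All numbers and names are illustrative assumptions, not measured data.

def combine_cues(auditory_estimate, auditory_noise, visual_estimate, visual_noise):
    """Weight each cue by its reliability (the inverse of its noise variance)."""
    w_auditory = 1.0 / auditory_noise ** 2
    w_visual = 1.0 / visual_noise ** 2
    return (w_auditory * auditory_estimate + w_visual * visual_estimate) / (w_auditory + w_visual)

# The "estimates" stand for positions along some perceptual dimension
# (e.g. place of articulation), purely for illustration.

# Quiet room: the acoustic cue is precise, so hearing dominates the percept.
print(combine_cues(auditory_estimate=1.0, auditory_noise=0.1,
                   visual_estimate=0.0, visual_noise=0.5))   # ~0.96, close to the auditory cue

# Loud cafe: the acoustic cue is noisy, so the lips pull the percept toward vision.
print(combine_cues(auditory_estimate=1.0, auditory_noise=0.5,
                   visual_estimate=0.0, visual_noise=0.1))   # ~0.04, close to the visual cue
```

The point of the sketch is only the shift in weighting: the same pair of cues yields a hearing-dominated percept when the audio is clean and a vision-dominated one when it is drowned out.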


The McGurk Effect emerges when this normally efficient system is placed in an artificial situation involving conflicting audiovisual cues. Once integration has occurred, the final percept feels like a direct reflection of reality. In fact, it is the product of a neural system optimized not for perfect sensory accuracy, but for extracting meaning and supporting effective communication.





Chapter 4 — The McGurk Effect in Everyday Life


Although the McGurk Effect is often demonstrated in laboratory videos, the same process operates constantly in everyday communication. We rarely notice it because, in natural settings, visual and auditory information usually align. When they do, the brain’s integration system works quietly in the background.


Photo by Brett Wharton on Unsplash

One common example is conversation in noisy environments. In places like cafés, classrooms, or hospital wards, background noise often masks parts of speech. Seeing a speaker’s face helps the brain predict and interpret sounds more accurately. Without realizing it, we rely on lip movements to understand what is being said.


Neural studies show that seeing a speaker’s mouth does more than add extra information—it actively strengthens the brain’s representation of the attended speech stream, making it more resistant to noise and interruption. [17]

Video calls provide another clear illustration. When audio and video are slightly out of sync, conversations suddenly feel uncomfortable or confusing. Even if the sound is clear, delayed or frozen facial movements disrupt the brain’s timing and prediction cues. The result is increased effort and reduced comprehension.


Studies show that the brain does not require perfect synchrony to integrate audiovisual speech. When speech signals are clear and informative, the brain widens its temporal window of integration, allowing delayed audio and visual cues to be fused into a single percept. This flexibility explains why small delays in video calls are often tolerated—and why larger delays suddenly feel uncomfortable. [18]
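To picture that idea (an illustrative sketch only, not the procedure used in the cited study), imagine an integration window that widens as the speech signal becomes clearer, so a given audio delay either falls inside the window and is fused into one event, or falls outside it and feels out of sync. The numbers below are invented purely to show the shape of the idea.

```python
# Illustrative sketch of a temporal integration window that widens with
# signal clarity. The constants are invented and do not come from the study.

def is_fused(audio_delay_ms: float, signal_clarity: float) -> bool:
    """Return True if an audiovisual delay would still be perceived as one event.

    signal_clarity ranges from 0.0 (heavily degraded speech) to 1.0 (very clear).
    """
    base_window_ms = 100.0                                # hypothetical tolerance for degraded speech
    window_ms = base_window_ms + 150.0 * signal_clarity   # clearer speech -> wider window
    return abs(audio_delay_ms) <= window_ms

print(is_fused(audio_delay_ms=180, signal_clarity=0.9))  # True: the lag is fused with clear speech
print(is_fused(audio_delay_ms=180, signal_clarity=0.1))  # False: the same lag feels out of sync
```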


🎬 Ever notice how dubbed movies feel wrong? Your brain registers mismatched lips and voices as conflicting sensory information.

This effect also explains why poorly dubbed movies feel unnatural. When mouth movements do not match the spoken dialogue, the brain struggles to integrate the signals, making the speech feel “off.” The discomfort comes from the same sensory integration system revealed by the McGurk Effect.


A more familiar example emerged during the widespread use of face masks. Many people noticed that conversations became harder, even when hearing was not impaired. With visual cues removed, the brain had less information to work with. The difficulty highlighted how much we normally depend on seeing speech to understand it.  [19]


These everyday experiences show that the McGurk Effect reflects a perceptual mechanism we use constantly, rather than a rare or artificial illusion. Vision and hearing are deeply interconnected, and together they shape how we experience spoken language, usually to our benefit, and occasionally in ways that reveal the hidden workings of the brain.




Chapter 5 — What the McGurk Effect Reveals About Perception


The McGurk Effect shows that perception is not a direct recording of the world. What we hear is the result of the brain integrating sound with visual information to create a meaningful interpretation.


Most of the time, this process works in our favor. By combining vision and hearing, the brain allows us to understand speech efficiently, even in noisy or imperfect conditions. The illusion only appears when these signals are deliberately mismatched.


Photo by Mark Paton on Unsplash

However, understanding what the McGurk Effect reveals also requires acknowledging what it cannot fully explain. In everyday life, speech is rarely reduced to isolated syllables with mismatched visual cues. Natural communication relies on words, sentences, and continuous visual–auditory alignment. Research has shown that sensitivity to the McGurk illusion does not necessarily predict how well someone understands speech in real-world audiovisual settings. In this sense, the McGurk Effect is best viewed as an experimental tool that exposes the brain’s integration mechanisms, rather than a complete model of everyday speech perception. [20]


What the McGurk Effect ultimately reveals is that our senses do not operate independently or objectively. They cooperate to build what feels like reality. Sometimes, what we experience is not what was physically present, but what the brain judged to be the most plausible interpretation.


The illusion reminds us that perception is not about faithfully reproducing the world as it is, but about constructing a reality that is functional enough to support understanding, interaction, and communication.


"Now that you know how it works, why not try it yourself and see what you hear?"



This article was written by Kevin Julio Davis Wuryanto, a medical writer at the MedReport Foundation and a clinical clerkship student from Indonesia. He is particularly interested in the application of healthcare in remote settings and the often-overlooked medical phenomena that shape everyday life.

References

  1. Cappe C, Thut G, Romei V, Murray MM. Auditory-Visual Multisensory Interactions in Humans: Timing, Topography, Directionality, and Sources. Journal of Neuroscience. 2010 Sep 22;30(38):12572–80.

  2. Matyjek M, Kita S, Torralba Cuello M, Soto-Faraco S. Multisensory integration of speech and gestures in a naturalistic paradigm. Human Brain Mapping. 2024 Jul 23;45(11).

  3. Navarra J, Yeung H, Werker J, Soto-Faraco S. Multisensory interactions in speech perception. In: Stein BE, editor. The New Handbook of Multisensory Processes. Chapter 24.

  4. Drijvers L, Özyürek A. Visible speech enhanced: What do iconic gestures and lip movements contribute to degraded speech comprehension? MPG PuRe (Max Planck Society). 2016 Jan 1.

  5. Battista M, Collesei F, Orzan E, Fantoni M, Bottari D. Lip-Reading: Advances and Unresolved Questions in a Key Communication Skill. Audiology Research. 2025 Jul 21;15(4):89.

  6. Van Wassenhove V. Speech through ears and eyes: interfacing the senses with the supramodal brain. Frontiers in Psychology. 2013;4.

  7. McGurk H, MacDonald J. Hearing lips and seeing voices. Nature. 1976;264:746–8.

  8. Opoku-Baah C, Schoenhaut AM, Vassall SG, Tovar DA, Ramachandran R, Wallace MT. Visual Influences on Auditory Behavioral, Neural, and Perceptual Processes: A Review. Journal of the Association for Research in Otolaryngology. 2021 May 20;22(4):365–86.

  9. MacDonald J. Hearing Lips and Seeing Voices: the Origins and Development of the “McGurk Effect” and Reflections on Audio–Visual Speech Perception Over the Last 40 Years. Multisensory Research. 2018;31(1-2):7–18.

  10. King AJ, Calvert GA. Multisensory integration: perceptual grouping by eye and ear. Current Biology. 2001 Apr 17;11(8):R322–5.

  11. Hong S. A review of audiovisual speech perception based on McGurk effect: evidences and models. University of South Wales; 2017.

  12. McGurk Effect [Internet]. Ualberta.ca. 2024. Available from: http://www.psych.ualberta.ca/~varn/Psyco/C7_MCGURK%20Folder/body03.html

  13. Beauchamp MS, Nath AR, Pasalar S. fMRI-Guided Transcranial Magnetic Stimulation Reveals That the Superior Temporal Sulcus Is a Cortical Locus of the McGurk Effect. Journal of Neuroscience. 2010 Feb 17;30(7):2414–7.

  14. Ozker M, Yoshor D, Beauchamp MS. Frontal cortex selects representations of the talker’s mouth to aid in speech perception. eLife. 2018 Feb 27;7.

  15. Nath AR, Beauchamp MS. Dynamic Changes in Superior Temporal Sulcus Connectivity during Perception of Noisy Audiovisual Speech. Journal of Neuroscience. 2011 Feb 2;31(5):1704–14.

  16. Jones JA, Callan DE. Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect. NeuroReport. 2003 Jun;14(8):1129–33.

  17. Jaha N, Shen S, Kerlin JR, Shahin AJ. Visual Enhancement of Relevant Speech in a “Cocktail Party.” Multisensory Research. 2020 Jul 1;33(3):277–94.

  18. Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker’s mouth movements and speech. Language, Cognition and Neuroscience. 2017 Feb 6;32(9):1102–18.

  19. Moon IJ, Jo M, Kim GY, Kim N, Cho YS, Hong SH, et al. How Does a Face Mask Impact Speech Perception? Healthcare. 2022 Sep 7;10(9):1709.

  20. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. The Journal of the Acoustical Society of America. 2022 Dec;152(6):3216–25.


Assessed and Endorsed by the MedReport Medical Review Board





 
 
