In a new study published June 6 in the open-access journal PLOS Biology, a team of researchers led by Columbia University's Nima Mesgarani, PhD, has shed light on how the human brain processes speech amid a cacophony of competing voices, often referred to as the “cocktail party effect.” The findings reveal how the brain manages the intricate task of comprehending speech in a crowded room, weighing factors such as how audible the target voice is and where the listener's attention is directed, with possible implications that could pave the way for hearing aid processing aimed at isolating and enhancing speech in noise.

Nima Mesgarani, PhD.

Engaging in conversation in a noisy environment can pose significant challenges, particularly when competing voices are louder than the speech the listener wants to hear. Although hearing aids have made great strides in helping people hear better and more comfortably in noise, their ability to identify and attend to the speech of a specific conversation partner in these settings is still largely limited to directional-microphone and beamforming technology.
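
For context, directional processing of this kind is conceptually simple: the hearing aid delays and combines its microphone signals so that sound from the look direction adds up coherently while off-axis sound averages down. Below is a minimal delay-and-sum sketch in Python/NumPy; the function name, the two-microphone setup, and the zero-delay "straight ahead" example are illustrative assumptions, not a description of any particular hearing aid's implementation.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Steer a small microphone array toward a talker by delaying each channel
    so the target's wavefront lines up across microphones, then averaging.

    mic_signals:    array of shape (n_mics, n_samples)
    delays_samples: non-negative integer delay (in samples) for each mic
    """
    n_mics, n_samples = mic_signals.shape
    aligned = np.zeros((n_mics, n_samples))
    for m in range(n_mics):
        d = delays_samples[m]
        # Shift channel m so the target component is time-aligned with the others.
        aligned[m, d:] = mic_signals[m, :n_samples - d]
    # Coherent averaging boosts the aligned (target) signal relative to off-axis sound.
    return aligned.mean(axis=0)

# Toy usage: two channels, target straight ahead, so the relative delay is zero.
fs = 16_000
mics = np.random.randn(2, fs)  # placeholder for two recorded microphone channels
enhanced = delay_and_sum(mics, [0, 0])
```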

To delve deeper into the mechanisms underlying speech perception in noisy environments, the researchers recorded neural activity from electrodes implanted in the brains of people with epilepsy who were undergoing brain surgery. Throughout the procedure, the patients were asked to attend to a single voice, which at times was louder than a competing voice (“glimpsed” speech) and at other times quieter than it (“masked” speech).
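
The “glimpsed” versus “masked” distinction can be made concrete by comparing the attended talker's level against the competing talkers frame by frame. The Python/NumPy sketch below labels short frames by their target-to-masker ratio; the 20-ms frame length, the 0 dB default threshold, and the function name are illustrative assumptions rather than the study's exact analysis parameters.

```python
import numpy as np

def label_glimpsed_masked(target, maskers, fs, frame_ms=20, threshold_db=0.0):
    """Label short frames of the attended talker as "glimpsed" (louder than the
    competing voices) or "masked" (quieter), based on frame-wise energy.

    target:  1-D array, the attended talker's clean signal
    maskers: 1-D array, the sum of the competing talkers
    """
    frame = int(fs * frame_ms / 1000)
    n_frames = min(len(target), len(maskers)) // frame
    labels = []
    for i in range(n_frames):
        sl = slice(i * frame, (i + 1) * frame)
        target_power = np.mean(target[sl] ** 2) + 1e-12
        masker_power = np.mean(maskers[sl] ** 2) + 1e-12
        # Target-to-masker ratio in dB for this frame.
        tmr_db = 10 * np.log10(target_power / masker_power)
        labels.append("glimpsed" if tmr_db > threshold_db else "masked")
    return labels
```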

The so-called "cocktail effect" is the ability to focus your attention on a particular speaker while filtering out other competing voices or background noise. Illustration courtesy of Zuckerman Institute, Columbia University.
The so-called "cocktail effect" is the ability to focus your attention on a particular speaker while filtering out other competing voices or background noise. Illustration courtesy of Zuckerman Institute, Columbia University.

By harnessing the neural recordings, the research team constructed predictive models of brain activity. The models showed that phonetic information in “glimpsed” speech was encoded in both the primary and secondary auditory cortex, and that encoding of the attended speech was enhanced in the secondary cortex. In contrast, phonetic information in “masked” speech was encoded only when it came from the attended voice. The team also found that a particular glimpse signal-to-noise ratio (SNR) threshold (−4 dB) optimized the models’ predictions of the neural responses.
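
Predictive (encoding) models of this kind typically regress neural activity at each electrode onto time-lagged stimulus features and score the model by how well it predicts the measured response; sweeping the glimpse threshold and comparing prediction accuracy is conceptually how an optimum such as −4 dB would be identified. The following ridge-regression sketch is a generic temporal-receptive-field-style model, assumed here for illustration, and is not the authors' exact pipeline.

```python
import numpy as np

def fit_encoding_model(features, neural, max_lag=40, alpha=1.0):
    """Fit a ridge-regularized linear model predicting one electrode's activity
    from time-lagged stimulus features (a generic temporal receptive field).

    features: (n_times, n_features) stimulus features per frame (e.g., phonetic)
    neural:   (n_times,) neural response at the same frame rate
    """
    n_times, n_feat = features.shape
    # Stack lagged copies of the features so the model can use recent stimulus history.
    lagged = np.zeros((n_times, n_feat * max_lag))
    for lag in range(max_lag):
        lagged[lag:, lag * n_feat:(lag + 1) * n_feat] = features[:n_times - lag]
    # Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y
    xtx = lagged.T @ lagged + alpha * np.eye(n_feat * max_lag)
    weights = np.linalg.solve(xtx, lagged.T @ neural)
    predicted = lagged @ weights
    # Correlation between predicted and measured responses scores the model;
    # comparing this score across conditions is how encoding differences show up.
    r = np.corrcoef(predicted, neural)[0, 1]
    return weights, r
```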

Moreover, the study revealed a temporal difference between the encoding of “glimpsed” and “masked” speech, with “masked” speech encoded later than “glimpsed” speech. The separate encoding of phonetic information from “glimpsed” and “masked” speech suggests that focusing on decoding only the “masked” portion of attended speech could lead to improved auditory attention-decoding systems for brain-controlled hearing aids.
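
In a brain-controlled hearing aid, attention decoding is often framed as reconstructing a speech envelope from neural activity and asking which talker in the room it matches best. The sketch below shows only that comparison step, under those assumptions; it is a generic scheme, not the specific system proposed in the paper.

```python
import numpy as np

def decode_attention(reconstructed_env, talker_envs):
    """Guess which talker the listener is attending to by correlating an
    envelope reconstructed from neural activity with each candidate talker.

    reconstructed_env: (n_times,) speech envelope decoded from brain recordings
    talker_envs:       list of (n_times,) envelopes, one per talker in the room
    """
    scores = [np.corrcoef(reconstructed_env, env)[0, 1] for env in talker_envs]
    # The best-matching talker is taken to be the attended one; a hearing aid
    # could then amplify that talker and attenuate the rest.
    return int(np.argmax(scores)), scores
```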

“When listening to someone in a noisy place, your brain recovers what you missed when the background noise is too loud,” says study lead author Vinay Raghavan. “Your brain can also catch bits of speech you aren’t focused on, but only when the person you’re listening to is quiet in comparison.”


It is hoped that the work will help unravel the complex nature of speech perception in noise and open promising avenues for future advancements in hearing aid technology.

Original article citation: Raghavan VS, O’Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol. 2023;21(6):e3002128.

Source: EurekAlert