Postscript: Association for Research in Otolaryngology – 2018

By: Dr. Mark Pflieger, Senior Scientist

I attended the 41st Annual MidWinter Meeting of the Association for Research in Otolaryngology (ARO, February 10-14, 2018) in San Diego as a non-presenting participant with an intention to update my mental map of the field of human auditory neuroscience.  And it happened, thanks mostly to interactions in the poster aisles—both planned and serendipitous—but also to several insightful talks from the podium.  Some preconceived ideas I came with on Saturday had changed substantially by Wednesday.

My job is to develop analysis software for brain researchers who use noninvasive neurophysiological data (EEG and MEG) in the context of neuroimaging data (MRI).  Consequently, ARO presentations incorporating such data pulled me in.

Matching methods to questions

The following principle was a helpful guide for updating my mental map: Data analysis methods must match the science questions that they are supposed to address.  Thus, I sought “resonances” between questions and methods.  In particular, I paid special attention to sweet spots where science algorithms resonate with brain algorithms, that is, where the algorithms we scientists may use to process brain data resemble the algorithms a brain may use to process sensory data.

At an abstract level, we can think of the auditory system as a speech signal processing system—a multi-level system with “bottom-up”, “top-down”, and “middle-out” pathways that process sound signal streams to optimize intelligibility of speech under sub-optimal conditions.   For example, speech stream intelligibility can be degraded by vocoding, a process that partly mimics distortion due to cochlear implants.  Also, additional speech streams from nearby sources can compete with the processing of a target stream.  On top of that, various kinds of noise (such as incoherent multi-speaker babble) can be added to the total sound stream.  How does the auditory system succeed (when it does) at understanding a particular stream of speech under such conditions?

Top-down processing is certainly involved.  For example, we can consciously tune in or tune out selected speech streams at a cocktail party.  But top-down and bottom-up processes interact [1], as Nina Kraus demonstrated Sunday evening at a special ARO-sponsored event, “Sounds of Music: Symphony meets Science”.

Using an example like those from [2], she presented the same sentence three times: first, a distorted presentation that (for most listeners) is unintelligible; second, an undistorted, clearly intelligible presentation; and third, an exact repeat of the first.  Remarkably, the third presentation is completely intelligible.  Even though it is physically identical to the first, it doesn’t “sound” the same.  How did the auditory system dynamically retune itself in the brief time between the first and third presentations?  And at what levels (cochlear nucleus, superior olive, lateral lemniscus, inferior colliculus, medial geniculate body, auditory cortical areas, and speech-related cortical areas [3]) did these changes occur?

Tapping different auditory processing levels simultaneously

To address questions like these, we need methods that can tap different levels of the auditory system simultaneously.

  • Lei Feng (University of Minnesota, Oxenham lab) reviewed the phenomenon of auditory enhancement whereby a target sound becomes more audible in the presence of masking noise when the latter is presented first, by itself, thereby possibly enabling the auditory system to adapt to rapidly changing acoustic backgrounds against which target events may ‘pop out’ [4].  The lab employs a double-modulation paradigm with rapidly sampled 64-channel EEG to assess both cortical and subcortical contributions to the auditory enhancement effect.
  • Lee Miller (University of California, Davis) introduced a novel class of spectro-temporal stimuli comprising chirps in speech (called ‘CHEECH’) that are used, all at once in a single session, to rapidly obtain ABRs (Wave V), medial geniculate responses, auditory steady state responses, P1-N1 complex responses, and N400 linguistic surprisal.
  • Octave Etard (Imperial College London; Reichenbach lab) presented a method for estimating auditory brainstem responses to natural, continuous speech [7] based on a method for determining the fundamental frequency of the speaker [6].  The method is related to an earlier approach of Lalor and Foxe [5] that estimates the brain’s impulse response to speech, and to a method of Maddox and Lee published this month [8].

With concurrent measurement of cortical and subcortical auditory responses from scalp EEG, the next step is to perform joint analyses.  The lab of Gavin Bidelman (University of Memphis) presented nine high-resonance posters, some of which combined the analysis of subcortical responses with cortical responses, using source estimation methods to analyze the latter.  Gavin is setting the stage for causal information flow analysis between different brain areas (e.g., Granger causality), including cortical-subcortical directional interactions.
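
The impulse-response idea behind approaches such as [5] can be illustrated with a toy regression.  The sketch below is not any of the cited authors’ implementations.  It assumes a simulated single-channel recording, a white-noise stand-in for a speech feature, and hypothetical helper names (`lagged_design`, `estimate_impulse_response`), and it uses ridge-regularized least squares on time-lagged copies of the stimulus.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Build a matrix whose column k holds the stimulus delayed by k samples."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:n - k]
    return X

def estimate_impulse_response(stimulus, response, n_lags, ridge=1e-3):
    """Ridge-regularized least-squares estimate of h in response ~ stimulus * h."""
    X = lagged_design(stimulus, n_lags)
    gram = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(gram, X.T @ response)

# Simulated demonstration; every value here is an illustrative assumption.
rng = np.random.default_rng(0)
stimulus = rng.standard_normal(5000)             # stand-in for a speech feature
true_h = np.array([0.0, 0.5, 1.0, -0.7, 0.2])    # hypothetical impulse response
response = np.convolve(stimulus, true_h)[:len(stimulus)]
response += 0.1 * rng.standard_normal(len(stimulus))  # measurement noise

h_hat = estimate_impulse_response(stimulus, response, n_lags=len(true_h))
```

With real speech the stimulus feature is strongly autocorrelated, so the choice of regularization matters far more than in this white-noise toy case.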

Binaural sound localization, perceptual fusion, and binaural interaction

Binaural sound localization, perceptual fusion, and the nonlinear binaural interaction component (BIC) together formed another hot topic.

  • In a nicely designed study with Mongolian gerbils, Sandra Tolnai (University of Oldenburg) showed that the BIC estimated from the auditory brainstem response has a substantial contribution from the output (but not input) of units in the lateral superior olive, but a negligible contribution from the input or output of units in the medial superior olive.
  • Melissa Polonenko (University of Toronto and Hospital for Sick Children) studied behavioral detection of interaural timing differences (ITDs) and associated auditory brainstem responses in ‘bimodal’ children with a cochlear implant device on one side and a hearing aid on the non-implanted side.  She concluded that timing differences between the devices impact brainstem processing of ITDs and the normal ability to fuse auditory information from both ears, raising the possibility that device timing delays may be tuned to improve binaural fusion.
  • A related topic is attentional modulation of auditory stream segregation versus fusion in listeners with cochlear implants (Andreu Paredes-Gallardo, Technical University of Denmark) and ERP correlates of switching during auditory streaming of bistable stimuli that are perceived, intermittently, as one or two streams (Nate Higgins, University of Nevada, Las Vegas; Snyder lab).
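
For readers unfamiliar with the BIC, it is commonly defined as the binaural response minus the sum of the two monaural responses, so that a purely linear system would give a BIC of zero.  The following is a minimal sketch with simulated waveforms.  The function name and all numbers are assumptions for illustration and are not taken from any of the studies above.

```python
import numpy as np

def binaural_interaction_component(binaural, left, right):
    """BIC = binaural response minus the sum of the monaural responses."""
    binaural, left, right = (
        np.asarray(a, dtype=float) for a in (binaural, left, right)
    )
    return binaural - (left + right)

# Toy averaged waveforms in arbitrary units; values are illustrative only.
t = np.linspace(0.0, 0.010, 101)               # 10 ms epoch, 0.1 ms steps
wave = np.exp(-((t - 0.005) / 0.0005) ** 2)    # single peak at 5 ms
left, right = 1.0 * wave, 1.0 * wave           # monaural responses
binaural = 1.8 * wave                          # smaller than left + right

bic = binaural_interaction_component(binaural, left, right)
peak_latency_ms = 1000 * t[np.argmin(bic)]     # latency of the largest deflection
```

In this toy case the BIC is a negative deflection at the latency of the simulated peak, because the binaural response is smaller than the monaural sum.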

Another big topic was multimodal sensory and/or motor integration incorporating the auditory modality.  For example, what are the tolerances for perceptual fusion given audiovisual synchronization leads and lags (Tony Shahin, University of California, Davis) [9]?  Another example was motor modulation of auditory perception: Auditory stimuli initiated with a button press are perceived as louder than same-intensity stimuli presented passively (John Myers, Jeff Mock, Ed Golob; University of Texas, San Antonio).  A more complicated example in natural environments is oculomotor orienting of vision toward the perceived location of a sound source.  For example, eye tracking may be used to control a binaural beamformer to improve speech intelligibility in a multi-speaker (cocktail party) environment (Ľuboš Hládek, Glasgow University).

The last example is a clear case of a sweet spot: Beamformers are an important class of algorithms we use to “tune in” to sources of brain activity from regions of interest given multi-channel MEG or EEG data.  The auditory system itself may use beamformer-like algorithms to tune the auditory system based on where the listener is looking.  And perhaps eye-gaze-plus-beamformer technology may be used to enhance the brain’s own algorithm.
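
To make the “tune in” idea concrete, here is a minimal sketch of one common member of this class, a unit-gain minimum-variance (LCMV-style) beamformer.  It is an illustration under stated assumptions rather than a description of any particular software package: the forward fields are random stand-ins, the data are simulated, and the function name `lcmv_weights` is hypothetical.

```python
import numpy as np

def lcmv_weights(cov, leadfield, reg=1e-6):
    """Unit-gain minimum-variance weights: w = C^-1 l / (l^T C^-1 l)."""
    n = cov.shape[0]
    c_reg = cov + reg * np.trace(cov) / n * np.eye(n)   # diagonal loading
    c_inv_l = np.linalg.solve(c_reg, leadfield)
    return c_inv_l / (leadfield @ c_inv_l)

# Simulated 8-sensor recording; all quantities are illustrative assumptions.
rng = np.random.default_rng(1)
n_sensors, n_samples = 8, 20000
target_lf = rng.standard_normal(n_sensors)       # hypothetical forward field
interferer_lf = rng.standard_normal(n_sensors)   # forward field of a competitor
target = np.sin(2 * np.pi * 10 * np.arange(n_samples) / 1000.0)
interferer = 3.0 * rng.standard_normal(n_samples)
noise = 0.1 * rng.standard_normal((n_sensors, n_samples))
data = np.outer(target_lf, target) + np.outer(interferer_lf, interferer) + noise

w = lcmv_weights(np.cov(data), target_lf)        # spatial filter for the target
estimate = w @ data                              # beamformer output time series
corr = np.corrcoef(estimate, target)[0, 1]       # similarity to the true source
```

The weights pass the target’s forward field with unit gain while minimizing total output power, which suppresses the competing source.  Steering by eye gaze would amount to choosing which forward field (or acoustic steering vector) to pass.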

Many other examples could be adduced, but I’ll stop here.  For me, ARO 2018 may have stimulated an idea or two worth pursuing before ARO 2019 in Baltimore.


[1] Anderson S, Kraus N. Sensory-cognitive interaction in the neural encoding of speech in noise: a review. J Am Acad Audiol. 2010 Oct;21(9):575–85.

[2] Holdgraf CR, de Heer W, Pasley B, Rieger J, Crone N, Lin JJ, et al. Rapid tuning shifts in human auditory cortex enhance speech intelligibility. Nat Commun. 2016 Dec 20;7:13654.

[3] Covey E. Sound Localization and Temporal Pattern Analysis. In: Sensory and Perceptual Processes (PSYCH 333, U Washington) [Internet].

[4] Feng L, Oxenham AJ. New perspectives on the measurement and time course of auditory enhancement. J Exp Psychol Hum Percept Perform. 2015 Dec;41(6):1696–708.

[5] Lalor EC, Foxe JJ. Neural responses to uninterrupted natural speech can be extracted with precise temporal resolution. Eur J Neurosci. 2010 Jan;31(1):189–93.

[6] Forte AE, Etard O, Reichenbach T. The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention. eLife. 2017 Oct 10;6.

[7] Etard O, Kegler M, Braiman C, Forte AE, Reichenbach T. Real-time decoding of selective attention from the human auditory brainstem response to continuous speech. bioRxiv. 2018 Feb 5;259853.

[8] Maddox RK, Lee AKC. Auditory Brainstem Responses to Continuous Natural Speech in Human Listeners. eNeuro. 2018 Feb;5(1).

[9] Bhat J, Miller LM, Pitt MA, Shahin AJ. Putative mechanisms mediating tolerance for audiovisual stimulus onset asynchrony. J Neurophysiol. 2015 Mar 1;113(5):1437–50.