Sandhya Vinay 1, A-F
Audiology group, Institute of Neuromedicine and Neurosciences, Norwegian University of Science and Technology, Norway
A - Research concept and design; B - Collection and/or assembly of data; C - Data analysis and interpretation; D - Writing the article; E - Critical revision of the article; F - Final approval of article;
Submission date: 2022-03-24
Final revision date: 2022-05-12
Acceptance date: 2022-05-23
Publication date: 2022-06-30
Corresponding author
Sandhya Vinay   

Audiology group, Institute of Neuromedicine and Neurosciences, Norwegian University of Science and Technology, Tungasletta 2, 7491, Trondheim, Norway
J Hear Sci 2022;12(2):20-35
Speech perception is multisensory, relying on auditory as well as visual information from the articulators. Watching articulatory gestures that are either congruent or incongruent with the speech audio can change the auditory percept, indicating a complex integration of auditory and visual stimuli. A speech segment comprises distinctive features, notably voice onset time (VOT) and place of articulation (POA). Understanding the importance of each of these features for audiovisual (AV) speech perception is critical. The present study investigated the perception of AV consonant-vowel (CV) syllables with various VOTs and POAs under two conditions: diotic incongruent and dichotic congruent.

Material and methods:
AV stimuli comprised diotic and dichotic CV syllables with stop consonants (bilabial /pa/ and /ba/; alveolar /ta/ and /da/; and velar /ka/ and /ɡa/) presented with congruent and incongruent video CV syllables with stop consonants. Forty right-handed, normal-hearing young adults participated in the experiment: 20 females (mean age 23 years, SD = 2.4 years) and 20 males (mean age 24 years, SD = 2.1 years).
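As an illustration only, the two AV conditions can be sketched as combinations of the six syllables. The abstract does not state which pairings were actually presented, so the full crossing below, the function names, and the trial dictionaries are assumptions made for this sketch, not the study's stimulus list. Here "diotic incongruent" is taken to mean the same audio syllable in both ears with a different video syllable, and "dichotic congruent" to mean different audio syllables in the two ears with a video matching one of them.

```python
from itertools import permutations

# Stop-consonant CV syllables named in the abstract, tagged with place of
# articulation (POA) and voicing (short VOT = voiced, long VOT = unvoiced).
# "ga" stands in for the IPA /ɡa/.
SYLLABLES = {
    "pa": ("bilabial", "unvoiced"),
    "ba": ("bilabial", "voiced"),
    "ta": ("alveolar", "unvoiced"),
    "da": ("alveolar", "voiced"),
    "ka": ("velar", "unvoiced"),
    "ga": ("velar", "voiced"),
}


def diotic_incongruent_trials():
    """Same audio syllable in both ears, video showing a different syllable."""
    return [
        {"left": audio, "right": audio, "video": video}
        for audio, video in permutations(SYLLABLES, 2)
    ]


def dichotic_congruent_trials():
    """Different audio syllable in each ear, video matching one of the ears."""
    trials = []
    for left, right in permutations(SYLLABLES, 2):
        for video in (left, right):
            trials.append({"left": left, "right": right, "video": video})
    return trials


diotic = diotic_incongruent_trials()
dichotic = dichotic_congruent_trials()
print(len(diotic), len(dichotic))
```

Under this hypothetical full crossing, each trial can be looked up in `SYLLABLES` to compare the VOT and POA of the audio and video segments, which are the two features the study examines.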

In the diotic incongruent AV condition, the short-VOT (voiced) CV syllables of the visual segments were identified when the auditory segments contained long-VOT (unvoiced) CV syllables. In the dichotic congruent AV condition, identification of the audio segment increased when the subject was presented with a video segment congruent to either ear, thereby overriding the ear advantage otherwise present in dichotic listening. The distinct visual salience of bilabial stop syllables gave them greater visual influence (observed as higher identification scores) than velar stop syllables, and thus overrode the acoustic dominance of velar syllables.

The findings of the present study have important implications for understanding the perception of diotic incongruent and dichotic congruent audiovisual CV syllables in which the stop consonants have different VOT and POA combinations. Earlier findings on the effect of VOT on dichotic listening can be extended to AV speech having dichotic auditory segments.
