A Unified Model of Categorical Effects in Consonant and Vowel Perception

Saturday, February 16, 2013
Auditorium/Exhibit Hall C (Hynes Convention Center)
Yakov Kronrod, University of Maryland, College Park, MD
Emily Coppess, University of Chicago, Chicago, IL
Naomi H. Feldman, University of Maryland, College Park, MD
Background

Phonetic categories affect how we perceive speech sounds. The strength of these effects can be measured through identification and discrimination tasks along a continuum between two sounds. The strongest effects are found for stop consonants, where listeners readily tell sounds apart across a category boundary but struggle to detect within-category differences (Categorical Perception). Vowels are perceived much more continuously: listeners are better able to detect within-category differences, and the shift in identification between categories is shallower (Perceptual Magnet Effect). We contend that all of these categorical effects can be explained by a unified Bayesian model, with the degree of categorical effects corresponding to variation in a single parameter of the model.

Methods

The model assumes two sources of variability in the speech signal: meaningful variability within the category, and perceptual and acoustic noise. The listener uses optimal Bayesian inference to weight the fine acoustic detail against the underlying category centers in order to recover the intended production. We fit the model to behavioral identification and discrimination data for vowels, stop consonants, and fricatives. The fit yields parameters including category means, category variances, and noise variance. We then compute the ratio of category (meaningful) variance to noise variance for each phoneme class and relate it to the degree of categoricity.
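The inference described above can be illustrated with a minimal sketch. It assumes Gaussian categories and Gaussian noise, a common formalization that the abstract itself does not spell out. The function names, the equal-prior default, and the numeric values in the usage note are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def category_posterior(stimulus, means, category_var, noise_var, priors=None):
    """Posterior probability of each category given the heard stimulus.

    Assumption: the stimulus is the intended production plus noise, so its
    marginal variance under a category is category_var + noise_var.
    """
    if priors is None:
        priors = [1.0 / len(means)] * len(means)  # assumed equal priors
    marginal_var = category_var + noise_var
    scores = [p * normal_pdf(stimulus, m, marginal_var) for m, p in zip(means, priors)]
    total = sum(scores)
    return [s / total for s in scores]


def perceived_production(stimulus, means, category_var, noise_var, priors=None):
    """Posterior mean of the intended production given the stimulus.

    Within each category the estimate is a weighted average of the acoustic
    detail (the stimulus) and the category center; the per-category estimates
    are then averaged by the category posterior.
    """
    posterior = category_posterior(stimulus, means, category_var, noise_var, priors)
    acoustic_weight = category_var / (category_var + noise_var)
    return sum(
        p * (acoustic_weight * stimulus + (1.0 - acoustic_weight) * m)
        for p, m in zip(posterior, means)
    )
```

As a usage example, calling `perceived_production(1.0, [-2.0, 2.0], 0.17, 1.0)` and then the same call with a category variance of `6.69` shows the qualitative pattern: with the lower variance ratio the estimate is drawn strongly toward the nearer category center, while with the higher ratio it stays close to the stimulus.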

Results

We consider two metrics of performance: model fit and predictive power for the degree of categoricity. First, our model provides very good fits to behavioral data for all three phoneme classes, with fitted categories closely matching human production data. Second, we find a systematic relationship between the meaningful-to-noise variance ratio and the degree of observed categorical effects on perception. From most to least categorical, the ratio values are 0.17 for voiced stop consonants, 1.90 for fricatives (average), and 6.69 for vowels. These findings are consistent with previous work, which found clear categorical effects for stop consonants and roughly continuous perception of vowels, but was inconclusive about categorical influences on the perception of fricatives.
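To give a rough sense of what these ratios imply, the sketch below converts each reported ratio r into the weight r / (1 + r) that a listener would place on the acoustic detail, as opposed to the category center, within a single category. This conversion assumes Gaussian categories and Gaussian noise, which the abstract does not state explicitly; the function name is hypothetical.

```python
def acoustic_weight(variance_ratio):
    """Weight on the acoustic detail under an assumed Gaussian model.

    variance_ratio is meaningful (category) variance divided by noise variance,
    so category_var / (category_var + noise_var) simplifies to r / (1 + r).
    """
    return variance_ratio / (1.0 + variance_ratio)


# Ratios reported in the Results section.
ratios = {
    "voiced stop consonants": 0.17,
    "fricatives (average)": 1.90,
    "vowels": 6.69,
}

weights = {name: acoustic_weight(r) for name, r in ratios.items()}

for name, weight in weights.items():
    print(f"{name}: weight on acoustic detail = {weight:.2f}")
```

Under this assumption the weights come out to roughly 0.15, 0.66, and 0.87, so stop consonants are dominated by the category center, vowels by the acoustic detail, and fricatives fall in between.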

Conclusions

Our simulations show that the perception of vowels, fricatives, and stop consonants can be captured by a unified model. The degree of categorical effects across these phoneme classes is explained as parametric variation within a single framework. In the spirit of Occam's Razor, we now explain with one model what previously required two.