Seeing Influences on Verbal Variation

Sunday, 15 February 2015: 1:00 PM-2:30 PM
Room LL21E (San Jose Convention Center)
John Nerbonne, University of Groningen, Groningen, Netherlands
Modern information processing enables the examination of linguistic variation in samples requiring 10⁸ or more comparisons of individual sound segments, such as /a/ vs. /o/. This frees researchers from having to focus narrowly on a small set of contrasts and enables larger-scale aggregate comparisons, which have a number of advantages (Nerbonne 2009). We first motivate aggregate analysis visually, using Tufte’s “small multiples” technique to show how differently individual contrasts are often distributed. We then take up the dialectological question of how geography influences variation, and more particularly when this influence results in “dialect areas” and what the nature of those “areas” is. We use (noisy) clustering to identify natural groups in the data, visualize these groups with (consensus) dendrograms and projections onto maps, and use multidimensional scaling (MDS) to examine how well the groups characterize the data set. Tufte suggests that visualizations ought to summarize analyses “at a glance” but also invite close scrutiny. With respect to the latter, we show how MDS visualizations inform the grouping presented in dendrograms.
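
The aggregate approach can be illustrated with a minimal Python sketch, under several assumptions. It treats each character of a transcription as one sound segment, uses unit-cost Levenshtein distance normalized by the longer transcription as the per-word measure (one common choice, assumed here), averages over shared words to get a site-by-site distance, and groups sites with plain average-linkage (UPGMA) clustering. It does not reproduce the noisy clustering, consensus dendrograms or MDS described above, and the site names and transcriptions are hypothetical toy data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance; each character is treated as one segment."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def aggregate_distance(site_a: dict, site_b: dict) -> float:
    """Mean length-normalized edit distance over the words both sites share."""
    shared = site_a.keys() & site_b.keys()
    total = sum(
        levenshtein(site_a[w], site_b[w]) / max(len(site_a[w]), len(site_b[w]))
        for w in shared
    )
    return total / len(shared)


def upgma(dist: dict):
    """Average-linkage (UPGMA) clustering over {frozenset({a, b}): distance}.

    Returns the dendrogram as nested tuples of site labels.
    """
    sizes = {label: 1 for pair in dist for label in pair}
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the closest pair of current clusters (sorted for a stable order).
        a, b = sorted(min(d, key=d.get), key=str)
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster is the size-weighted mean of its parts.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
        # Drop every distance that still refers to the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return next(iter(sizes))


# Hypothetical toy transcriptions for four sites (not real survey data).
sites = {
    "A": {"house": "hus", "water": "watr", "bread": "brot"},
    "B": {"house": "hys", "water": "watr", "bread": "brot"},
    "C": {"house": "haus", "water": "vasr", "bread": "brod"},
    "D": {"house": "haus", "water": "wasr", "bread": "brod"},
}
dist = {
    frozenset((a, b)): aggregate_distance(sites[a], sites[b])
    for a, b in combinations(sites, 2)
}
tree = upgma(dist)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

On this toy data the two pairs of similar sites form the two top-level groups. With real data one would normally use an established implementation, such as SciPy's average-linkage clustering, rather than this hand-written loop.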

In their book Dialectology, Chambers & Trudgill (1998: 187) aimed to fuse traditional dialectological research with modern sociolinguistics, and the book is a major step in that research program. But an analytical gap has remained. While sociolinguistics has mostly employed factorial designs to assess the importance of one or another social factor in the distribution of individual linguistic features, dialectology has emphasized the geographic analysis of aggregate linguistic differences, typically using regression designs in which geography is operationalized as distance. In a PLoS ONE article, Wieling, Nerbonne & Baayen (2011) have shown how to use generalized additive modeling to account simultaneously for geographical, social and linguistic influences in a single regression model. Such models require their own visualizations: iso-lines representing the geographic influence, and selected pairs of maps showing how distributions differ across categorical distinctions.
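
A regression design with geography operationalized as distance can be sketched as an ordinary least-squares fit of aggregate linguistic distance against the logarithm of geographic distance. The log transform, the function name `fit_log_distance` and the toy numbers are assumptions for illustration; this is not the generalized additive model of Wieling, Nerbonne & Baayen (2011). Because site pairs share sites, the observations are not independent, so the fitted trend should not be read as a significance test.

```python
import math


def fit_log_distance(pairs):
    """Ordinary least squares of linguistic distance on log geographic distance.

    pairs: list of (geographic_km, linguistic_distance) for site pairs.
    Returns (intercept, slope, pearson_r). Hypothetical helper, not a library API.
    """
    xs = [math.log(km) for km, _ in pairs]
    ys = [ling for _, ling in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation between log distance and linguistic distance
    return intercept, slope, r


# Hypothetical site pairs: (distance in km, aggregate linguistic distance).
toy_pairs = [(5, 0.08), (20, 0.15), (60, 0.21), (150, 0.27), (400, 0.30)]
intercept, slope, r = fit_log_distance(toy_pairs)
print(f"intercept={intercept:.3f} slope={slope:.3f} r={r:.3f}")
```

The logarithm reflects the common observation that linguistic distance grows quickly at short range and more slowly farther out; a generalized additive model instead lets the data determine the shape of the geographic effect.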

Chambers JK, Trudgill P (1998) Dialectology. 2nd ed. Cambridge, UK: Cambridge University Press.

Nerbonne J (2009) Data-Driven Dialectology. Language and Linguistics Compass 3(1): 175-198. 

Wieling M, Nerbonne J, Baayen RH (2011) Quantitative Social Dialectology: Explaining Linguistic Variation Geographically and Socially. PLoS ONE 6(9): e23613.