Mapping Geographical Variation in Social Media Writing

Sunday, 15 February 2015: 1:00 PM-2:30 PM
Room LL21E (San Jose Convention Center)
Jacob Eisenstein, Georgia Institute of Technology, Atlanta, GA
Dialectology has mainly focused on variation in spoken language. However, social media writing provides a new channel for linguistic creativity and exhibits dramatic geographical differences. The abundance of social media data enables the application of machine learning techniques capable of discovering and visualizing dialect regions and their linguistic signatures. I will describe my ongoing research on measuring dialect variation in the United States, focusing on the use of neologisms on Twitter. Because language in online media is so fluid, language change can be observed and measured in real time. Tracking the spread of words across regions and groups makes it possible to reconstruct the pathways of sociolinguistic influence through which new linguistic features propagate from influential regions to their satellites. The resulting network of influence is strongly linked to both geography and demographics: American cities with similar racial compositions are the most likely to share linguistic features. Rather than moving towards a single unified "netspeak" dialect, language change in online social media reproduces existing fault lines in spoken American English.