Friday, February 15, 2013
Room 304 (Hynes Convention Center)
Over the past decade, we and others have created diachronic corpora of English and other languages that contain millions of words of running text, with each sentence annotated for its complete syntactic structure. These databases have been built with a combination of automated analysis and human correction of the errors that the automated routines generate. The efficiency of our procedures has risen steadily, and it is now possible to build corpora of useful size at a reasonable rate and cost. Using these corpora, specialists in historical syntax have been able to track the diffusion of grammatical changes across dialects and through time, making discoveries of interest both to language history and to cognitive science. One result of this research is the discovery that syntactic diffusion is tightly constrained by the grammatical character of the spreading change. In particular, a change spreads at the same rate in every linguistic context that it affects, a pattern known as the "Constant Rate Effect." Another result, less firmly established, is that syntactic choices that are independent in the grammar also appear, from the frequency patterns found in texts, to be made in a statistically independent way by speakers and writers in their use of the language. These results have begun to let us reverse the usual direction of explanation in linguistics: instead of explaining usage from grammar, we can discover grammatical properties from historical documents by analyzing frequency distributions and correlations among frequencies in our textual databases.
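As a rough illustration of what "the same rate in all contexts" can mean quantitatively, the sketch below assumes that the proportion of the new form in each context follows a logistic curve over time. That model is an assumption made here for illustration and is not stated in the abstract. Under it, the Constant Rate Effect corresponds to curves that differ in intercept but share one slope, so that the log-odds of the new form rise along parallel lines. The context names, intercepts, slope, and time points are all hypothetical, and the frequencies are synthetic rather than drawn from any corpus.

```python
import math

def logistic(t, k, s):
    """Proportion of the new form at time t, given intercept k and slope s."""
    return 1.0 / (1.0 + math.exp(-(k + s * t)))

def logit(p):
    """Log-odds of a proportion p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Synthetic data only: time in centuries from an arbitrary origin.
periods = [0.0, 0.5, 1.0, 1.5, 2.0]
shared_slope = 1.8  # hypothetical rate of change, identical in both contexts
intercepts = {"context_A": -3.0, "context_B": -1.5}  # hypothetical starting points

slopes = {}
for name, k in intercepts.items():
    # Frequencies of the new form in this context at each period.
    freqs = [logistic(t, k, shared_slope) for t in periods]
    # On the log-odds scale a logistic curve is a straight line; fit its slope.
    _, slopes[name] = fit_line(periods, [logit(p) for p in freqs])

for name, slope in slopes.items():
    print(f"{name}: estimated slope = {slope:.3f}")
```

With real corpus counts the estimated slopes would not match exactly, and the question would be whether they differ by more than sampling error allows. A regression model that includes a context-by-time interaction term is one common way to test that.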