Hearing the Unspoken: Using Text Mining to Investigate Social and Environmental Priorities

Grubert, Emily A.

Determining the social and environmental priorities of a community is highly relevant to making robust and resilient policy decisions. Surveys are traditionally the tool of choice for assessing and ranking a community's priorities and the reasons behind them. Notably, priorities are a function of time and place, and major weaknesses of survey research include the inability to go back in time to ask new or slightly different questions; the time and cost intensity of broad and representative sampling; and the need to restrict survey length in order to reduce the burden on respondents. This work presents a preliminary application of text mining methods to answering questions about social and environmental priorities in English-speaking U.S. communities, with a particular focus on how energy affects society and the environment. Text mining, including such tools as topic models and sentiment analysis, is an increasingly sophisticated field applied in disciplines from literature to political science. The goal is to use existing texts, such as newspaper archives, blog posts, scientific literature, and even fiction, to make claims about a topic of interest. For example, text mining techniques can be used to evaluate and compare the style of works by various authors (like industry and its regulators), assess the frequency with which writings from a given period address a topic of interest, examine the change in prevalence of topics of discourse, and investigate attitudinal shifts through sentiment analysis. This research has particular application to settings where large-scale survey data might be desirable but infeasible for reasons of accessibility, time, or cost.
For example, a historical study investigating the attitudes of people from different regions toward a particular policy before and after it was enacted would not be able to launch an attitudinal survey, but text mining approaches with appropriate time bounds, location bounds, and source documents could potentially be used to approximate survey results. In particular, this research is a first step toward applying large-scale text mining approaches to the derivation of societal preference-based weighting factors for life cycle assessment. Typically, weighting factors are based on the opinions of a few (e.g., the author or an expert panel) rather than the many. Ongoing work focuses on validating text mining-based results, obtained with various tools and corpora, against more traditional survey and ethnographic approaches to deriving societal preferences in several U.S. regions.
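
As a purely illustrative sketch of one idea mentioned above, measuring how often writings from a given period address a topic of interest, the following Python snippet computes the share of documents per period that mention any word from a keyword list. It is not the method used in this work: plain keyword matching stands in for topic models, and the function name `topic_prevalence`, the `period_of` parameter, the keyword list, and the four-sentence toy corpus are all hypothetical and invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def topic_prevalence(docs, keywords, period_of=lambda year: year):
    """Return, per period, the share of documents mentioning any keyword.

    docs      -- iterable of (year, text) pairs (hypothetical input format)
    keywords  -- words treated as a crude proxy for a topic
    period_of -- maps a year to a period label, e.g. a decade
    """
    keywords = {k.lower() for k in keywords}
    totals = defaultdict(int)  # documents seen per period
    hits = defaultdict(int)    # documents mentioning the topic per period
    for year, text in docs:
        period = period_of(year)
        totals[period] += 1
        if keywords & set(tokenize(text)):
            hits[period] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}


# Toy corpus, invented for illustration only (not data from this study).
toy_docs = [
    (1995, "The new coal plant will bring jobs to the county."),
    (1995, "School board approves budget for next year."),
    (2015, "Residents worry about water use at the power plant."),
    (2015, "Drought raises water concerns for farmers and utilities."),
]

prevalence = topic_prevalence(toy_docs, {"water", "drought"})
print(prevalence)  # {1995: 0.0, 2015: 1.0}
```

Passing `period_of=lambda year: year // 10 * 10` would group documents by decade instead of by year. A real analysis would also need time- and location-bounded source selection and a validated topic or sentiment model, as described above.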