Latent Dirichlet Allocation Modeling of Conference Abstracts to Identify Emerging Research

Sunday, February 14, 2016
Emma Tran, Science and Technology Policy Institute, Washington, DC
Identifying emerging research areas is of interest to Federal agencies that wish to find gaps in their research portfolios and align their funding with trends in the scientific community. The manual identification efforts that agencies currently use can be costly and labor-intensive, and their results can be difficult to verify and reproduce rigorously. Fully automated horizon-scanning systems tend to apply text-mining methods to data sources such as publications and patents, which may lag well behind the cutting edge of research. This project seeks to develop a more accurate identification system by applying natural language processing models to conference abstracts, a data source that we believe better reflects work at the frontiers of research. We seek to validate this method using real-world data. Applying latent Dirichlet allocation to identify “topics” within the abstracts of past Society for Neuroscience (SfN) conferences and recently funded National Institutes of Health (NIH) neuroscience grants, we identify topics that first emerge within SfN abstracts and subsequently trend upward in NIH funding. The efficacy of our system demonstrates that conference abstracts may be a viable upstream indicator of the emergence of novel research areas.
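The general idea of fitting topics and then comparing when each topic appears in each source can be illustrated with a minimal sketch. It is not the project's actual pipeline. It assumes a small collapsed Gibbs sampler for latent Dirichlet allocation, a single model fitted on the pooled conference and grant texts so that topics are comparable across sources, and a simple threshold rule for when a topic "emerges". The function names (`fit_lda`, `yearly_topic_share`, `first_year_above`), the hyperparameters, and the toy corpus are all hypothetical.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not optimized).

    docs is a list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            doc_assignments.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assignments)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][v] -= 1
                topic_totals[k] -= 1
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][v] + beta)
                    / (topic_totals[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][v] += 1
                topic_totals[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic_counts, docs)
    ]


def yearly_topic_share(doc_topic, years):
    """Average topic proportions over the documents of each year."""
    totals, counts = {}, {}
    for row, year in zip(doc_topic, years):
        acc = totals.setdefault(year, [0.0] * len(row))
        for k, p in enumerate(row):
            acc[k] += p
        counts[year] = counts.get(year, 0) + 1
    return {y: [s / counts[y] for s in totals[y]] for y in sorted(totals)}


def first_year_above(shares, topic, threshold):
    """First year in which a topic's average share reaches the threshold."""
    for year in sorted(shares):
        if shares[year][topic] >= threshold:
            return year
    return None


# Toy corpus with made-up tokens: (source, year, tokens).
corpus = [
    ("conference", 2010, "optogenetics neuron light channel".split()),
    ("conference", 2011, "optogenetics light circuit neuron".split()),
    ("grant", 2010, "synapse plasticity memory learning".split()),
    ("grant", 2012, "optogenetics circuit light memory".split()),
]
doc_topic = fit_lda([tokens for _, _, tokens in corpus], n_topics=2, n_iter=50)


def shares_for(source):
    """Yearly topic shares restricted to one source."""
    rows = [(r, y) for r, (s, y, _) in zip(doc_topic, corpus) if s == source]
    return yearly_topic_share([r for r, _ in rows], [y for _, y in rows])


conference_shares = shares_for("conference")
grant_shares = shares_for("grant")
```

Under these assumptions, a topic would count as a candidate upstream signal when `first_year_above(conference_shares, k, t)` is earlier than `first_year_above(grant_shares, k, t)` for some chosen topic index `k` and threshold `t`. A real analysis would also need text preprocessing, a principled choice of the number of topics, and a statistical test of the lead-lag relationship.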