5845 Security Informatics: The Dark Web Experience

Saturday, February 18, 2012: 1:30 PM
Ballroom A (VCC West Building)
Hsinchun Chen, University of Arizona, Tucson, AZ
In 2002, partly in response to the events of 9/11 and partly as a natural expansion of its earlier work on border security, information sharing, and data mining for law enforcement, the Artificial Intelligence Lab of the University of Arizona initiated its “Dark Web Project.” Dark Web is a long-term scientific research program that aims to study international terrorism through a computational, data-centric approach. Our goal is to collect, as comprehensively as possible, the web content generated by international extremist and terrorist groups, including websites, forums, social networking sites, videos, and virtual worlds.

We have developed new collection methods, as well as innovative multilingual data mining, text mining, and web mining techniques, to perform link analysis, content analysis, web metrics analysis, sentiment analysis, authorship analysis, and video analysis. Our collection is close to 10 TB in size; we believe it is the largest such collection of open source material in the world. The approaches and methods we have developed help advance the field of intelligence and security informatics (ISI). ISI is the “development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003). Our work, a unique blend of basic and applied research, contributes new approaches, methods, and computational techniques for data analysis and visualization that are beneficial across a wide range of domains.

Recent work has included the development of an SIR (susceptible-infected-recovered) model for tracking the spread of “infectious ideas” through Dark Web forums. Many of the forums we have collected are available to researchers through the Dark Web Forum Portal, a unique resource that supports searching, browsing, and downloading of threads and messages in the collection. The Portal currently contains over 15 million messages from 28 forums.
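The abstract does not specify the form of the Lab's model, so what follows is only a minimal sketch of the textbook SIR equations, dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI, stepped forward with Euler's method. It assumes, purely for illustration, that S, I, and R are fractions of a forum's members who have not yet adopted an idea, are actively posting it, or have stopped posting it. The names `SIRState`, `step`, and `simulate` and the parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SIRState:
    s: float  # susceptible: fraction of members who have not adopted the idea (assumed mapping)
    i: float  # infected: fraction actively posting the idea (assumed mapping)
    r: float  # recovered: fraction who have stopped posting it (assumed mapping)


def step(state: SIRState, beta: float, gamma: float, dt: float = 1.0) -> SIRState:
    """Advance the classic SIR equations by one forward-Euler step."""
    new_infections = beta * state.s * state.i * dt  # contacts between S and I that transmit the idea
    recoveries = gamma * state.i * dt               # active posters who lose interest
    return SIRState(
        s=state.s - new_infections,
        i=state.i + new_infections - recoveries,
        r=state.r + recoveries,
    )


def simulate(s0: float, i0: float, beta: float, gamma: float,
             steps: int, dt: float = 1.0) -> list[SIRState]:
    """Return the trajectory of states, starting with the initial state."""
    state = SIRState(s0, i0, 1.0 - s0 - i0)
    history = [state]
    for _ in range(steps):
        state = step(state, beta, gamma, dt)
        history.append(state)
    return history


if __name__ == "__main__":
    # Hypothetical parameters: beta / gamma = 3, so the idea initially spreads.
    traj = simulate(0.99, 0.01, beta=0.3, gamma=0.1, steps=100)
    peak = max(range(len(traj)), key=lambda t: traj[t].i)
    print(f"peak activity at step {peak}: {traj[peak].i:.3f}")
```

Because every unit that leaves one compartment enters another, S + I + R stays constant; the ratio β/γ decides whether an idea grows (above 1) or fades (below 1) once it first appears.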

New collection methods allow us to identify and acquire “dark” videos from social media outlets, and these will be added to a future release of the Portal. Another extension will include a sentiment analyzer that will allow researchers to analyze and visualize the sentiment and affect (emotion) inherent in forum messages. The sentiment analyzer incorporates and builds on multilingual, feature-based text-mining and visualization research performed in the Lab.

Finally, as the world evolves, so does the focus of our work. We have recently expanded our collection efforts to include social media from “at risk” countries. Our new Geopolitical Web research will build on many of the techniques we have developed as a result of our work with Dark Web data, but will be further expanded to include time series analysis and cultural, economic, and political metrics to help us assess societal risk.
