We have developed new collection methods, as well as innovative multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis, web metrics analysis, sentiment analysis, authorship analysis, and video analysis in our research. Our collection is close to 10 TB in size; we believe it is the largest such collection of open source material in the world. The approaches and methods developed help to advance the field of intelligence and security informatics (ISI). ISI is the “development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003). Our work, a unique blend of basic and applied research, contributes new approaches, methods, and computational techniques for data analysis and visualization that are beneficial across a wide range of domains.
Recent work has included the development of an SIR (susceptible-infected-recovered) model for tracking the development of “infectious ideas” through Dark Web forums. Many of the forums we have collected are available to researchers through the Dark Web Forum Portal, a unique resource that supports searching, browsing, and downloading of threads and messages in the collection. The Portal currently contains over 15 million messages from 28 forums.
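For readers unfamiliar with SIR models, the following is a minimal sketch of the classic susceptible-infected-recovered dynamics, not the model developed in our research. It assumes a closed population of forum participants. The function name `simulate_sir`, the parameters `beta` (transmission rate) and `gamma` (recovery rate), and the mapping of "infected" to participants actively spreading an idea are all illustrative assumptions.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with forward Euler steps.

    Hypothetical mapping for illustration only:
      S = participants not yet exposed to an idea
      I = participants actively spreading it
      R = participants who have stopped spreading it
    """
    n = s0 + i0 + r0                      # closed population, held constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt   # dS/dt = -beta * S * I / N
        new_recoveries = gamma * i * dt          # dR/dt = gamma * I
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Assumed parameter values, chosen only to produce a visible outbreak.
    run = simulate_sir(beta=0.5, gamma=0.1, s0=990, i0=10, r0=0, days=60)
    peak_time, _, peak_i, _ = max(run, key=lambda row: row[2])
    print(f"peak of {peak_i:.0f} active spreaders at day {peak_time:.1f}")
```

In practice, `beta` and `gamma` would have to be estimated from observed posting activity. The values above are placeholders.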
New collection methods allow us to identify and acquire “dark” videos from social media outlets, and these will be added to a future release of the Portal. Another extension will be a sentiment analyzer that allows researchers to analyze and visualize the sentiment and affect (emotion) expressed in forum messages. The sentiment analyzer incorporates and builds on multilingual, feature-based text-mining and visualization research performed in the Lab.
Finally, as the world evolves, so does the focus of our work. We have recently expanded our collection efforts to include social media from “at risk” countries. Our new Geopolitical Web research will build on many of the techniques we developed as a result of our work with Dark Web data, and will be expanded to include time series analysis and cultural, economic, and political metrics to help us assess societal risk.