5848 Text Mining in Action: Early Alerting of Disease Outbreaks from Online Media

Saturday, February 18, 2012: 1:30 PM
Ballroom A (VCC West Building)
Nigel Collier, National Institute of Informatics, Tokyo, Japan
Accurate and timely detection of infectious disease outbreaks is necessary to support risk assessment and ultimately to save lives and livelihoods. In recent years information technology, and in particular the Internet, has revolutionised disease surveillance. As the volume of multimedia information increases, so too does the potential to capture reports of disease outbreaks so that public health agencies can act on them close to source.

The BioCaster project began in 2006 at the National Institute of Informatics in Tokyo and provides fully automated 24/7 online public notification of events of interest reported in the world’s electronic media. BioCaster currently monitors over 300 health conditions in 12 languages across tens of thousands of news sources and is in regular use by international and national public health agencies as well as international travellers and the public at large. From a research perspective, our goal has been to conduct systematic evaluations that enable us to select the best suite of algorithms and knowledge resources for the task. This work has allowed us to make freely available the first multilingual public health ontology, a flexible rule engine, and a collection of texts for open benchmarking. These resources can be readily incorporated into applications that meet public health needs.

The DIZIE project was launched in 2010 to harness individual health reports for syndromic awareness in rapidly expanding social media such as Twitter. Both BioCaster and DIZIE exploit state-of-the-art text mining technology for high-throughput fact extraction in order to detect statistical event anomalies in near real time.
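
As an illustration of what detecting a statistical event anomaly can look like once facts have been extracted, the sketch below flags days on which the count of reports for one disease and location rises well above a recent baseline. It is a generic moving-baseline detector written for this example, not the algorithm used by BioCaster or DIZIE; the function name `flag_anomalies`, the window sizes, the threshold, and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts, baseline=7, guard=2, threshold=3.0):
    """Return indices of days whose count is anomalously high.

    counts    -- daily numbers of extracted reports (hypothetical input)
    baseline  -- number of days used to estimate normal activity
    guard     -- gap between the baseline window and the tested day, so a
                 slowly building outbreak does not inflate its own baseline
    threshold -- how many standard deviations above the mean count as an alert
    """
    alerts = []
    for t in range(baseline + guard, len(counts)):
        window = counts[t - guard - baseline : t - guard]
        mu = mean(window)
        # Floor the deviation so a perfectly flat baseline cannot divide by zero.
        sigma = max(stdev(window), 0.5)
        score = (counts[t] - mu) / sigma
        if score > threshold:
            alerts.append(t)
    return alerts


# Hypothetical daily report counts with a spike on day 9.
daily_reports = [2, 3, 2, 4, 3, 2, 3, 2, 3, 15, 3]
print(flag_anomalies(daily_reports))
```

A production system would also need to handle seasonality, day-of-week effects, and sparse counts, none of which this sketch attempts.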

Moving forward, we are working on several challenge areas: (1) integrating evidence across documents and media types, (2) providing realistic benchmarks for evaluation, and (3) fine-grained geocoding of place names.