Industry Partnerships in Data Science To Advance Scientific Research

Sunday, 15 February 2015: 1:30 PM-4:30 PM
Room LL21F (San Jose Convention Center)
William Howe, University of Washington, Seattle, WA
At the University of Washington eScience Institute, we are broadly engaged in advancing the research and practice of data-intensive science. I will describe some of our programs designed to help us engage with industry partners, along with our initial findings. In education, we completed a massive open online course and are developing a new master's program. In research infrastructure, we are partnering with cloud providers to deliver new online data science services with broad use cases. Organizationally, we are piloting a new "incubator" program to help scientists develop tools and skills in data science. In all areas related to data science, the boundary between industry and research is blurring, and we see this as an opportunity to catalyze both.

Education

In Spring 2013 and Summer 2014, we ran a popular Coursera MOOC "Introduction to Data Science" that attracted over 180,000 students across both offerings.  This course combined material from data management, scalable computing, visualization, statistics, and machine learning, using a multifaceted pedagogy that included automatically graded programming assignments, peer assessments, and external course projects in addition to video lectures.  I’ll describe some lessons learned and the opportunity for online courses to engage professionals in the conduct of science.

Research infrastructure

In partnership with Microsoft Research, Amazon Web Services, and Google, we are developing new infrastructure for data-intensive science designed to democratize the use of algorithms, techniques, and technologies developed by and for IT professionals. With the SQLShare project, we aim to lower the activation energy required to use database technology in science contexts by providing a web-based interface for sharing data and deriving new results. With the Myria project, we significantly increase the scale of data supported and directly support advanced analytics.

The eScience Incubator

To reach the broadest possible audience, we have established a data science incubation program to jumpstart new projects in data-intensive science. Researchers submit a 1-2 page proposal for a 3-month project in which they engage directly with our permanent staff. Each proposal identifies one or more students, postdocs, staff scientists, or faculty who will physically work in our shared studio space two days per week for the duration of the project. This stipulation is critical: it ensures full engagement on the part of the researchers, promotes active use of the studio space, creates opportunities for cross-pollination among concurrent projects, and facilitates knowledge transfer in software engineering, reproducibility, and data science methods. These seed projects will accelerate the development and dissemination of relevant data science tools and techniques, evaluate their use in new contexts, and help triage potential new collaborations with methodology researchers.