Industry Partnerships in Data Science To Advance Scientific Research
Education
In Spring 2013 and Summer 2014, we ran a popular Coursera MOOC "Introduction to Data Science" that attracted over 180,000 students across both offerings. This course combined material from data management, scalable computing, visualization, statistics, and machine learning, using a multifaceted pedagogy that included automatically graded programming assignments, peer assessments, and external course projects in addition to video lectures. I’ll describe some lessons learned and the opportunity for online courses to engage professionals in the conduct of science.
Research infrastructure
In partnership with Microsoft Research, Amazon Web Services, and Google, we are developing new infrastructure for data-intensive science designed to democratize the use of algorithms, techniques, and technologies developed by and for IT professionals. With the SQLShare project, we aimed to lower the activation energy required to use database technology in scientific contexts, providing a web-based interface for sharing data and deriving new results. In the Myria project, we significantly increase the scale of data supported and directly support advanced analytics.
The eScience Incubator
To reach the broadest possible audience, we have established a data science incubation program to jumpstart new projects in data-intensive science. Researchers submit a 1-2 page proposal for a 3-month project in which they will engage directly with our permanent staff. Each proposal will indicate one or more students, postdocs, staff scientists, or faculty who will physically work in our shared studio space two days per week for the duration of the project. This stipulation is critical: it ensures full engagement on the part of the researchers, promotes active use of the studio space, creates opportunities for cross-pollination between multiple concurrent projects, and facilitates knowledge transfer in software engineering, reproducibility, and data science methods. Through these projects, seed grants will accelerate the development and dissemination of relevant data science tools and techniques, evaluate their use in new contexts, and triage new potential collaborations with methodology researchers.