The Anatomy and Physiology of Data Science

Presented at the AGU Fall Meeting 2014

Authors: Peter Fox

Abstract:

Whether the science (especially geosciences) community at large likes it or not, the co-opting of the term Data Science by the private sector has led to increased hype over data science as a career and as a means to solve challenging data problems, and to a lack of educational innovation in curricula for data science. If the full benefits of a new generation of statistical and analytical software tools that operate on high-performance computational infrastructure are to be attained, adequate attention to the 'science of data science' is needed. In this contribution, we present a science view of data science from both an education and a research perspective. We also introduce a research agenda that explores the key challenges that must be overcome to meet the needs of research driven by large-scale data analytics. We focus on three as-yet untapped data science topics: understanding scale in systems, sparse systems, and abductive reasoning. We conclude with a specific call to action to make progress on these topics.

History

Date: December 14, 2014, 21:40:02
Created By: Peter Fox
Link: Download

Related Research Areas:

Data Science
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science, using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature on which progress is presently limited by a lack of available tools and of a fully trained and agile workforce.

At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in eScience collaborations. The need is to teach key methodologies in application areas, based on real research experience, and to build a skill set.

At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

Concepts: eScience