Smart data infrastructure: The sixth generation of mediation for data science


Presented at the AGU Fall Meeting 2014

Authors: Peter Fox


In the emergent “fourth paradigm” (data-driven) science, the scientific method is enhanced by the integration of significant data sources into the practice of scientific research. To address Big Science, there are challenges in understanding the role of data in enabling researchers to attack not just disciplinary issues, but also the system-level, large-scale, and transdisciplinary global scientific challenges facing society. Recognizing that the volume of data is only one of many dimensions to be considered, there is a clear need for improved data infrastructures to mediate data and information exchange, which we contend will need to be powered by semantic technologies. One clear need is to provide computational approaches for researchers to discover appropriate data resources, rapidly integrate data collections from heterogeneous resources or multiple data sets, and inter-compare results to allow generation and validation of hypotheses. Another trend is toward automated tools that allow researchers to better find and reuse data that they currently don’t know they need, let alone know how to find. Again, semantic technologies will be required. Finally, to turn data analytics from "art to science", technical solutions are needed for cross-dataset validation, reproducibility studies on data-driven results, and the concomitant citation of data products, allowing recognition for those who curate and share important data resources.


Date: December 14, 2014
Created By: Peter Fox

Related Research Areas:

Data Frameworks
Lead Professor: Peter Fox
Description: None.
Concepts: eScience

Data Science
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature whose progress is presently limited by the lack of available tools and of a fully trained and agile workforce.

At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in eScience collaborations. The need is to teach key methodologies in application areas based on real research experience and to build a skill set.

At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

Concepts: eScience