The Changing Conduct of Geoscience in a Data Intensive World


Presented at the CIRSS Seminar 10/05/2012

Authors: Peter Fox


Electronic facilitation of scientific research (often called eResearch or eScience) is increasingly prevalent in the geosciences. Among the consequences of new and diversifying means of generating complex (*) data is that, as many branches of science have become data-intensive (the so-called fourth paradigm), they in turn broaden their long-tail distributions: smaller-volume but often complex data will always lead to excellent science. There are many familiar informatics functions that enable the conduct of science (by specialists or non-specialists) in this new regime. One example is the need for any user to be able to discover relations among and between the results of data analyses and informational queries. Unfortunately, true scientific exploration over complex data, for example visual discovery, remains more of an art form than an easily conducted practice. In general, the resource costs of creating useful visualizations have been increasing. Less than 10 years ago, it was assessed that data-centric science required a rough split between the time to generate, analyze, and publish data and the time spent on the science based on that data. Today, however, the visualization and analysis component has become a bottleneck, requiring considerably more of the overall effort, and this trend will continue. Potentially even worse is the choice to simplify analyses to 'get the work out'. Extra effort to make data understandable, something that should be routine, is now consuming considerable resources that could be used for many other purposes. It is now time to change that trend.

This presentation lays out informatics paths for truly 'exploratory' conduct of science cast in the present and rapidly changing reality of collaborative, Web/Internet-based data and software infrastructures. A logical consequence of these paths is that the people working in this new mode of research, i.e. data scientists, require additional and different education to become effective and routine users of new informatics capabilities. One goal is to achieve the same fluency that researchers may have in lab techniques, instrument utilization, model development and use, etc. Thus, in conclusion, curriculum and skill requirements for data scientists will be presented and discussed.



Related Research Areas:

Data Frameworks
Lead Professor: Peter Fox
Description: None.
Concepts: eScience
Data Science
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature whose progress is presently limited by the lack of available tools and of a fully trained and agile workforce.

At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in eScience collaborations. The need is to teach key methodologies in application areas based on real research experience and to build a skill set.

At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

Concepts: eScience
Semantic eScience
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or on general-purpose semantic application development, with inadequate consideration of requirements from specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring the emerging trends in semantic web technologies. This situation urgently requires the development of a multi-disciplinary field to foster the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.

Concepts: eScience
Xinformatics
Lead Professor: Peter Fox
Description: In the last 2-3 years, informatics has attained greater visibility across a broad range of disciplines, especially in light of great successes in bio- and biomedical informatics and significant challenges in the explosion of data and information resources. Xinformatics is intended to provide both the common informatics knowledge and an understanding of how it is implemented in specific disciplines, e.g. X = astro, geo, chem, etc. Informatics' theoretical basis arises from information science, cognitive science, social science, and library science, as well as computer science. As such, it aggregates these studies and adds both the practice of information processing and the engineering of information systems.
Concepts: eScience