Data Science

Data science is advancing the inductive conduct of science, driven by the greater volume, complexity, and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science, using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity, and interdisciplinary nature on which progress is presently limited by the lack of available tools and of a fully trained and agile workforce.


We present our work on semantically-enabled data and schema registration in the setting of a scientific data integration project: SESDI (Semantically-Enabled Scientific Data Integration), which aims initially to integrate heterogeneous volcanic and atmospheric chemical compound data in support of

The Open-source Project for a Network Data Access Protocol (OPeNDAP) software framework has evolved over the last 10 years to become a robust, high-performance, service-oriented architecture for the access and transport of scientific data from a broad variety of disciplines over the Internet.

We have developed a semantic data framework that supports interdisciplinary virtual observatory projects across the fields of solar physics, space physics and solar-terrestrial physics.

In this paper, we describe how a Semantic Web-based provenance interlingua called the Proof Markup Language (PML) has been used to encode workflow provenance in a variety of application areas.

Semantic interoperability of mineral exploration geodata is a long-term concern in mining projects. Inconsistent conceptual schemas and heterogeneous professional terms among various geodata sources in a mining project often hinder their efficient use and/or reuse.

Web-based science analysis and processing tools allow users to access, analyze, and generate visualizations of data without requiring the user to be an expert in data processing. These tools simplify science analysis for all science users by reducing the data processing overhead for the user.

The Virtual Solar-Terrestrial Observatory (VSTO) Portal at vsto.org provides a set of guided workflows to implement use cases designed for the VSTO Project.

Oceanographic research covers a broad range of science domains and requires a tremendous amount of cross-disciplinary collaboration. Advances in cyberinfrastructure are making it easier to share data across disciplines through the use of web services and community vocabularies.

Data.gov is a website that provides US Government data to the general public to ensure better accountability and transparency.