SPCDIS Project Detail Description

From Semantic Portal Wiki

Project Description

The goal of this project is to develop a semantically enabled data ingest capability based within the NCAR High Altitude Observatory (HAO). The capability is being developed at the RPI Tetherless World Constellation in collaboration with the University of Texas at El Paso, the University of Michigan and McGuinness Associates.


This project addresses key issues in data provenance, trust and custodianship. It will provide a framework for enhancing data ingest systems, together with tools that determine the trustworthiness and reliability of data from the provenance information captured by the enhanced ingest. In doing so, the project directly addresses key Cyberinfrastructure (CI) needs such as software tools and services, data ingest, provenance representation, proof representation, metadata, documentation and quality control.


The term Cyberinfrastructure refers to the set of reliable, well-specified and interoperable connections of electronic hardware and software that allow people to discover, learn, teach, collaborate, disseminate, access and preserve knowledge in their domain. Provenance systems use Cyberinfrastructure software to preserve metadata about datasets and, by analyzing a dataset's provenance (the metadata describing its origin), to discover properties of the dataset that are not contained in the data itself.
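To make the idea of discovering a property "not contained in the data itself" concrete, the following minimal Python sketch walks a small derivation graph and reports which processing steps contributed to a product. The record structure, the dataset and activity names, and the functions `lineage` and `activities_applied` are hypothetical illustrations, not part of SPCDIS; a production provenance system would typically store such records in a richer, standardized representation rather than in plain Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class ProvRecord:
    """Hypothetical provenance record describing how one dataset was produced."""
    dataset: str
    activity: str                                    # processing step that produced it
    inputs: list[str] = field(default_factory=list)  # datasets it was derived from


def lineage(dataset: str, records: dict[str, ProvRecord]) -> list[str]:
    """Return every upstream dataset of `dataset`, each listed once (depth-first)."""
    seen: list[str] = []
    stack = list(records[dataset].inputs) if dataset in records else []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        if current in records:
            stack.extend(records[current].inputs)
    return seen


def activities_applied(dataset: str, records: dict[str, ProvRecord]) -> set[str]:
    """Infer a property absent from the data itself: which steps shaped it."""
    steps = {records[dataset].activity} if dataset in records else set()
    for upstream in lineage(dataset, records):
        if upstream in records:
            steps.add(records[upstream].activity)
    return steps


# Hypothetical example: a calibrated image derived from a raw frame and a dark frame.
EXAMPLE = {r.dataset: r for r in [
    ProvRecord("raw_001", "instrument_readout"),
    ProvRecord("dark_001", "instrument_readout"),
    ProvRecord("cal_001", "dark_subtraction", ["raw_001", "dark_001"]),
    ProvRecord("img_001", "flat_field", ["cal_001"]),
]}

if __name__ == "__main__":
    print(sorted(lineage("img_001", EXAMPLE)))
    print(sorted(activities_applied("img_001", EXAMPLE)))
```

Nothing in the payload of `img_001` says it was dark-subtracted; that fact is recovered only by following its provenance back through `cal_001`.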


This project will contribute to the development and support of community resources, data systems and the provision of results from ground-based observing systems, using semantically enabled provenance and derivation data technologies. In addition, it will support the community by improving access to near real-time data streams and by increasing the reliability and robustness of the primary science data streams through richer metadata and annotation capabilities, which will make the data easier to find and use.

What problem does SPCDIS address?

An analysis of the needs of instrument data providers reveals several clear challenges:

  • "Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control".
  • "Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision".
  • "We often fail to capture, represent and propagate manually generated information that needs to go with the data flows".
  • "Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects".
  • "The task of event determination and feature classification is onerous and we don't do it until after we get the data".


Our conclusion is that these statements point to the lack of a comprehensive, reusable data ingest framework. Further, to be truly effective in an age of modern data management and dissemination, we propose that such a framework must consist of a semantically rich set of annotations along the data ingest workflow and a smart storage, propagation and retrieval mechanism for the provenance and derivation information. Effectively increasing the value of data products also requires annotating and mapping at the source, and even before the data are generated, i.e. during the instrument development and observation planning stages.
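The idea of annotating along the ingest workflow and propagating those annotations with the data can be sketched in a few lines of Python. Everything here (the `Annotation` and `DataProduct` types, `run_stage`, `annotate`, `find`, and the stage and agent names) is a hypothetical illustration under simplifying assumptions, not the project's design: annotations are simply kept in memory next to the payload, whereas the proposal calls for a dedicated storage, propagation and retrieval mechanism.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Annotation:
    stage: str   # ingest step (or "manual") that added the annotation
    agent: str   # software version or person responsible
    note: str    # description of what was done or observed


@dataclass(frozen=True)
class DataProduct:
    payload: tuple[float, ...]
    annotations: tuple[Annotation, ...] = ()


def run_stage(product: DataProduct, stage: str, agent: str,
              fn: Callable[[tuple[float, ...]], tuple[float, ...]],
              note: str) -> DataProduct:
    """Apply one ingest step; carry all earlier annotations forward and add one."""
    return DataProduct(fn(product.payload),
                       product.annotations + (Annotation(stage, agent, note),))


def annotate(product: DataProduct, agent: str, note: str) -> DataProduct:
    """Attach manually generated information without altering the data."""
    return DataProduct(product.payload,
                       product.annotations + (Annotation("manual", agent, note),))


def find(product: DataProduct, stage: str) -> list[Annotation]:
    """Retrieve the annotations recorded by a given stage."""
    return [a for a in product.annotations if a.stage == stage]


# Hypothetical example run: values, stage names and agents are illustrative only.
RAW = DataProduct((10.0, 12.0, 11.0),
                  (Annotation("acquisition", "instrument", "raw frame"),))
DARK = run_stage(RAW, "dark_subtraction", "pipeline-v1",
                 lambda xs: tuple(x - 2.0 for x in xs), "subtracted dark level 2.0")
FINAL = annotate(DARK, "observer", "thin cloud reported during observation")

if __name__ == "__main__":
    for a in FINAL.annotations:
        print(a.stage, a.agent, a.note)
```

Making each product immutable means later stages never alter an earlier product's annotations, so every intermediate result keeps an accurate record of its own history, including manually supplied information.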


Because we propose to use both an existing, mature data input stream from ACOS (the Advanced Coronal Observing System, currently installed at the Mauna Loa Solar Observatory) and a planned, very large volume data stream from CoSMO (the Coronal Solar Magnetism Observatory), our work is grounded in a very practical application of semantic web methodology and technology.


The specific tasks proposed are divided into two broad areas:

  • analysis of the broad data ingest requirements and workflow pipeline based on the end-use cases and development of the required metadata and semantics within existing and new cyberinfrastructure, and
  • deployment demonstration of this infrastructure in the current science data ingest environment of HAO's ACOS and its generalization and application to CoSMO, as that project is developed.