SPCDIS Broad Design

Start at the source

An analysis of the needs of instrument data providers makes several challenges clear:

  • "Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control."
  • "Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision."
  • "We often fail to capture, represent and propagate manually generated information that needs to go with the data flows."
  • "Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects."
  • "The task of event determination and feature classification is onerous and we don't do it until after we get the data."

Our conclusion is that these statements point to the lack of a comprehensive, reusable data ingest framework. We further propose that, to be truly effective in an age of modern data management and dissemination, such a framework must consist of a semantically rich set of annotations along the data ingest workflow and a smart storage, propagation and retrieval mechanism for the provenance and derivation information. Increasing the value of data products in this way requires annotating and mapping at the source, and even before any data is generated, i.e. in the instrument development and observational planning stages.

Because we propose to use both an existing, mature data input stream from ACOS (the Advanced Coronal Observing System, currently installed at the Mauna Loa Solar Observatory) and a planned, very large volume data stream from CoSMO (the Coronal Solar Magnetism Observatory), we base our work on a very practical application of semantic web methodology and technology.

The specific tasks proposed are divided into two broad areas:

  • analysis of the broad data ingest requirements and workflow pipeline based on the end-use cases and development of the required metadata and semantics within existing and new cyberinfrastructure, and
  • deployment demonstration of this infrastructure in the current science data ingest environment of HAO's ACOS and its generalization and application to CoSMO, as that project is developed.

Provenance

Provenance is the origin or source from which something comes, together with its intended use, who or what it was generated for, its manner of manufacture, the history of its subsequent owners, and the place and time of its manufacture, production or discovery, all documented in detail sufficient to allow reproducibility.
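
To make the definition above concrete, the following is a minimal sketch of how its elements might be held in a single record that travels with a data product. It is an illustration only, not the SPCDIS design: the class name `ProvenanceRecord`, every field name, the `lineage` helper and all example values are hypothetical, chosen to mirror the wording of the definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical record; names are illustrative, not taken from SPCDIS."""
    source: str          # origin or source from which the product comes
    intended_use: str    # intention for use
    generated_for: str   # who or what the product was generated for
    method: str          # manner of manufacture (the processing step)
    place: str           # place of manufacture, production or discovery
    time: str            # time of manufacture, e.g. an ISO 8601 string
    owners: List[str] = field(default_factory=list)    # subsequent owners
    derived_from: Optional["ProvenanceRecord"] = None   # upstream record

    def lineage(self) -> List[str]:
        """Return the processing steps from the original source to this record."""
        steps = []
        record = self
        while record is not None:
            steps.append(record.method)
            record = record.derived_from
        return list(reversed(steps))


# Placeholder values for illustration only; they do not describe real products.
raw = ProvenanceRecord(
    source="ACOS",
    intended_use="example use",
    generated_for="example archive",
    method="raw acquisition",
    place="Mauna Loa Solar Observatory",
    time="<acquisition time>",
)
calibrated = ProvenanceRecord(
    source="ACOS",
    intended_use="example use",
    generated_for="example archive",
    method="calibration",
    place="<processing site>",
    time="<processing time>",
    derived_from=raw,
)

print(calibrated.lineage())
```

Linking each record to the one it was derived from is one simple way to keep the processing history attached to the data, so that a later user can ask what happened to a product along the processing stages.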

Basic functions

...

An overview of candidate technologies

...