Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines


This work was conducted as part of the broader Semantic Provenance Capture in Data Ingest Systems (SPCDIS) project. More details about SPCDIS can be found under Related Projects below.

Concepts: Data Science


One goal in studying phenomena of the solar corona (e.g., flares, coronal mass ejections) is to create and refine predictive models of space weather, which has broad implications for terrestrial activity (e.g., communication grid reliability). The High Altitude Observatory (HAO) [1] presently maintains an infrastructure for generating time-series visualizations of the solar corona. Using raw data gathered at the Mauna Loa Solar Observatory (MLSO) in Hawaii, HAO performs follow-up processing and quality control steps to derive visualization sets consumable by scientists.

Individual visualizations acquire several properties during their derivation, including: (i) the source instrument at MLSO used to obtain the raw data, (ii) the time the data was gathered, (iii) processing steps applied by HAO to generate the visualization, and (iv) quality metrics applied over both the raw and processed data. In parallel with MLSO's standard data gathering, MLSO staff maintain time-stamped observation logs, which cover content potentially relevant to the data gathered (such as local weather and instrument conditions).

In this setting, while a significant amount of solar data is gathered, only small portions are typically of interest to any given consumer. Additionally, presenting solar data collections directly could overwhelm consumers (particularly those with limited background in how the data is structured).

This work explores how navigation based on multidimensional analysis can be used to generate summary views of data collections through two operations: (i) grouping visualization entries by similarity metrics (e.g., data gathered between 23:15 and 23:30 on 6-21-2012), or (ii) filtering entries (e.g., data with a quality score of UGLY, on a scale of GOOD, BAD, or UGLY).
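The two operations can be illustrated with a minimal Python sketch. It works on plain in-memory records rather than the project's semantic encodings. The `Entry` fields, the 15-minute window width, and the sample records are all assumptions made for illustration and are not taken from the HAO pipelines.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Entry:
    """Hypothetical visualization entry; field names are illustrative only."""
    instrument: str
    observed_at: datetime
    quality: str  # one of "GOOD", "BAD", "UGLY"


def window_start(ts, minutes=15):
    """Floor a timestamp to the start of its window (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def group_by_window(entries, minutes=15):
    """Grouping: bucket entries whose timestamps fall in the same window."""
    groups = defaultdict(list)
    for entry in entries:
        groups[window_start(entry.observed_at, minutes)].append(entry)
    return dict(groups)


def filter_by_quality(entries, quality):
    """Filtering: keep only entries with the requested quality score."""
    return [entry for entry in entries if entry.quality == quality]


# Made-up sample records, not real MLSO data.
entries = [
    Entry("CoMP", datetime(2012, 6, 21, 23, 16, 5), "GOOD"),
    Entry("CoMP", datetime(2012, 6, 21, 23, 29, 40), "UGLY"),
    Entry("CHIP", datetime(2012, 6, 21, 23, 31, 0), "BAD"),
]

# A summary view: number of entries per 15-minute window.
summary = {start: len(group) for start, group in group_by_window(entries).items()}
ugly = filter_by_quality(entries, "UGLY")
```

Here `summary` plays the role of a summary view: a consumer sees one count per time window and can drill into a window only when it looks relevant.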

Here, semantic encodings of solar visualization collections (based on the Resource Description Framework (RDF) Data Cube vocabulary [2]) are used, chosen for the flexibility of the RDF model in supporting the following use cases:

(i) Temporal alignment of time-stamped MLSO observations with raw data gathered at MLSO.
(ii) Linking of multiple visualization entries to common (and structurally complex) workflow structures, designed to capture the visualization generation process.
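As a rough illustration of use case (i), the sketch below pairs each raw-data timestamp with the most recent observation-log note made at or before it. It uses plain Python lists rather than RDF. The function name `align_logs`, the one-hour `max_gap` cutoff, and the sample log notes are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def align_logs(data_times, log_entries, max_gap=timedelta(hours=1)):
    """Map each data timestamp to the latest log note at or before it.

    log_entries is a list of (timestamp, note) pairs sorted by timestamp.
    A data timestamp with no log note within max_gap maps to None.
    """
    log_times = [ts for ts, _ in log_entries]
    aligned = {}
    for ts in data_times:
        # Index of the last log entry whose timestamp is <= ts.
        i = bisect_right(log_times, ts) - 1
        if i >= 0 and ts - log_times[i] <= max_gap:
            aligned[ts] = log_entries[i][1]
        else:
            aligned[ts] = None
    return aligned


# Made-up log notes and data timestamps, for illustration only.
logs = [
    (datetime(2012, 6, 21, 23, 0), "thin cirrus overhead"),
    (datetime(2012, 6, 21, 23, 20), "sky clear"),
]
data_times = [
    datetime(2012, 6, 21, 22, 50),
    datetime(2012, 6, 21, 23, 16),
    datetime(2012, 6, 21, 23, 29),
]
aligned = align_logs(data_times, logs)
```

In an RDF encoding, the resulting pairs would presumably be expressed as links between observation-log resources and data resources. That step is omitted here.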

To provide real-world use cases for the described approach, a semantic summarization system is being developed for data gathered from HAO’s Coronal Multi-channel Polarimeter (CoMP) and Chromospheric Helium-I Imaging Photometer (CHIP) pipelines.


Date    Created By    Link
November 30, 2012
November 30, 2012
November 30, 2012

Related Projects:

Semantic Provenance Capture in Data Ingest Systems (SPCDIS)
Principal Investigator: Peter Fox
Co-Investigator: Deborah L. McGuinness
Description: The goal of this project is to develop a semantically enabled data ingest capability based within the NCAR High Altitude Observatory. The capability is being developed at the RPI Tetherless World Constellation, in collaboration with the University of Texas at El Paso, the University of Michigan, and McGuinness Associates.

Related Research Areas:

Data Frameworks
Lead Professor: Peter Fox
Description: None.
Knowledge Provenance
Lead Professor: Deborah L. McGuinness
Description: Knowledge Provenance