Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines

One goal in studying phenomena of the solar corona (e.g., flares, coronal mass ejections) is to create and refine predictive models of space weather – which have broad implications for terrestrial activity (e.g., communication grid reliability). The High Altitude Observatory (HAO) [1] presently maintains an infrastructure for generating time-series visualizations of the solar corona. Through raw data gathered at the Mauna Loa Solar Observatory (MLSO) in Hawaii, HAO performs follow-up processing and quality control steps to derive visualization sets consumable by scientists.

Individual visualizations will acquire several properties during their derivation, including: (i) the source instrument at MLSO used to obtain the raw data, (ii) the time the data was gathered, (iii) processing steps applied by HAO to generate the visualization, and (iv) quality metrics applied over both the raw and processed data. In parallel to MLSO’s standard data gathering, time stamped observation logs are maintained by MLSO staff, which covers content of potential relevance to data gathered (such as local weather and instrument conditions).

In this setting, while a significant amount of solar data is gathered, only small sections will typically be of interest to consuming parties. Additionally, direct presentation of solar data collections could overwhelm consumers (particularly those with limited background in the data structuring).

This work explores how multidimensional analysis based navigation can be used to generate summary views of data collections, based on two operations: (i) grouping visualization entries based on similarity metrics (e.g., data gathered between 23:15-23:30 6-21-2012), or (ii) filtering entries (e.g., data with a quality score of UGLY, on a scale of GOOD, BAD, or UGLY).

Here, semantic encodings of solar visualization collections (based on the Resource Description Framework (RDF) Datacube vocabulary [2]) are being utilized, based on the flexibility of the RDF model for supporting the following use cases:

(i) Temporal alignment of time-stamped MLSO observations with raw data gathered at MLSO. (ii) Linking of multiple visualization entries to common (and structurally complex) workflow structures – designed to capture the visualization generation process.

To provide real-world use cases for the described approach, a semantic summarization system is being developed for data gathered from HAO’s Coronal Multi-channel Polarimeter (CoMP) and Chromospheric Helium-I Imaging Photometer (CHIP) pipelines.

View Publication

Associated Projects

The goal of this project is to develop at the RPI Tetherless World Constellation, based within the NCAR High Altitude Observatory and in collaboration with the University of Texas at El Paso, the University of Michigan and McGuinness Associates a semantically-enabled data ingest capability. The project is entitled: Semantic Provenance Capture in Data Ingest Systems (SPCDIS). Initially, we will limit our focus to a set of solar coronal physics instruments, but over time, we will target the broader area of solar and solar-terrestrial physics.

Citation