SPCDIS Project Key Concepts


Science challenge - data ingest

In solar and solar-terrestrial science, space weather, and space physics in general, multi-channel observing instruments and instrument suites are becoming more important. The volume of the rich data streams they produce is growing so quickly that effective use of those data is limited to a small, often expert, community.

HAO and its collaborators routinely run instruments that take resolved images of the Sun in multiple wavelengths with rapid time sampling. Increasingly, models (analysis and simulation) and interrelated data streams are used to compare with, or make effective use of, these observations. Many interdisciplinary studies emphasize the need for real-time or near-real-time access to the data, and for analyzing and combining them in ways that enhance our understanding of the Sun, the upper atmosphere of the Earth (including, but not limited to, solar influences), and many aspects of the Sun-Earth environment.

The solar corona is the origin of much of the activity producing strong perturbations on the Earth's environment, including the variable EUV radiation and solar wind, flares and coronal mass ejections (CMEs). Of many unanswered questions concerning the corona, one is currently of preeminent importance. Can we understand the nature of this activity to the point where useful predictions of space weather can be made? This is the question central to NSF's National Space Weather Program (NSWP) and NASA's Living with a Star (LWS) initiative.

Driven ultimately by society's need to understand the origin(s) of space weather at unprecedented spatial, temporal, and spectral resolution, and by the need for new measurements of the Sun's coronal magnetic field, NCAR scientists at the High Altitude Observatory, together with colleagues at the University of Hawaii and the University of Michigan, plan to build the Coronal Solar Magnetism Observatory (CoSMO) at a site to be determined in Hawaii.

CoSMO's specific science questions include: How do the photospheric and coronal fields relate in space and time? What are the coronal sources of coronal mass ejections (CMEs)? How is the dynamo reflected in the corona and the heliosphere? How are prominences formed and how do they relate to CMEs? How and where are particles accelerated to high energy? What is the role of magnetic reconnection for flares and CMEs?

Instrument suites such as ACOS and CoSMO produce ever larger amounts of data. At present, the effective and efficient processing and use of these input data streams is limited by software infrastructure that is still largely syntactic (it handles the format and structure of the data but not their meaning) and is either not reusable or hard to reuse. As a result, the data are usable only with a human in the loop (HITL).

Computer science challenges - interfacing to data pipelines and workflows, semantics, rules, explanation, proof, and trust

The goal of our work on knowledge provenance support for scientific data is to enable systems that are transparent and trustworthy. Our work on multi-disciplinary virtual observatories aims to use machine-understandable encodings of term meanings to power data integration. However, if these systems are to be broadly used, they must be able to explain what they are doing and have done, how they obtained their input data, and the conditions under which those data were collected. Challenges for our work on scientific provenance include designing an appropriate infrastructure for representing, manipulating, and delivering provenance to users (both humans and machine services).

Applying ontologies and knowledge bases to complex, interdisciplinary data models and interfaces, and to the classification and use of tools and algorithms, offers largely unexplored possibilities. Our aim is not only to capture the relevant provenance at stages along the data pipeline but also to create the associations with the ontology as part of that process. This adds value in two ways: the associations need not be made after the provenance store is populated (i.e., via a harvesting step and perhaps manual analysis), and we can take advantage of reasoning and inference during data ingest itself.
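A minimal sketch of the idea, assuming a toy in-memory store: each pipeline step records its inputs and output, and at capture time the step is linked to an ontology class, with superclasses inferred by following a subclass chain. The step names, class names (`DarkCorrection`, `Calibration`, ...), and helpers are all hypothetical illustrations, not the project's ontology or code; a real system would use an ontology language and a reasoner rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical ontology fragments (illustrative only, not the VSTO ontology).
STEP_CLASS = {                       # pipeline step -> asserted ontology class
    "dark_subtract": "DarkCorrection",
    "flat_field": "FlatFieldCorrection",
}
SUBCLASS_OF = {                      # class -> direct superclass (acyclic)
    "DarkCorrection": "Calibration",
    "FlatFieldCorrection": "Calibration",
    "Calibration": "DataProcessing",
}

def superclasses(cls):
    """Return cls plus all of its ancestors (transitive subclass closure)."""
    out = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        out.append(cls)
    return out

@dataclass
class ProvenanceRecord:
    step: str
    inputs: list
    output: str
    classes: list    # asserted class plus inferred superclasses, set at capture time
    timestamp: str

@dataclass
class ProvenanceStore:
    records: list = field(default_factory=list)

    def capture(self, step, inputs, output):
        # The ontology association is made while the step runs,
        # so no later harvest is needed to add it.
        asserted = STEP_CLASS.get(step, "UnclassifiedStep")
        rec = ProvenanceRecord(step, list(inputs), output,
                               superclasses(asserted),
                               datetime.now(timezone.utc).isoformat())
        self.records.append(rec)
        return rec

    def steps_of_class(self, cls):
        """Query by ontology class, including inferred superclasses."""
        return [r.step for r in self.records if cls in r.classes]

store = ProvenanceStore()
store.capture("dark_subtract", ["raw_0001.fits"], "dark_0001.fits")
store.capture("flat_field", ["dark_0001.fits"], "cal_0001.fits")
print(store.steps_of_class("Calibration"))
```

Because the inference runs during ingest, a query for the general class `Calibration` finds both steps even though neither was tagged with it directly.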

Technology capability - inference framework and provenance interlingua

Introduction to Provenance

We leverage innovative work on the Inference Web explanation framework and its Interlingua for representing provenance, justification, and trust. The Interlingua is PML, the Proof Markup Language. PML supplies the representation language, and Inference Web provides a set of tools for generating, validating, manipulating, summarizing, and presenting knowledge provenance. Our project will build on these components to provide provenance support for scientific data processing pipelines as well as for data management and archive systems.
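To illustrate what a justification-based provenance representation makes possible, here is a simplified Python sketch in which each node holds a conclusion, the rule that produced it, the agent that performed the step, and the antecedent nodes it depends on. The class and field names are hypothetical and only loosely inspired by PML's notion of justified conclusions; this is not the PML schema or any Inference Web API. It shows two explanation services: an indented trace and a list of the original sources behind a result.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical justification node (not the PML schema)."""
    conclusion: str
    rule: str = "source"                 # how the conclusion was obtained
    engine: str = ""                     # who or what performed the step
    antecedents: list = field(default_factory=list)

def explain(node, depth=0):
    """Return an indented, human-readable trace back to the original sources."""
    line = "  " * depth + f"{node.conclusion}  [{node.rule}"
    line += f" by {node.engine}]" if node.engine else "]"
    lines = [line]
    for ant in node.antecedents:
        lines.extend(explain(ant, depth + 1))
    return lines

def sources(node):
    """Collect the leaf conclusions (original inputs) a result depends on."""
    if not node.antecedents:
        return [node.conclusion]
    out = []
    for ant in node.antecedents:
        out.extend(sources(ant))
    return out

# Hypothetical data products and agents, for illustration only.
raw = Node("raw_0001.fits", engine="instrument A")
dark = Node("dark_0001.fits", engine="instrument A")
cal = Node("cal_0001.fits", rule="dark_subtract", engine="pipeline v1",
           antecedents=[raw, dark])

print("\n".join(explain(cal)))
print(sources(cal))
```

The same graph serves a human reader (the trace) and a machine service (the source list), which mirrors the goal of delivering provenance to both kinds of users.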

Technology capability - ontology development and evolution

Capturing knowledge about data ingest applications and tools (how they read data, how they store it, and what the output data represent) poses specific problems when these are accessed through distributed computing resources. A key technology development will be enhancements to the set of linked, interdisciplinary-enabled ontologies built for VSTO. This process requires tools that support a broad range of users in (1) merging ontological terms from varied sources, (2) diagnosing the coverage and correctness of ontologies, and (3) maintaining ontologies over time. Deborah McGuinness's knowledge engineering expertise will be used to help domain experts build and evolve existing ontologies, and to support the analysis, generation, comparison, explanation, and extension phases of the ontology life cycle.
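Tasks (1) and (2) can be illustrated with a deliberately naive sketch: merge term lists from several sources by normalized label, recording which sources use each term, then report what fraction of the terms used in a dataset the merged vocabulary covers. Label matching here stands in for real ontology alignment, which compares definitions and class structure; the function names and the example terms are hypothetical and are not drawn from the actual VSTO ontology.

```python
import re

def normalize(label):
    """Normalize a term label: lowercase, collapse spaces/underscores/hyphens."""
    return re.sub(r"[\s_\-]+", " ", label).strip().lower()

def merge_terms(*sources):
    """Merge (source_name, terms) pairs.
    Returns {normalized_label: sorted list of source names using it}."""
    merged = {}
    for name, terms in sources:
        for t in terms:
            merged.setdefault(normalize(t), set()).add(name)
    return {k: sorted(v) for k, v in merged.items()}

def coverage(ontology_terms, used_terms):
    """Fraction of used terms defined by the ontology, plus the missing ones."""
    known = {normalize(t) for t in ontology_terms}
    used = {normalize(t) for t in used_terms}
    missing = sorted(used - known)
    frac = 1.0 if not used else (len(used) - len(missing)) / len(used)
    return frac, missing

# Hypothetical term lists from two sources.
merged = merge_terms(
    ("VSTO", ["Instrument", "Spectral_Range", "Coronal Mass Ejection"]),
    ("other", ["instrument", "coronal-mass-ejection", "Polarization"]),
)
frac, missing = coverage(merged, ["Instrument", "Stokes V", "polarization"])
print(merged)
print(frac, missing)
```

Terms reported as missing are candidates for review by a domain expert, which is where the maintenance task (3) begins.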