VSTO Key Concepts

Science challenge - Data, Tools, Education, Outreach and more

In the disciplines of solar and solar-terrestrial science, space weather, and space physics in general, the importance of interrelated datasets is growing at a rate that cannot be accommodated by simple data systems, let alone by conventional web page listings of holdings. This is as true for the High Altitude Observatory (HAO) as it is for agency and community projects. HAO and its collaborators routinely run instruments that take resolved images of the Sun in multiple wavelengths with rapid time sampling. Increasingly, models are being developed to compare or combine with these observations, or to assist in their interpretation.

In upper atmospheric research, the circumstances are much the same, with a rich interplay among instruments, models, and interpretation. Space weather, which intersects these fields, requires that a researcher be able to easily access solar data from within a magnetospheric model, for example.

Interdisciplinary studies in solar, solar-terrestrial and space physics (SSTSP) emphasize the need to reach data in real time or near-real time, and to analyze and combine it in a way that enhances our understanding of the Sun, the upper atmosphere of the Earth (including but not limited to solar influences) and many aspects of the Sun-Earth environment.

The science that may result is being limited by computer (software) infrastructure and by data availability and usability. Datasets alone are therefore not sufficient to build a virtual observatory. The VSTO must address the interface problem: bringing data to the users' tools, and to the tools within the VSTO, effectively and scalably. A significant effort for VSTO will thus be the development and/or adaptation of schema that adequately describe three aspects of the datasets and tools: the syntax (the name of a variable, its type and dimensions, or the name and argument list of a procedure), the semantics (what the variable physically is, its units, etc.) and the pragmatics (what a procedure is used for, what it does and what it returns).
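The distinction between syntax, semantics and pragmatics can be sketched as a small metadata record. This is only an illustration: the class names, field names and example variables below are hypothetical and are not taken from any VSTO schema.

```python
from dataclasses import dataclass

# All class names, field names and values are hypothetical illustrations.


@dataclass(frozen=True)
class Syntax:
    name: str          # variable name as stored in the file
    dtype: str         # storage type
    dimensions: tuple  # ordered dimension names


@dataclass(frozen=True)
class Semantics:
    quantity: str      # what the variable physically is
    units: str         # physical units


@dataclass(frozen=True)
class VariableDescription:
    syntax: Syntax
    semantics: Semantics
    pragmatics: str    # what the variable is used for


def same_quantity(a, b):
    """True when two variables describe the same physical quantity in the
    same units, whatever they are called and however they are stored."""
    return a.semantics == b.semantics


tn_model = VariableDescription(
    Syntax("TN", "float32", ("time", "lev", "lat", "lon")),
    Semantics("neutral temperature", "K"),
    "model output field",
)
t_obs = VariableDescription(
    Syntax("t_neutral", "float64", ("time", "altitude")),
    Semantics("neutral temperature", "K"),
    "instrument measurement",
)

print(same_quantity(tn_model, t_obs))
```

The two records differ in every syntactic detail, yet a tool that compares only the semantic part can still recognize them as the same quantity. That is the kind of matching that syntax-only descriptions cannot support.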

This task depends on metadata (information about the data and tools), some of which already exists. To bring SSTSP data and tools, as well as the educational materials that result from them, up to par, VSTO will need to address semantic metadata and schema and the vocabulary differences across disciplines. Experience and tools from the development of ontologies in engineering, manufacturing and online commerce will be utilized in this regard.

As a result, we will be able to improve the tools and environments available to researchers and educators for addressing the science challenges.

Computer science challenges - interfacing data and tools, semantics and ontologies, scaling and collaborating

The application of ontologies and knowledge bases has unexplored possibilities in the realm of complex and interdisciplinary data models and interfaces, and in the classification and use of tools and algorithms.

The knowledge that exists in people's minds is one of the more difficult elements to capture and represent within cyberinfrastructure. This problem is compounded in interdisciplinary settings, in much the same way that it is harder to find scientists who can communicate across discipline boundaries.

One focus of VSTO is to elevate a researcher's or educator's ability to rapidly expand their interdisciplinary knowledge, or to cross one or many discipline boundaries more easily. Knowledge can be built by having excellent tools that enable rapid learning and discovery.

We use the term "ontology" in the way generally accepted by the computer science community, which is: 'a specification of a conceptualization'. Recent advances in information technology research have produced methodologies for constructing and representing explicit knowledge in a machine-accessible manner. Ontologies describe entities and the relationships among them. The DARPA Agent Markup Language (DAML), built upon the Resource Description Framework (RDF) and XML, is a good example of the growing need for, and importance of, capturing a formal representation of terms for manipulation and use on the web. We will utilize the most recent incarnation of such languages: OWL (the Web Ontology Language).
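The idea of entities and relationships can be sketched with subject-predicate-object triples, the basic shape used by RDF. The terms below are hypothetical examples rather than actual VSTO ontology classes, and the code uses plain Python tuples instead of a real RDF or OWL library.

```python
# Hypothetical terms; each triple reads as (subject, predicate, object).
triples = {
    ("Coronagraph", "subClassOf", "Instrument"),
    ("Instrument", "subClassOf", "DataSource"),
    ("Coronagraph", "observes", "SolarCorona"),
}


def superclasses(term, triples):
    """Return every class reachable from `term` by following subClassOf links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


print(sorted(superclasses("Coronagraph", triples)))
```

Nothing states directly that a coronagraph is a data source. The relationship is derived by chaining two explicit statements, which is a very small instance of the reasoning that a formal representation makes possible.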

Ontologies are organized collections of human knowledge. Large-scale realizations are becoming an essential component of many applications, including standard search (e.g. Google), e-commerce (e.g. Amazon), configuration (e.g. Dell), and government intelligence (such as DARPA's High Performance Knowledge Base (HPKB), Rapid Knowledge Formation (RKF), and DAML programs). While ontologies are common in e-commerce, in various expert and recommender systems, and even in Geographic Information Systems (GIS), they have seen relatively little use in the challenging area of SSTSP. The VSTO set of ontologies will be designed to meet the representational and reasoning needs of the project. However, we will attempt to exploit schemas (e.g. Space Physics Data Markup Language: SPDML, Earth System Markup Language: ESML) and ontologies (e.g. developments for GEON in geosciences) that already exist or are being developed within SSTSP. Thus, the developed ontology is expected to combine newly built VSTO-specific information with integrated and merged existing ontological information.

Technology capability - data assimilation

Another broad strategic focus for the community and NCAR is data assimilation (DA).

Data assimilation is an important research tool in a number of science applications, such as Numerical Weather Prediction (NWP). In simple terms, taking NWP as the example, it means that neither the model nor the data is sufficient on its own, and that a synthesis of the two is required to explain a phenomenon or predict an event. SSTSP has an increasing need to embrace DA: its importance is quickly spreading to other NSF/GEO/ATM areas, as indicated by the special sessions held at the Fall 2002 AGU meeting. In the NSF/GEO 1999-2003 Facilities Plan, section 4.0 (Potential New Capabilities) strongly emphasizes DA.
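The synthesis of model and data can be illustrated with the simplest textbook case: one model value and one observation of the same quantity, each with a known error variance, combined by a minimum-variance weighted average. This is a generic sketch under those assumptions, not the scheme used by any particular NWP or SSTSP system, and the function name and numbers are hypothetical.

```python
def assimilate(background, var_background, observation, var_observation):
    """Combine a model value (background) with an observation of the same
    quantity, weighting each by the inverse of its error variance."""
    # The gain is the fraction of the model-observation difference to apply.
    gain = var_background / (var_background + var_observation)
    analysis = background + gain * (observation - background)
    # The combined estimate is more certain than the model alone.
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis


# Hypothetical numbers: a model temperature of 1000 K (variance 100 K^2)
# and an observed temperature of 1040 K (variance 300 K^2).
analysis, var_analysis = assimilate(1000.0, 100.0, 1040.0, 300.0)
print(analysis, var_analysis)
```

The result lies between the model value and the observation, closer to whichever is more certain, and its variance is smaller than either input variance. The calculation is only meaningful if both inputs really are the same physical quantity in the same units, which is why precise semantics matter for DA.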

DA within models (which can be analysis models) is task driven. These tasks are assembled in a workflow for incorporating the data and, most importantly, require precise meanings for the data, i.e. semantics. When DA is implemented in a modeling code (for example), the meaning is specified within the programming code and in the program logic and order. Making DA available via a web service, or within a distributed analysis environment such as the VSTO, requires a clear specification of the interfaces (SOAP, WSDL, UDDI, etc.) to the data and to the models/tools, with semantic meaning.

We intend to develop collaborations with existing DA efforts within NCAR (e.g. the Data Assimilation Research Testbed) and the community.

Such a capability has been demonstrated in a limited-scope prototype, as part of the Virtual Solar Observatory project, by COSEC. The prototype used knowledge representation techniques, ontologies and, more importantly, practical tools: markup languages (such as OWL and DAML) to describe the meaning (and quality) of data collections, and ontology development and maintenance systems (such as Chimaera).

Technology capability - ontology development and evolution

Knowledge about application tools (how they read data, how they store it, and what their output data represent) is a specific problem to address, especially when the tools are accessed through distributed computing resources. Problems quickly arise when trying to understand interactions among complex or interdisciplinary systems, a situation that exists in SSTSP for both education and scientific research. Too often, specific storage formats and protocols become an additional barrier to opening the information up to another area. Even today, the products that educators and scientists seek are based on complex structures and/or contain or rely upon large volumes of reference materials, (often undocumented) computer code, visual representations, and/or numeric data. Existing standards such as Open Knowledge Base Connectivity and the Knowledge Interchange Format, and existing markup standards such as OWL and RDF-DAML, will be leveraged as a basis for recording content such as application interfaces and data outputs.

Thus, a key technology development will be a set of linked, i.e. interdisciplinary-enabled, ontologies, built from VSTO-specific information and from the integration and merging of existing semantic metadata.

This process requires tools that support broad ranges of users in:

  1. merging of ontological terms from varied sources
  2. diagnosis of coverage and correctness of ontologies
  3. maintaining ontologies over time
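
The first two tasks in the list above can be sketched in a few lines. Here each ontology is reduced to a plain mapping from a term to its parent term, which is a deliberate simplification; the function names, terms and synonym table are all hypothetical, and real ontology tools work on much richer structures.

```python
def merge(onto_a, onto_b, synonyms):
    """Merge two term -> parent mappings. Terms from onto_b are first renamed
    through `synonyms`. Terms given different parents by the two sources are
    reported as conflicts rather than silently overwritten."""
    merged = dict(onto_a)
    conflicts = []
    for term, parent in onto_b.items():
        term = synonyms.get(term, term)
        parent = synonyms.get(parent, parent)
        if term in merged and merged[term] != parent:
            conflicts.append((term, merged[term], parent))
        else:
            merged[term] = parent
    return merged, conflicts


def undefined_parents(ontology, roots):
    """Coverage check: parents that are neither defined terms nor declared roots."""
    return sorted(
        {p for p in ontology.values() if p not in ontology and p not in roots}
    )


# Hypothetical vocabularies from two disciplines.
solar = {"Coronagraph": "Instrument", "Magnetograph": "Instrument"}
upper_atmosphere = {"Radar": "Sensor", "Lidar": "Sensor", "Coronagraph": "Telescope"}
synonyms = {"Sensor": "Instrument"}  # the two communities use different words

merged, conflicts = merge(solar, upper_atmosphere, synonyms)
print(sorted(merged))
print(conflicts)
print(undefined_parents(merged, roots={"Instrument"}))
```

The merge maps "Sensor" onto "Instrument", so radar and lidar join the shared hierarchy, while the disagreement over where "Coronagraph" belongs is surfaced for a person to resolve. The third task, maintenance over time, is not covered by this sketch.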