Semantic Data Model

Jefferson Project needs for Contextual Knowledge


It is hard to reuse contextual knowledge that is encoded in formats and vocabularies that were not designed to support interoperability. It is particularly challenging to regenerate contextual knowledge that was not preserved at data collection time.

Most major scientific projects have some degree of commitment to the development and use of ontologies as a way of preserving knowledge, and in particular, as a way of encoding and preserving contextual knowledge. However, we often observe the following regarding the use of ontologies in major scientific projects:

  • Ontologies are often introduced in the later stages of a project, to overcome data integration and data quality issues
  • Late introduction of ontologies often leaves knowledge about the data significantly incomplete

In the Jefferson Project, we see ontologies being adopted and used earlier than in other comparable scientific projects.

  • The early adoption of ontologies will enable Lake George to have one of the most comprehensive bodies of knowledge of any large-scale observational and simulated data collection

In the Jefferson Project, we aim to use standards to preserve contextual knowledge for all data collected by JP’s sensors and generated by JP’s models.

Semantic Data Model


The reference model is composed of incremental layers of data/metadata enhancement. Each layer is designed to support a higher level of data quality than the previous layer.

SemanticDataModel.png

  • Semantic Layers (SL2, SL3 and SL4)
    • The semantic layers partially rely on semantic web technology to encode data
    • RDF (format and encoding)
    • Formalized vocabularies, e.g., OWL and PROV
  • Original Data (L1)
    • Preserves the original format, language and encoding of data
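
As a rough illustration of how a semantic layer might sit on top of the original data, the sketch below encodes one sensor reading as RDF triples in N-Triples syntax and links it back to its unmodified source file using PROV terms. The `http://example.org/jp/` namespace, the `hasValue` property, the identifiers and the sample reading are all hypothetical; only the PROV and XSD terms are standard vocabulary. It uses plain Python rather than an RDF library, and is not the project's actual encoding.

```python
# Hypothetical namespace: the real JP vocabulary is not given in this document.
JP = "http://example.org/jp/"
# Standard W3C namespaces.
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype):
    """Build a typed literal such as "4.2"^^<xsd:decimal>."""
    return f'"{value}"^^<{datatype}>'


def observation_triples(obs_id, sensor_id, value, timestamp, source_file):
    """Describe one reading and tie it to the untouched original file."""
    subject = iri(JP + "observation/" + obs_id)
    return [
        (subject, iri(RDF_TYPE), iri(PROV + "Entity")),
        # "hasValue" is an assumed property name, for illustration only.
        (subject, iri(JP + "hasValue"), literal(value, XSD + "decimal")),
        (subject, iri(PROV + "generatedAtTime"), literal(timestamp, XSD + "dateTime")),
        (subject, iri(PROV + "wasAttributedTo"), iri(JP + "sensor/" + sensor_id)),
        # The original data layer stays as-is; the semantic layer only points to it.
        (subject, iri(PROV + "wasDerivedFrom"), iri(JP + "raw/" + source_file)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Made-up sample reading.
triples = observation_triples(
    "obs-001", "sensor-17", "4.2", "2015-06-01T12:00:00Z", "buoy17.csv"
)
ntriples = to_ntriples(triples)
print(ntriples)
```

The original file is referenced rather than rewritten, so its format, language and encoding remain intact while the higher layers add machine-readable context.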

Semantic Level 2 (SL2) Data Enhancements


Using Tetherless’ Prizms, JP’s data may be enhanced in the following ways:
  • Data Sharing over the Web
    • Linked data
  • Systematic support for data curation (e.g., standard mechanisms for annotation)
    • Data “calibration” & “registration”
  • Data revision (i.e., evolution)
    • Version Control
    • Provenance Capture
      • Agent's assertions, derivation tree
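
To make the idea of a derivation tree concrete, here is a minimal sketch that assumes provenance has already been captured as "was derived from" links between dataset versions. The dataset names and the dictionary layout are hypothetical and do not reflect how Prizms stores provenance.

```python
# Hypothetical provenance records: each entity maps to the entities it was derived from.
derived_from = {
    "lake_temp_v2": ["lake_temp_v1"],
    "lake_temp_v1": ["buoy17_raw.csv", "calibration_table"],
    "buoy17_raw.csv": [],
    "calibration_table": [],
}


def derivation_tree(entity, graph, path=()):
    """Return the sources of `entity` as a nested dict, following links recursively."""
    if entity in path:
        # Derivations are expected to be acyclic; fail loudly if they are not.
        raise ValueError(f"cyclic derivation at {entity!r}")
    return {
        source: derivation_tree(source, graph, path + (entity,))
        for source in graph.get(entity, [])
    }


def leaves(name, tree):
    """Collect the entities with no further sources, i.e. the original inputs."""
    if not tree:
        return {name}
    result = set()
    for child, subtree in tree.items():
        result |= leaves(child, subtree)
    return result


tree = derivation_tree("lake_temp_v2", derived_from)
sources = leaves("lake_temp_v2", tree)
print(tree)
print(sorted(sources))
```

Walking the tree from a revised dataset back to its leaves shows which original files and calibration inputs a given version ultimately depends on.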

Semantic Level 3 (SL3) Data Enhancements


  • Data monitoring
    • Machine-verifiable data quality
      • Rule-based specification of data quality
  • Data Integration
    • Ontological annotation of scientific vocabulary
    • Alignment of scientific vocabulary
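
The two SL3 themes can be sketched together: local column names are first aligned to a shared vocabulary, and declarative rules then check data quality automatically. All names, mappings and thresholds below are hypothetical and chosen only for illustration; they are not the Jefferson Project's actual vocabulary or quality rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    """A named, machine-checkable condition on one variable."""
    name: str
    variable: str
    check: Callable[[float], bool]


# Hypothetical alignment of source-specific column names to shared terms.
ALIGNMENT = {
    "wtemp": "water_temperature_c",
    "temp_water": "water_temperature_c",
    "pH": "ph",
}

# Hypothetical rules; the thresholds are illustrative only.
RULES = [
    QualityRule(
        "water temperature within plausible range",
        "water_temperature_c",
        lambda v: -5.0 <= v <= 40.0,
    ),
    QualityRule("pH within scale", "ph", lambda v: 0.0 <= v <= 14.0),
]


def align(record, alignment=ALIGNMENT):
    """Rename known local keys to the shared vocabulary; leave other keys unchanged."""
    return {alignment.get(key, key): value for key, value in record.items()}


def violations(record, rules=RULES):
    """Return the names of all rules that the record fails."""
    failed = []
    for rule in rules:
        value = record.get(rule.variable)
        if value is not None and not rule.check(value):
            failed.append(rule.name)
    return failed


# Two made-up records from sources that use different column names.
good = align({"wtemp": 18.3, "pH": 7.4})
bad = align({"temp_water": 85.0, "pH": 7.4})
print(violations(good))
print(violations(bad))
```

Because the rules refer only to aligned terms, the same rule set applies to every source once its vocabulary has been mapped.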

Semantic Level 4 (SL4) Data Enhancements


  • Data Question-Answering
    • Question understanding (through the Watson framework)
    • Support for different question modalities:
      • Questions requiring factoid knowledge
      • Questions requiring derived knowledge, including responses from simulation models
    • Support for combined question modalities
  • Data Visualization