DataONE Semantics

Research Areas: Semantic eScience, Knowledge Provenance, Ontology Engineering Environments
Principal Investigator: Deborah L. McGuinness
Co-Investigators: Matt Jones, Ben Leinfelder, Xixi Luo, and Mark Schildhauer
Concepts: Ontology, Big Data, Semantic Web, Natural Language Processing, Data Curation, Earth Science

Searching in DataONE currently focuses on fielded and full-text metadata. It does not allow precise queries of measurement types, primarily because the metadata corpus contains uncontrolled descriptions of measurements (e.g., variable names, descriptions, and units), making it impossible to find all datasets that use a particular measurement type. DataONE will extend its search system using scalable semantic annotation [12] and inferencing. To enhance the interoperability of DataONE annotated objects, this extension will link community-based standard vocabularies (ontologies) [13] during data ingestion and processing, using inference based on concept hierarchies and property relationships. Key activities include:

(a) Defining the scope of prototypes and the production framework;
(b) Selecting contributing ontologies and terminologies (e.g., CF [14], CUAHSI [15], ENVO [16], SWEET [17]) and extending these as needed to provide coverage of measurement types;
(c) Defining the annotation representation framework, e.g., by extending PROV-O [18] or AO [19] to enable measurement types to be associated with attributes of entities within a data package;
(d) Implementing an annotation storage framework and user interfaces that make manual annotation easy for DataONE users through Investigator Tools;
(e) Developing data mining algorithms to infer from data and metadata the measurement types aligned with the ontology, and incorporating automated annotation capabilities into Investigator Tools; and
(f) Developing extensions to the DataONE content indexing, programmatic query services, and user interfaces to enable discovery through controlled definitions of measurement types.
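
To illustrate how inference over a concept hierarchy could support discovery by measurement type, the following is a minimal sketch in Python. It is not the DataONE implementation. The concept names, dataset identifiers, attribute names, and function names are all hypothetical, and a plain dictionary stands in for a real ontology and annotation store. Each dataset attribute is annotated with one measurement type, and the index is expanded with every superclass of that type, so a query for a broad concept also returns datasets annotated with its narrower concepts.

```python
from collections import defaultdict

# Hypothetical concept hierarchy (child -> list of parent concepts).
# A real system would load this from an ontology; these names are
# invented for illustration only.
SUBCLASS_OF = {
    "SoilTemperature": ["Temperature"],
    "AirTemperature": ["Temperature"],
    "Temperature": ["MeasurementType"],
    "Precipitation": ["MeasurementType"],
}

# Hypothetical annotations: (dataset id, attribute name) -> measurement type.
ANNOTATIONS = {
    ("dataset-A", "soil_t_10cm"): "SoilTemperature",
    ("dataset-B", "tair"): "AirTemperature",
    ("dataset-C", "ppt_mm"): "Precipitation",
}


def ancestors(term):
    """Return the term together with all of its superclasses."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def build_index(annotations):
    """Map every concept to the datasets annotated with it or a subclass."""
    index = defaultdict(set)
    for (dataset, _attribute), term in annotations.items():
        for concept in ancestors(term):
            index[concept].add(dataset)
    return index


def find_datasets(index, concept):
    """Return the sorted dataset ids whose annotations imply the concept."""
    return sorted(index.get(concept, set()))


INDEX = build_index(ANNOTATIONS)
```

In this sketch, `find_datasets(INDEX, "Temperature")` returns `['dataset-A', 'dataset-B']` even though neither dataset is annotated with `Temperature` directly, and the free-text variable names (`soil_t_10cm`, `tair`) play no part in the match. Expanding annotations at indexing time is one possible design choice; the same result could be obtained by expanding the query at search time instead.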