Semantic Data Dictionary

SDD Dictionary Mapping
Data annotation, Knowledge extraction

The Semantic Data Dictionary (SDD) is used to create semantic annotations for the columns of a dataset, for categorical or coded cell values, and for concepts that are implicit in the data. Mappings to ontology terms are encoded in a collection of tables or spreadsheets, and the resulting aggregation of knowledge can be processed to form knowledge graph fragments. The SDD specification provides a suite of standards and tools that enables domain scientists to annotate semantic interpretations of datasets without requiring a deep understanding of semantics.
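To make the idea concrete, the following is a minimal sketch of how table-based mappings might be turned into knowledge graph fragments. It is not the SDD specification or its tooling: the table layouts, the column headers (`column`, `attribute`, `unit`, `code`, `term`), the `ex:` terms and predicates, and the function names are all hypothetical and chosen only for illustration. One table maps dataset columns to ontology terms, a second maps coded cell values to terms, and each data record is expanded into subject-predicate-object triples.

```python
import csv
import io

# Hypothetical mapping table: one row per annotated dataset column.
# All "ex:" identifiers are placeholders, not real ontology terms.
DICTIONARY_MAPPING_CSV = """\
column,attribute,unit
age,ex:Age,ex:Year
weight,ex:BodyWeight,ex:Kilogram
sex,ex:BiologicalSex,
"""

# Hypothetical codebook: maps coded cell values to ontology terms.
CODEBOOK_CSV = """\
column,code,term
sex,1,ex:Male
sex,2,ex:Female
"""


def load_mapping(text):
    """Index the mapping rows by dataset column name."""
    return {row["column"]: row for row in csv.DictReader(io.StringIO(text))}


def load_codebook(text):
    """Index codebook terms by (column, code) pairs."""
    return {
        (row["column"], row["code"]): row["term"]
        for row in csv.DictReader(io.StringIO(text))
    }


def annotate_record(record_id, record, mapping, codebook):
    """Expand one data record into a list of (subject, predicate, object) triples."""
    subject = f"ex:record/{record_id}"
    triples = []
    for column, value in record.items():
        entry = mapping.get(column)
        if entry is None:
            continue  # columns without an annotation are skipped
        node = f"{subject}/{column}"
        triples.append((subject, "ex:hasAttribute", node))
        triples.append((node, "rdf:type", entry["attribute"]))
        coded_term = codebook.get((column, value))
        if coded_term is not None:
            # Coded value: link to the ontology term from the codebook.
            triples.append((node, "ex:hasValue", coded_term))
        else:
            # Uncoded value: keep it as a quoted literal.
            triples.append((node, "ex:hasValue", f'"{value}"'))
        if entry["unit"]:
            triples.append((node, "ex:hasUnit", entry["unit"]))
    return triples


mapping = load_mapping(DICTIONARY_MAPPING_CSV)
codebook = load_codebook(CODEBOOK_CSV)
triples = annotate_record("1", {"age": "12", "sex": "2", "site": "A"}, mapping, codebook)
for triple in triples:
    print(" ".join(triple))
```

In this sketch the annotator only edits the two tables; the code that builds triples never changes, which is the sense in which domain scientists can annotate data without writing semantic-web code themselves.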

SDDs have been used in the RPI and IBM collaborative Health Empowerment by Analytics, Learning, and Semantics (HEALS) project to annotate electronic health record (EHR) data. The NIH Children's Health Exposure Analysis Resource (CHEAR) and Human Health Exposure Analysis Resource (HHEAR) projects have used SDDs to annotate epidemiological, demographic, and anthropometric data. In Brazil, researchers at Universidade de Fortaleza have used SDDs for the Big Data Ceara project, and researchers at Universidade Federal de Minas Gerais have used them for the Global Burden of Disease project. SDDs are also actively used to annotate polymer nanocomposites and mechanical metamaterials in the collaborative MaterialsMine project, which includes researchers from RPI, Duke, Caltech, Northwestern, and the University of Vermont.

  • Rashid SM, McCusker JP, Pinheiro P, Bax MP, Santos HO, Stingone JA, Das AK, McGuinness DL. The Semantic Data Dictionary – An Approach for Describing and Annotating Data. Data Intelligence. 2020 Oct 1;2(4):443-486.
  • Rashid SM, Chastain K, Stingone JA, McGuinness DL, McCusker JP. The Semantic Data Dictionary Approach to Data Annotation & Integration. SemSci@ISWC. 2017 Oct 20.