Integrating Semantics and Numerics: Case Study on Enhancing Genomic and Disease Data Using Linked Data Technologies

Bioinformatics was an early adopter of semantic technologies, and the field has produced many ontologies and datasets in semantic formats to aid the integration of biological and biochemical data. However, tooling is lacking for identifying experimental bioinformatics data and analyses and linking them with relevant semantic knowledge. We present a novel approach that combines statistical analysis with semantic data integration to identify support in the literature for experimental findings and to highlight where those findings expand or contradict knowledge in biomedical databases. We integrate genetic, proteomic, disease, and drug data from numerous sources, including Bio2RDF, UniProt, Ensembl, and String-DB, along with gene expression data from the Neural Stem Cell Institute and Rensselaer Polytechnic Institute's "Repurposing Drugs with Semantics" project. We will present our integrative system architecture, demonstrate semantically enriched examples from analyses of real datasets that show how semantic representations aid analysis and interpretation, and discuss architectural lessons learned at scale.
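
As a rough illustration of the idea of checking statistical findings against existing knowledge, the following minimal Python sketch labels each significant finding as supported by, contradicting, or expanding a small knowledge base. It is not the system described in this abstract: the `Finding` type, the `classify_findings` function, the gene and disease identifiers, the significance threshold, and the dictionary standing in for linked data sources are all hypothetical placeholders chosen for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Finding(NamedTuple):
    """One statistical result from an expression analysis (hypothetical shape)."""
    gene: str
    direction: str  # "up" or "down" regulation observed in the experiment
    p_value: float


# Stand-in for knowledge retrieved from linked data sources:
# (gene, disease) -> direction of regulation reported there.
KnowledgeBase = Dict[Tuple[str, str], str]


def classify_findings(
    findings: List[Finding],
    known: KnowledgeBase,
    disease: str,
    alpha: float = 0.05,  # assumed significance threshold
) -> Dict[str, str]:
    """Label each significant finding relative to prior knowledge."""
    labels: Dict[str, str] = {}
    for f in findings:
        if f.p_value >= alpha:
            continue  # not statistically significant, so not reported
        prior = known.get((f.gene, disease))
        if prior is None:
            labels[f.gene] = "expands"      # no prior association recorded
        elif prior == f.direction:
            labels[f.gene] = "supported"    # agrees with recorded knowledge
        else:
            labels[f.gene] = "contradicts"  # disagrees with recorded knowledge
    return labels


# Placeholder data only; these are not real genes, diseases, or results.
example_findings = [
    Finding("GENE_A", "up", 0.01),
    Finding("GENE_B", "down", 0.02),
    Finding("GENE_C", "up", 0.001),
    Finding("GENE_D", "up", 0.30),
]
example_known = {
    ("GENE_A", "DISEASE_X"): "up",
    ("GENE_B", "DISEASE_X"): "up",
}
example_labels = classify_findings(example_findings, example_known, "DISEASE_X")
```

In a real pipeline the dictionary lookup would presumably be replaced by queries against the integrated semantic sources, and the labels would carry links back to the supporting records.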

