Srihari Sridharan
Knowledge graph embedding (KGE) methods such as TransE perform link prediction by learning vector representations of entities and relations from triples, but they typically ignore ontology/schema constraints (e.g., relation domains and ranges). This raises a practical question: if we provide ontology information in addition to a standard KGE pipeline, do we obtain better link prediction accuracy and/or more semantically valid predictions?
We study two simple, reproducible strategies for injecting ontology information into a “pure” embedding workflow: (1) inference-time ontology filtering that masks domain/range-invalid candidate entities during ranking; and (2) JOIE-style joint training that learns jointly from instance triples and ontological concepts. Experiments on three semantically enriched link prediction benchmarks (DB100k+, YAGO3-10+, NELL-995+) show that a large fraction of top-ranked predictions from vanilla TransE violate ontology constraints, motivating an explicit validity metric (Invalid@K). Ontology filtering improves filtered MRR by ≈ 1.10× on DB100k+, 1.42× on YAGO3-10+, and 2.26× on NELL-995+. On DB100k+, JOIE-style joint training further improves filtered MRR and Hits@10 over TransE+ONTOFILTER. We analyze when ontology helps, when it can hurt (e.g., when the ontology is incomplete), and why dataset-specific schema coverage matters.
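The inference-time filtering in strategy (1) and the Invalid@K metric can be summarized in a short sketch. This is a minimal, illustrative implementation assuming a TransE-style scorer, an `entity_types` map from entity IDs to type labels, and a `relation_range` dictionary of allowed tail types; these names are hypothetical and do not reflect the actual code in the repository linked below.

```python
import numpy as np

def score_tails(h_vec, r_vec, ent_embs):
    """TransE tail scores for (h, r, ?): higher is better (negative L1 distance)."""
    return -np.abs(h_vec + r_vec - ent_embs).sum(axis=1)

def ontology_filter_rank(h, r, ent_embs, rel_embs, entity_types, relation_range, k=10):
    """Rank candidate tails for (h, r, ?), masking entities whose type
    violates the relation's declared range (if one is known)."""
    scores = score_tails(ent_embs[h], rel_embs[r], ent_embs)
    allowed = relation_range.get(r)  # set of valid tail types, or None if the ontology is silent
    if allowed is not None:
        invalid = np.array([entity_types[e] not in allowed
                            for e in range(len(ent_embs))])
        scores[invalid] = -np.inf    # mask schema-invalid candidates before ranking
    return np.argsort(-scores)[:k]

def invalid_at_k(top_k, entity_types, allowed_types):
    """Invalid@K: fraction of the top-K predictions violating the range constraint."""
    if allowed_types is None:
        return 0.0
    return sum(entity_types[e] not in allowed_types for e in top_k) / len(top_k)

if __name__ == "__main__":
    # Toy example with random embeddings and a single typed relation.
    rng = np.random.default_rng(0)
    ent_embs = rng.normal(size=(6, 4))
    rel_embs = rng.normal(size=(2, 4))
    entity_types = {0: "Person", 1: "Person", 2: "City", 3: "City", 4: "Film", 5: "City"}
    relation_range = {0: {"City"}}   # e.g., a bornIn-like relation expects a City tail
    top = ontology_filter_rank(0, 0, ent_embs, rel_embs, entity_types, relation_range, k=3)
    print(top, invalid_at_k(top, entity_types, relation_range[0]))
```

Masking with -inf rather than re-scoring keeps the relative order of schema-valid candidates unchanged, and relations with no recorded range are left unfiltered, which is one way ontology incompleteness can limit the gains.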
Links:
- Final Paper: https://drive.google.com/file/d/1-NC9TOWe46m9mxeWesfKq0InALRtS4r9/
- Final presentation (slides):
- Final presentation (video): https://youtu.be/8xPwTO3kPTE
- GitHub repository: https://github.com/sridhs3/hybrid_ai_final