Improving Ontology Service-Driven Entity Disambiguation

Abstract:

One of the long-standing challenges in natural language processing is uniquely identifying entities in text, a task that, when performed accurately and against formal ontologies, supports efforts such as semantic search and question answering. With the recent proliferation of comprehensive, formalized sources of knowledge (e.g., DBpedia, Freebase, OBO Foundry ontologies) and advances in supporting Semantic Web technologies and services, leveraging such resources “off the shelf” within natural language processing pipelines to address the entity disambiguation problem in industry settings becomes a more viable proposition. In this paper, we assess this viability by building and evaluating an entity disambiguation pipeline founded on publicly available ontology services, namely those provided by the NCBO BioPortal. We chose BioPortal because of its current use as an ontology repository and provider of ontological services for the biomedical informatics community. To consider its use outside the biomedical domain, and given our immediate project goal of facilitating semantic search over Earth science datasets for the DataONE project, we focus on the disambiguation of geographic entities. We use NCBO’s Term service in conjunction with NCBO’s entity disambiguation service, the Annotator, and demonstrate an enhancement of the Annotator that applies a vector space model representation of ontological entities and relationships to improve scoring. This work ultimately provides a methodology and pipeline for improving entity disambiguation based on publicly available ontology services, demonstrated through an enhanced version of the NCBO Annotator service for geographic named entity disambiguation.
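
To make the vector space idea concrete, the following is a minimal sketch of how candidate entities might be re-scored against the text surrounding a mention. It is not the paper's implementation and does not call any NCBO service: the candidate structure (`id`, `label`, `related`), the identifiers, and the example sentence are all hypothetical, and plain term-frequency vectors with cosine similarity stand in for whatever representation and weighting the authors actually used.

```python
import math
from collections import Counter


def bag_of_words(texts):
    """Build a term-frequency vector from a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(context, candidates):
    """Rank candidate entities by similarity to the mention context.

    Each candidate is a dict with hypothetical keys: "id", "label", and
    "related" (labels of ontologically related entities, e.g. parents).
    """
    context_vec = bag_of_words([context])
    scored = []
    for cand in candidates:
        # The candidate vector combines its own label with related labels.
        cand_vec = bag_of_words([cand["label"]] + cand["related"])
        scored.append((cosine(context_vec, cand_vec), cand["id"]))
    # Highest similarity first.
    return sorted(scored, reverse=True)


# Made-up example data for illustration only (not from the paper).
candidates = [
    {"id": "geo:ParisFrance", "label": "Paris",
     "related": ["France", "Europe", "city"]},
    {"id": "geo:ParisTexas", "label": "Paris",
     "related": ["Texas", "United States", "city"]},
]
ranking = rank_candidates(
    "Samples were collected near Paris in northern Texas", candidates
)
```

In this toy case both candidates share the label "Paris", so the label alone cannot separate them; the related-entity terms are what pull the ranking toward the candidate whose ontological neighborhood overlaps the context.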

History

Date                            Created By
September 26, 2014 22:15:26     Patrick West
August 29, 2014 18:46:31        Patrick West