Language and Domain Independent Entity Linking with Quantified Collective Validation

Citation: Wang, H., Zheng, J., Ma, X., Fox, P., and Ji, H. 2015. Language and Domain Independent Entity Linking with Quantified Collective Validation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (September 17-21, 2015, Lisbon, Portugal). [Download]

Presented at the Conference on Empirical Methods in Natural Language Processing 2015

Authors: Han Wang, Jin Zheng, Xiaogang Ma, Peter Fox, & Heng Ji

Abstract: Linking named mentions detected in a source document to an existing knowledge base provides disambiguated entity referents for the mentions. This allows better document analysis, knowledge extraction, and knowledge base population. Most previous research has extensively exploited the linguistic features of the source documents in a supervised or semi-supervised way. These systems therefore cannot be easily applied to a new language or domain. In this paper, we present a novel unsupervised algorithm named Quantified Collective Validation that avoids excessive linguistic analysis of the source documents and fully leverages the knowledge base structure for the entity linking task. We show that our approach achieves state-of-the-art English entity linking performance and demonstrate successful deployment in a new language (Chinese) and two new domains (Biomedical and Earth Science). All the experimental datasets and a system demonstration are available at http://tw.rpi.edu/web/doc/hanwang_emnlp_2015 for research purposes.

Concepts: Natural Language Processing and Artificial Intelligence

Poster: EMNLP 2015 Poster [Download]