Language and Domain Independent Entity Linking with Quantified Collective Validation

Linking named mentions detected in a source document to an existing knowledge base provides disambiguated entity referents for the mentions. This allows better document analysis, knowledge extraction and knowledge base population. Most of the previous research extensively exploited the linguistic features of the source documents in a supervised or semi-supervised way. These systems therefore cannot be easily applied to a new language or domain. In this paper, we present a novel unsupervised algorithm named Quantified Collective Validation that avoids excessive linguistic analysis on the source documents and fully leverages the knowledge base structure for the entity linking task. We show our approach achieves state-of-the-art English entity linking performance and demonstrate successful deployment in a new language (Chinese) and two new domains (Biomedical and Earth Science).
