The Tetherless World Constellation is proud to announce the successful completion of Jin Zheng's Thesis Defense.
Semantic Similarity Computation on the Web of Data
ADVISOR: Peter Fox
Over the last few decades, much effort has been devoted to researching and developing effective semantic similarity computation algorithms for different scenarios, such as similarity between free texts and similarity between objects. As a result, there are many semantic similarity computation algorithms that draw on different information sources: information-content-based algorithms, like the vector space model; ontology-based edge-counting methods, like the semantic similarity methods in WordNet; and structure- or feature-based methods, like Tversky's model.
However, none of the existing algorithms aims to solve the similarity computation problem for entities on the Web of Data (WoD). Applying existing similarity computation algorithms for texts or words directly to entities on the WoD would yield inaccurate similarity scores, because these algorithms are based purely on text analysis and do not use the rich semantic relations and semantic descriptions of the entities. Semantic similarity computation for WoD entities is important because many applications rely on it, such as entity matching, entity annotation, and entity ranking.
The primary goal of this study is to investigate how to compute semantic similarity scores among entities on the Web of Data. We 1) design a novel semantic similarity computation model to compute similarity among entities on the Web of Data and other structured or unstructured data entities. The new model leverages the theory of information entropy to determine the amount of meaningful information present in each entity description and then compute the amount of meaningful information shared by the entities. The model uses machine learning approaches to learn and assign appropriate weights to the shared or unique information of the entities in order to highlight important and meaningful information. The model also tackles the scalability of similarity computation, which is a major challenge given the number of entities on the Web of Data. To demonstrate the effectiveness of the proposed model, we 2) apply it to develop systems that solve the entity matching and entity annotation problems on the Web of Data. We show that our model improves on the current state of the art for these problems.
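
As a rough illustration of the idea of weighting shared and unique information by its information content, the sketch below scores two entities described as sets of (property, value) features. It is not the model from the thesis: the feature representation, the `ex:` entity names, the function names, and the fixed weights `alpha` and `beta` are all assumptions made for this example. It combines a self-information estimate with a Tversky-style ratio, and it uses fixed weights where the thesis learns them.

```python
import math
from collections import Counter


def information_content(entities):
    """Map each feature to -log2 of the fraction of entities that carry it.

    Features present on every entity get a score of 0 (they carry no
    distinguishing information); rare features get high scores.
    """
    counts = Counter(f for feats in entities.values() for f in feats)
    n = len(entities)
    return {f: -math.log2(c / n) for f, c in counts.items()}


def similarity(a, b, ic, alpha=0.5, beta=0.5):
    """Tversky-style ratio with features weighted by information content.

    alpha and beta are hypothetical fixed weights for the information
    unique to each entity.
    """
    shared = sum(ic[f] for f in a & b)
    only_a = sum(ic[f] for f in a - b)
    only_b = sum(ic[f] for f in b - a)
    denom = shared + alpha * only_a + beta * only_b
    return shared / denom if denom > 0 else 0.0


# Hypothetical entity descriptions, invented for illustration only.
entities = {
    "ex:Paris": {("type", "City"), ("country", "France"), ("label", "Paris")},
    "ex:Lyon": {("type", "City"), ("country", "France"), ("label", "Lyon")},
    "ex:Berlin": {("type", "City"), ("country", "Germany"), ("label", "Berlin")},
}

ic = information_content(entities)
paris_lyon = similarity(entities["ex:Paris"], entities["ex:Lyon"], ic)
paris_berlin = similarity(entities["ex:Paris"], entities["ex:Berlin"], ic)
print(f"Paris vs Lyon:   {paris_lyon:.3f}")
print(f"Paris vs Berlin: {paris_berlin:.3f}")
```

In this toy data every entity is a city, so the shared type contributes nothing to the score, while the less common shared country is what separates the Paris and Lyon pair from the Paris and Berlin pair.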