EAGER: Semantic Search

Research Areas: Data Science, Web Science, Future Web
Principal Investigator: Jim Hendler
Co Investigator:
Concepts: Faceted Search, Big Data, Cognitive Computing, Semantic Web, Linked Data, Data Science, Web Science, Semantic Faceted Browse/Search
The goal of this project is to explore key algorithms, technologies and protocols that will lead to the next level of development of the original Semantic Web vision of the "Web of Data": a Web in which the unstructured texts of the current Web are seamlessly integrated with information currently locked in structured databases. While a huge amount of open data is being made available on the Web, especially in the "Open Government Data" arena, traditional computing techniques are inadequate for finding this data, for linking it to other data, and for reusing and repurposing it. We will show that an innovative combination of Semantic Web technologies can provide the basis for a new approach to large-scale, online data integration and use.

We will demonstrate our techniques by showing their efficacy on a combination of Open Government datasets being released around the world. Hundreds of thousands of these datasets have already been made available in machine-readable formats by countries, municipalities and cities, and the number is growing exponentially. This makes Open Government Data a large-scale testbed for Web-based data integration. We have collected the metadata for close to 400,000 datasets from more than 60 catalogs in 20 countries, published in 14 different languages. The project will show how the combination of linked-data representations, machine-readable metadata and Semantic Web ontologies makes it possible to federate data across these catalogs, domains, and cultures. We will develop the foundational algorithms that allow researchers to find, access, integrate and analyze ad hoc combinations of these datasets, integrated on the fly.
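
As a rough illustration of what federating catalog metadata could look like, the sketch below maps records from two catalogs onto shared property names and then computes facet counts and facet filters across them. It is a minimal sketch under stated assumptions, not the project's implementation: the catalog identifiers, field mappings, shared property names (loosely modeled on Dublin Core and DCAT terms) and sample records are all hypothetical and made up for this example.

```python
from collections import Counter

# Hypothetical per-catalog mappings from local field names to shared
# property names. These are illustrative assumptions, not a real schema.
FIELD_MAPS = {
    "catalog_a": {"name": "title", "tags": "keyword",
                  "lang": "language", "nation": "country"},
    "catalog_b": {"titre": "title", "mots_cles": "keyword",
                  "langue": "language", "pays": "country"},
}


def normalize(catalog_id, record):
    """Rename a catalog-specific record's fields to the shared names.

    Fields without a mapping are dropped.
    """
    mapping = FIELD_MAPS[catalog_id]
    out = {"catalog": catalog_id}
    for field, value in record.items():
        if field in mapping:
            out[mapping[field]] = value
    return out


def facet_counts(records, facet):
    """Count records per facet value; list values count once per item."""
    counts = Counter()
    for rec in records:
        value = rec.get(facet)
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    return counts


def filter_by(records, **facets):
    """Keep only the records that match every requested facet value."""
    def matches(rec, facet, wanted):
        value = rec.get(facet)
        if isinstance(value, list):
            return wanted in value
        return value == wanted

    return [r for r in records
            if all(matches(r, f, w) for f, w in facets.items())]


# Toy, made-up metadata records (not taken from any real catalog).
RAW = [
    ("catalog_a", {"name": "City budget 2012", "tags": ["budget", "finance"],
                   "lang": "en", "nation": "US"}),
    ("catalog_b", {"titre": "Budget municipal", "mots_cles": ["budget"],
                   "langue": "fr", "pays": "FR"}),
    ("catalog_a", {"name": "Air quality readings", "tags": ["environment"],
                   "lang": "en", "nation": "US"}),
]

FEDERATED = [normalize(cid, rec) for cid, rec in RAW]

if __name__ == "__main__":
    print(facet_counts(FEDERATED, "country"))
    for rec in filter_by(FEDERATED, keyword="budget"):
        print(rec["catalog"], rec["title"])
```

Hand-written field mappings like these would not scale to dozens of catalogs in many languages; in this sketch they merely stand in for the role that machine-readable metadata and shared ontologies are meant to play.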

Thus, the outcome of this project will be a set of techniques and a proof-of-concept demonstration showing that multiple data sources across the Web can be integrated by combining semantic information of different kinds. We will show that it is possible to build search and reuse tools that function across large distributed data collections, and we will explore the key research challenges in creating Web-scale linked-open-data repositories. The success of this project will demonstrate that, by bridging the gap between structured and unstructured sources, we can develop techniques that lead the way to a second generation of more powerful Semantic Web tools. The ultimate outcome of the work started in this project will thus be to allow scientists, engineers, and eventually end users to pursue their goals without needing the large proprietary data resources currently available only to a small set of researchers working in companies with access to "big data."