Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous, Distributed Information Sources


The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. This growth has created unprecedented opportunities for data-driven knowledge acquisition and decision making in a number of emerging, increasingly data-rich application domains such as bioinformatics, environmental informatics, enterprise informatics, and social informatics, among others. However, the massive size, semantic heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles to acquiring useful knowledge from the available data. This paper introduces some of the algorithmic and statistical problems that arise in such a setting and describes algorithms for learning classifiers from distributed data that offer rigorous performance guarantees relative to their centralized or batch counterparts. It also describes how this approach can be extended to work with autonomous, and hence inevitably semantically heterogeneous, data sources by making explicit the ontologies (attributes and relationships between attributes) associated with the data sources and by reconciling the semantic differences among the data sources from a user's point of view. This allows user-dependent or context-dependent exploration of semantically heterogeneous data sources. The resulting algorithms have been implemented in INDUS, an open source software package for collaborative discovery from autonomous, semantically heterogeneous, distributed data sources.
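To make the idea of a guarantee "relative to the centralized counterpart" concrete, here is a minimal sketch and not the paper's or INDUS's actual implementation. It assumes horizontally partitioned data (each source holds different rows over the same attributes) and uses Naive Bayes as the example classifier, because its model depends only on counts that can be computed locally and summed. The function names, the toy data, and the `n_values` smoothing parameter are all hypothetical.

```python
from collections import Counter
from math import log


def local_counts(rows):
    """Compute count statistics at a single data source.

    rows: iterable of (features, label) pairs, where features is a tuple.
    Returns class counts and (label, attribute index, value) counts.
    """
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i, value)] += 1
    return class_counts, feature_counts


def merge_counts(stats):
    """Sum the count statistics returned by several sources."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in stats:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def predict(class_counts, feature_counts, features, n_values=2):
    """Naive Bayes prediction with Laplace smoothing.

    n_values is the assumed number of distinct values per attribute.
    """
    n = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        count = class_counts[label]
        score = log(count / n)
        for i, value in enumerate(features):
            score += log((feature_counts[(label, i, value)] + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data held by two hypothetical autonomous sources.
site_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
site_b = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no"), (("rainy", "mild"), "yes")]

# Only counts leave each site; the raw rows never do.
distributed = merge_counts([local_counts(site_a), local_counts(site_b)])
# Reference model built as if all rows had been pooled in one place.
centralized = local_counts(site_a + site_b)

print(distributed == centralized)
print(predict(*distributed, ("rainy", "mild")))
```

In this sketch the merged counts are identical to the counts computed on the pooled data, so the distributed classifier makes exactly the same predictions as the centralized one. Classifiers whose models do not reduce to simple additive counts would need a different decomposition.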


Date: August 20, 2014
Created By: Patrick West
Link: Download

Related Research Areas:

Future Web
Lead Professor: Jim Hendler
Description: Since its inception, the World Wide Web has changed the ways people work, play, communicate, collaborate, and educate. There is, however, a growing realization among researchers across a number of disciplines that without new research aimed at understanding the current, evolving, and potential Web, we may be missing or delaying opportunities for new and revolutionary capabilities. To model the Web, it is necessary to understand the architectural principles that have provided for its growth. Looking to the future, ensuring that the Web supports the basic social values of trustworthiness, personal control over information, and respect for social boundaries requires a research agenda that makes the Web and its use a primary focus of attention. This research requires powerful scientific and mathematical techniques from many disciplines to explore the modeling of the Web from network-centric and information-centric views.
Concepts: Semantic Web
Ontology Engineering Environments
Lead Professor: Deborah L. McGuinness
Description: Ontology Engineering Environments
Concepts: Semantic Web
Semantic Foundations
Lead Professor: Deborah L. McGuinness
Description: Semantic Foundations
Concepts: Semantic Web