IN22A-05: Semantically aided interpretation and querying of Jefferson Project data using the SemantEco framework


Concepts: Data Science


We will describe the benefits we realized using semantic technologies to address the often challenging and resource-intensive task of ontology alignment in service of data integration. Ontology alignment became relatively simple as we reused our existing semantic data integration framework, SemantEco. We work in the context of the Jefferson Project (JP), an effort to monitor and predict the health of Lake George in NY by deploying a large-scale sensor network in the lake and analyzing the high-resolution sensor data.

SemantEco is an open-source framework for building semantically-aware applications to assist users, particularly non-experts, in exploration and interpretation of integrated scientific data. SemantEco applications are composed of a set of modules that incorporate new datasets, extend the semantic capabilities of the system to integrate and reason about data, and provide facets for extending or controlling semantic queries.

Whereas earlier SemantEco work focused on integration of water, air, and species data from government sources, we focus on redeploying it to provide a provenance-aware, semantic query and interpretation interface for JP's sensor data. By employing a minor alignment between SemantEco's ontology and the Human-Aware Sensor Network Ontology used to model the JP's sensor deployments, we were able to bring SemantEco's capabilities to bear on the JP sensor data and metadata. This alignment enabled SemantEco to perform the following tasks: (1) select JP datasets related to water quality; (2) understand how the JP's notion of water quality relates to water quality concepts in previous work; and (3) reuse existing SemantEco interactive data facets (e.g., maps and time series visualizations) and modules (e.g., the regulation module that interprets water quality data through the lens of various federal and state regulations).
Semantic technologies, both as the engine driving SemantEco and as the means of modeling the JP data, enabled us to rapidly align the two ontologies without requiring either project to change its models, and allowed us to leverage the existing software development effort invested in SemantEco as a portal for exploring Lake George's water quality data. We plan to extend the registration of modules and facets to handle climate data, hydrology data, and food web data.
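The alignment strategy described above can be illustrated with a toy sketch. This is not SemantEco's actual implementation, and all class and instance names below are hypothetical placeholders rather than real SemantEco or HASNetO terms; it only shows how a single bridging equivalence axiom lets a query phrased in one ontology's vocabulary match data modeled in another's.

```python
# Toy triple store: each entry is a (subject, predicate, object) triple.
# Names like "se:WaterMeasurement" are hypothetical, for illustration only.
triples = {
    # The alignment itself: one equivalence axiom bridges the two models.
    ("hasneto:WaterQualityMeasurement", "owl:equivalentClass", "se:WaterMeasurement"),
    # JP sensor observations, typed with HASNetO-style classes.
    ("jp:obs1", "rdf:type", "hasneto:WaterQualityMeasurement"),
    ("jp:obs2", "rdf:type", "hasneto:AirMeasurement"),
}

def equivalents(cls):
    """Classes equivalent to `cls` (symmetric closure, one hop)."""
    eq = {cls}
    for s, p, o in triples:
        if p == "owl:equivalentClass":
            if s == cls:
                eq.add(o)
            if o == cls:
                eq.add(s)
    return eq

def instances_of(cls):
    """Query in SemantEco terms; the alignment makes JP data visible."""
    classes = equivalents(cls)
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and o in classes)

# A query against the SemantEco-side class finds the JP observation
# even though it was never typed with that class directly.
print(instances_of("se:WaterMeasurement"))  # ['jp:obs1']
```

In a real deployment this inference would be carried out by an OWL reasoner or SPARQL engine over the published ontologies, but the principle is the same: a handful of bridging axioms, rather than changes to either project's model, is what makes the integration work.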

Related Projects:

E-Science Jefferson Project on Lake George (Jefferson Project)
Principal Investigator: Deborah L. McGuinness
Co-Investigator: Paulo Pinheiro
Description: The Jefferson Project at Lake George is building one of the world’s most sophisticated environmental monitoring and prediction systems, which will provide scientists and the community with a real-time picture of the health of the lake. Launched in June 2013, the project aims to understand and manage multiple complex factors—including road salt incursion, storm water runoff, and invasive species—all threatening one of the world’s most pristine natural ecosystems and an economic cornerstone of the New York tourism industry. The project is a three-year, multimillion-dollar collaboration between Rensselaer Polytechnic Institute, IBM, and The FUND for Lake George. The collaboration partners expect that the world-class scientific and technology facility at the Rensselaer Darrin Fresh Water Institute at Lake George will create a new model for predictive preservation and remediation of critical natural systems in Lake George, in New York, and ultimately around the world.

Related Research Areas:

Data Science
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volume, complexity, and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems whose magnitude, complexity, and interdisciplinary nature currently outstrip the available tools and the supply of a fully trained and agile workforce.

At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in eScience collaborations. The need is to teach key methodologies in application areas, grounded in real research experience, and to build the corresponding skill set.

At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.