Semantically-targeted analytics for reproducible scientific discovery

We develop a semantics-driven, automated approach for dynamically performing rigorous scientific studies. This framework may be applied to a wide variety of data and study types; here, we demonstrate its suitability for conducting retrospective cohort studies using publicly available population health data. The goal is to identify risk factors that, for some automatically discovered subpopulation, have significant associations with some health condition. Our semantically-targeted analytics (STA) approach addresses the end-to-end data science workflow, ranging from intelligent data selection to dissemination of derived data and results in a rigorous, reproducible way. STA drives an automated architecture allowing analysts to rapidly and dynamically conduct studies for different health outcomes, risk factors, cohorts, and analysis methods; it also lets the full analysis pipeline be modularly specified in a reusable, domain-specific way. The framework developed here may be readily extended to other learning tasks and datasets in the future.
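
To make the idea of a modularly specified cohort study concrete, the following is a minimal sketch and not the STA implementation. It assumes a hypothetical `StudySpec` record naming an outcome, a risk factor, and a cohort filter, and it computes a plain relative risk on made-up toy records. The field names, the data, and the choice of relative risk as the measure are all illustrative assumptions; a real study would also test the association for statistical significance.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class StudySpec:
    """Hypothetical study specification (not part of STA): what to study and for whom."""
    outcome: str                      # name of the boolean health-outcome field
    risk_factor: str                  # name of the boolean exposure field
    cohort: Callable[[Record], bool]  # predicate selecting the subpopulation


def relative_risk(records: List[Record], spec: StudySpec) -> float:
    """Outcome risk among exposed cohort members divided by risk among unexposed ones."""
    cohort = [r for r in records if spec.cohort(r)]
    exposed = [r for r in cohort if r[spec.risk_factor]]
    unexposed = [r for r in cohort if not r[spec.risk_factor]]
    if not exposed or not unexposed:
        raise ValueError("cohort needs both exposed and unexposed members")
    risk_exposed = sum(bool(r[spec.outcome]) for r in exposed) / len(exposed)
    risk_unexposed = sum(bool(r[spec.outcome]) for r in unexposed) / len(unexposed)
    if risk_unexposed == 0:
        raise ValueError("no outcome events among unexposed; relative risk undefined")
    return risk_exposed / risk_unexposed


# Toy, made-up records used only to exercise the sketch.
records = [
    {"age": 55, "smoker": True, "hypertension": True},
    {"age": 60, "smoker": True, "hypertension": True},
    {"age": 58, "smoker": True, "hypertension": False},
    {"age": 62, "smoker": True, "hypertension": True},
    {"age": 57, "smoker": False, "hypertension": True},
    {"age": 61, "smoker": False, "hypertension": False},
    {"age": 65, "smoker": False, "hypertension": False},
    {"age": 59, "smoker": False, "hypertension": False},
    {"age": 30, "smoker": True, "hypertension": False},  # excluded by the cohort filter
]

# Swapping the outcome, risk factor, or cohort only requires a new spec.
spec = StudySpec(
    outcome="hypertension",
    risk_factor="smoker",
    cohort=lambda r: r["age"] >= 50,
)
rr = relative_risk(records, spec)
print(f"relative risk in cohort: {rr:.2f}")
```

Keeping the specification separate from the computation is what lets the same function be rerun for a different outcome, risk factor, or cohort without changing any analysis code.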

Associated Projects

The Center for Health Empowerment by Analytics, Learning, and Semantics (HEALS) is a five-year collaboration between Rensselaer and IBM aimed at researching how the application of advanced cognitive computing capabilities can help people to understand and improve their own health conditions.