Data Analytics as Data: A Semantic Workflow Approach

Abstract:

By treating the end-to-end data science workflow as data itself, and by conceptually modeling the goals and functional intent of the data analyst, we open the entire process of data analytics to the powerful tools of artificial intelligence, machine learning, statistics, and data mining. We examine the fundamental questions and capabilities that must be addressed to capture and reason over workflows, and to interpret and contextualize their results. Our approach focuses on capturing the key components of complete workflow processes, making explicit the “deep” semantics of the workflow plan, the analysis performed, the structure and sub-components of the workflow, and its intermediate and final data products. Our practical goal is to provide enough detail to support the integration, interpretation, reuse, reproducibility, recommendation, and search of workflows and their work products. The structure for this workflow-as-data view is formalized by an extensible, reusable ontology, which we are currently developing, that applies to all aspects of workflow representation and reasoning. We report on our exploration and reuse of existing methods, tools, and ontologies, as well as on our semantic analytics contributions to real-world projects addressing children’s health challenges.
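As a rough illustration of the workflow-as-data idea, the sketch below records a two-step workflow (its goal, its steps, and the data each step used and generated) as subject-predicate-object triples, then answers a simple provenance question by querying those triples. It is a minimal sketch under stated assumptions, not the authors' implementation: the wf: predicates and ex: resources are hypothetical placeholders rather than terms from the ontology described above, and plain Python tuples stand in for an RDF store.

```python
from collections import defaultdict

# Hypothetical vocabulary: the "wf:" predicates and "ex:" resources are
# illustrative placeholders, not terms from the authors' ontology.
TRIPLES = [
    # Workflow plan: its goal and its sub-components (steps)
    ("ex:analysis1", "wf:hasGoal", "ex:identifyGroups"),
    ("ex:analysis1", "wf:hasStep", "ex:normalize"),
    ("ex:analysis1", "wf:hasStep", "ex:cluster"),
    # Data flow: what each step consumed and produced
    ("ex:normalize", "wf:used", "ex:rawData"),
    ("ex:normalize", "wf:generated", "ex:normalizedData"),
    ("ex:cluster", "wf:used", "ex:normalizedData"),
    ("ex:cluster", "wf:generated", "ex:clusters"),
]


def index(triples):
    """Group objects by (subject, predicate) so the graph can be queried."""
    idx = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def upstream(product, triples):
    """Return every data product that `product` was derived from, transitively.

    A product derives from the inputs of the step that generated it, and
    from whatever those inputs were in turn derived from.
    """
    idx = index(triples)
    # Invert wf:generated so we can look up which step produced a product.
    producers = defaultdict(set)
    for s, p, o in triples:
        if p == "wf:generated":
            producers[o].add(s)
    seen = set()
    frontier = [product]
    while frontier:
        current = frontier.pop()
        for step in producers[current]:
            for source in idx[(step, "wf:used")]:
                if source not in seen:
                    seen.add(source)
                    frontier.append(source)
    return seen


if __name__ == "__main__":
    # Which data products does the final clustering result depend on?
    print(sorted(upstream("ex:clusters", TRIPLES)))
```

Running it prints ['ex:normalizedData', 'ex:rawData'], the upstream products of ex:clusters. Because the plan, its steps, and its data products are all stored as ordinary data, the same triples could also be searched, compared across workflows, or fed to learning methods. A production system would more likely use RDF with a provenance vocabulary such as W3C PROV-O and query it with SPARQL; the wf:used and wf:generated names here only loosely echo that style.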

History

Date Created: January 10, 2017, 05:45:40
Created By: John S. Erickson

Related Projects:

Semantic Numeric Exploration Technology (SemNExT)
Principal Investigators: Kristin Bennett and Deborah L. McGuinness
Description: SemNExT combines numeric analysis of data with semantic understanding and explanation technologies to provide a holistic means of exploring robust datasets.
Semantic Workflow and Management of Provenance (SWaMP)
Description: A joint effort between the Tetherless World Constellation at Rensselaer Polytechnic Institute and the Commonwealth Scientific and Industrial Research Organisation (CSIRO).

Related Research Areas:

Health Informatics
Lead Professor: Deborah L. McGuinness
Description: Health informatics is "the interdisciplinary study of the design, development, adoption and application of IT-based innovations in healthcare services delivery, management and planning." (R. Procter, Editor, Health Informatics Journal, Edinburgh, United Kingdom; from the U.S. National Library of Medicine)


Concepts: None.
Web Science
Lead Professors: Jim Hendler and Deborah L. McGuinness
Description: Web Science is the study of the World Wide Web and its impact on both society and technology, positioning the Web as an object of scientific study unto itself. Web Science recognizes the Web as a transformational, disruptive technology; its practitioners study the Web, its components, facets and characteristics. Ultimately, Web Science is about understanding the Web and anticipating how it might evolve in the future.
Concepts: Semantic Web