When: February 1, 2017
Where: Winslow Building Room 1140, RPI Campus, Troy, NY, USA
TWed Talk: Weds, 01 Feb (6p Winslow 1140)
TITLE: "Semantic Markdown: Embedding Workflow Semantics via R Markdown"
LEADER: John Erickson
KEYWORDS: Semantic Workflows, Reproducibility, Data Analytics

Please join us this Weds (6p, Winslow 1140) as I discuss recent thoughts on using markdown, especially R Markdown, to extend the RStudio environment so that data analysts can directly generate and publish RDF that richly describes the semantics of their scripts. This is a possible next step towards best practices for "in situ" embedding of appropriate concepts and vocabulary from established ontologies (including ProvONE and domain ontologies) into practical workflows.

DESCRIPTION: I'll discuss new work that explores extending markdown syntax (esp. R Markdown), in concert with 'knitr', to directly produce workflow markup in a human-compatible way. One example of an outcome: an RStudio user can "knit" a markdown document and, instead of generating (e.g.) PDF or HTML, an extension will generate RDF (TTL or JSON-LD) or HTML+RDFa. By "human-compatible," we mean that markdown best practices will be developed that are reasonable for a data analyst to use; methods (possibly based on templates) must be developed that do not require the user to "know" RDF. Today we can create cumbersome R Markdown (Rmd) files that produce HTML+RDFa output with correct embedded workflow semantics, but the user must be an HTML and RDFa hacker to understand the code. Workflow reproducibility requires tools that data analysts will actually use.
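
As a rough illustration of the idea only (not the approach to be presented in the talk), the sketch below reads chunk headers from an R Markdown document and emits Turtle. Everything specific in it is an assumption made for the example: the chunk option names `sem.in` and `sem.out` are hypothetical, the `ex:` namespace is a placeholder, and describing each chunk as a `prov:Plan` is just one possible mapping onto the W3C PROV vocabulary. It is written in Python rather than as a knitr extension to keep it self-contained.

```python
import re

# Built indirectly so the sample document can sit inside a fenced block.
FENCE = "`" * 3

# A tiny Rmd document. sem.in / sem.out are HYPOTHETICAL chunk options
# invented for this sketch; they are not part of knitr or R Markdown.
RMD = "\n".join([
    "# Analysis",
    FENCE + '{r clean, sem.in="raw.csv", sem.out="clean.csv"}',
    "clean <- na.omit(read.csv('raw.csv'))",
    FENCE,
    FENCE + '{r model, sem.in="clean.csv", sem.out="fit.rds"}',
    "fit <- lm(y ~ x, data = clean)",
    FENCE,
])

# Matches an R chunk header with a label and quoted name="value" options.
HEADER = re.compile(r'^`{3}\{r\s+(\w+)((?:\s*,\s*[\w.]+\s*=\s*"[^"]*")*)\s*\}$')
OPTION = re.compile(r'([\w.]+)\s*=\s*"([^"]*)"')


def parse_chunks(text):
    """Return (label, options) pairs for each labeled R chunk header."""
    chunks = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            chunks.append((match.group(1), dict(OPTION.findall(match.group(2)))))
    return chunks


def to_turtle(chunks):
    """Serialize chunks as Turtle. The ex: namespace is a placeholder."""
    out = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix ex: <http://example.org/workflow#> .",
        "",
    ]
    for label, opts in chunks:
        props = ["a prov:Plan"]  # illustrative mapping, see lead-in
        if "sem.in" in opts:
            props.append('ex:input "%s"' % opts["sem.in"])
        if "sem.out" in opts:
            props.append('ex:output "%s"' % opts["sem.out"])
        out.append("ex:%s %s ." % (label, " ;\n    ".join(props)))
    return "\n".join(out)


if __name__ == "__main__":
    print(to_turtle(parse_chunks(RMD)))
```

The point of the sketch is that the analyst only adds a couple of short options to chunk headers they already write, and never touches RDF directly. A real tool would also need to escape literal values and use agreed vocabulary terms rather than the placeholder `ex:` properties.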

This work advances the semantic workflow work inspired by YesWorkflow. It leverages standard practices for R extensions, markdown and publication, creating a direct path for data analysts to get their workflows represented in knowledge graphs. This approach broadens the potential user base by helping to ensure that analysts' workflows and results are easier to discover and conceptually easier to understand, thereby increasing the likelihood they will be cited, reused and reproduced.

BIO: John S. Erickson, Ph.D. has spent over two decades studying the unique social, legal, and technical problems that arise when managing and disseminating information in the digital environment. Currently Director of Research Operations for the Rensselaer Institute for Data Exploration and Application (The Rensselaer IDEA) and Deputy Director of the Web Science Research Center of the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI), John coordinates, contributes, and teaches.

