TWed Talk: Experiences Curating Science Metadata and Recommendations for Publishing Metadata

When: March 5 2014
Where: Winslow 1140, RPI Campus, Troy, NY
TITLE: "Experiences Curating Science Metadata and Recommendations for Publishing Metadata"
LEADER: Jesse Weaver

Please join us Wed, 05 Mar for the return of Tetherless Ph.D. recipient Jesse Weaver as he leads us in a discussion of some of his current work on RDESC at the Pacific Northwest National Laboratory: "Experiences Curating Science Metadata and Recommendations for Publishing Metadata".

ABSTRACT: "Experiences Curating Science Metadata and Recommendations for Publishing Metadata"

At present, much science metadata is utterly inaccessible (i.e., not shared), digitally inaccessible (i.e., not on the Web), or machine-incomprehensible (i.e., text). Although standard vocabularies like GCMD keywords and CF standard names are a step in the right direction, much more is needed in order to bridge the semantic gap between the detail of science metadata and the generality of posed questions. As part of the RDESC project, we attempt to demonstrably bridge this gap for a specific atmospheric science use case by incrementally developing an OWL ontology to accommodate the precision of various metadata, and by curating the metadata into semantically rich RDF triples. The ontology and RDF data model enable us to meaningfully relate heterogeneous metadata of varying precision from different sources. In this talk, I will primarily discuss the metadata curation effort that has taken place to date in RDESC and make recommendations for how to improve the publishing of science metadata.

RDESC is a DOE/ASCR-funded project in collaboration with RPI that aims to facilitate discovery of science resources at the scale of the scientific community. The project involves the curation of existing science metadata, the development of recommendations for publishing science metadata, and the development of a prototypical web interface for discovering resources described by the curated metadata.

BIOGRAPHY: Jesse Weaver is a Research Computer Scientist in the Data Intensive Science Computing (DISC) group at Pacific Northwest National Laboratory (PNNL) in Richland, WA. He is the PI of the Streaming Hypothesis Reasoning (Shyre) project and the Resource Discovery for Extreme Scale Collaboration (RDESC) project, the latter of which is a DOE/ASCR-funded project in collaboration with RPI. Jesse is also a key member of the Center for Adaptive Supercomputing Software (CASS) where he contributes to the development of a distributed graph database called SGEM.

Prior to joining PNNL in April 2013, Jesse was a doctoral student at RPI where he wrote his dissertation entitled "Toward Webscale, Rule-based Inference on the Semantic Web via Data Parallelism" for which he co-received the 2013 Karen and Lester Gerhardt Prize. Jesse was the Dr. Shirley Ann Jackson and Dr. Morris A. Washington Patroon Fellow, the first recipient of a Patroon Fellowship. While at RPI, he participated in the champion team of the 2009 Billion Triples Challenge, co-organized the Workshop on High Performance Computing for the Semantic Web (HPCSW) in both 2011 and 2012, and interned as a software engineer at Facebook for the summer of 2011.

Prior to his graduate studies at RPI, Jesse was a software engineer at Raytheon IIS where he contributed to code parallelization and R&D in knowledge management/discovery. He received his B.S. in Computer Engineering from the University of Arkansas in Fayetteville in 2006.

FOR MORE INFO: [1] TWC RDESC project page [2] PNNL CASS page

TWed Logistics (Spring 2014):
  • TWed schedule
  • 7p-8p, 1st floor Winslow (1140)
  • We try to alternate TWed Talks with TWed Hackspaces. The alternating pattern may be "off" at times due to leader availability and Institute scheduling.
  • Pizza or snacks will be provided for TWed Talks
  • Live video streams of TWed Talks will usually be available via ustream
  • An archive of past TWed Talks is also available on ustream. Direct links can be found in the schedule (below)
  • TWed Talks from previous terms are archived; topical archive coming soon!
About TWed:
  • "TWed" is the Tetherless World Educational Series.
  • "TWed Talks" are informal overview talks and tutorials on topics of interest to the Tetherless World community. TWed gives members of the lab the chance to share tools and expertise. TWed Talks are not lectures; they are expected to be highly interactive and fun. TWed leaders are encouraged to include live "hack" activities in their session plans.
  • "TWed Hackspaces" during the TWed time are informal group work sessions inspired by the Hackerspace movement. This is a time when TWC people will "be around" and you can rely on the "right" people being available to answer your questions and help with your hacks. This is also a great opportunity for project teams to "hack" together on problems, with the knowledge and resources of TWC surrounding you.