Jewett Meeting at MBL

Jewett Meeting at MBL (Workshop)
description Data Provenance and Attribution for Published Datasets
location Jonsson Center, Woods Hole, MA
tag provenance; dataset; library


Login

location: http://tw.rpi.edu/portal/Jewett_Meeting_at_MBL

Shared wiki login account: Jewett (password: please contact baojie@cs.rpi.edu and dingl@cs.rpi.edu)

Create your own account:

For a brief wiki editing tutorial, see [1] (YouTube, 4 minutes)

Attendees

name email affiliation
Alice Orton aorton@usgs.gov usgs
Andy Maffei amaffei@whoi.edu whoi
Anthony Goddard agoddard@mbl.edu mbl
Arcot Rajasekar rajaseka@email.unc.edu unc
Art Gaylord agaylord@whoi.edu whoi
Arthur Newhall anewhall@whoi.edu whoi
Bob Groman rgroman@whoi.edu whoi
Cathy Norton cnorton@mbl.edu mbl
Cyndy Chandler cchandler@whoi.edu whoi
Deborah McGuinness dlm@cs.rpi.edu rpi
Diane Rielinger drielinger@mbl.edu mbl
Ed Urban ed.urban@scor-int.org scor
Gary Miller gmiller@usgs.gov usgs
Holly Miller hmiller@mbl.edu mbl
Jennifer Schopf jms@nsf.gov whoi/nsf
Kerstin Lehnert lehnert@ldeo.columbia.edu columbia
Li Ding dingl@cs.rpi.edu rpi
Lisa Raymond lraymond@whoi.edu whoi
Patrick West westp@rpi.edu rpi
Peter Fox pfox@cs.rpi.edu rpi
Peter Wiebe pwiebe@whoi.edu whoi
Ryan Schenk rschenk@mbl.edu mbl
Stephen Miller spmiller@ucsd.edu ucsd
Stephan Zednik zednis@rpi.edu rpi
Tom Moritz moritz@archive.org internet archive
Vicki Ferrini ferrini@ldeo.columbia.edu columbia

Agenda

Thursday, April 9

Campfire chat room: https://mblwhoilibrary.campfirenow.com/37e51

Raw Campfire transcripts: April 9th Chat, April 10th Chat


2:00 pm Pre-Conference: Practicum Team meets and goes over their experience / Carriage House (team members only)

3:30 pm Shuttle Service from Inn on the Square to Jonsson Center

3:45 pm Shuttle Service from Inn on the Square to Jonsson Center

Conference Starts

4:00 pm Coffee/Tea, Workshop begins: Carriage House

4:00-5:00 pm Keynote by Deborah McGuinness (RPI)

5:00-5:30 pm Challenges by Cyndy Chandler (WHOI)

5:30-6:00 pm Goals - discussion Andy Maffei (WHOI)/Cathy Norton (MBLWHOI Library)

  • focus is only on data behind a published journal article
  • examine attribution stream for this data, how is it cited?
  • examine where do you store the metadata about this data?
  • where do you store the data?
  • what metadata is required around the metadata?

6:00 pm Cocktails and Dinner-- Main House

Friday, April 10th

7:30 am Shuttle Service from Inn on the Square to Jonsson Center

7:45 am Shuttle Service from Inn on the Square to Jonsson Center

7:30-8:30 am Breakfast at Jonsson Center / Main House

Jonsson Center / Carriage House

8:30-9:00 am Data Library by Lisa Raymond (MBLWHOI Library)

9:00-9:30 am Persistent Archives: Long Term Sustainability of data based on policy and data virtualization by Arcot Rajasekar (UNC)

9:30-10:00 am NSF Office of Cyberinfrastructure: What Are We Thinking About Data by Jennifer Schopf (NSF)

10:00-10:30 am Break

10:30-Noon Practicum - Use Cases

Noon Lunch - Jonsson Center/ Main House

1:00-1:30 pm Data Standards, Better Practices: US and others by Peter Fox (RPI)

1:30-3:00 pm - Use cases continued - followed by breakouts if necessary

3:00-3:30 pm Break

3:30-6:00 pm Consensus on Best Practices.... and work on white paper resulting from discussions.

6:00 pm CLAMBAKE at Jonsson Center / Main House

Meeting Notes and Slides

Data Library by Lisa Raymond (MBLWHOI Library)

Persistent Archives: Long Term Sustainability of data based on policy and data virtualization by Arcot Rajasekar (UNC)

Post-meeting Documents

Transcribed Easel Sheets from Best Practices discussion

Draft World Data Center Certification Criteria

Arthur Newhall's Observations

Use Cases

  • UseCase #1 for Group Discussion: A scientist wants to find all tables and figures in papers published on the SW06 dataset that have sound speed profiles in them.
  • UseCase #2 for Practicum Exercise: A scientist wants to publish the data associated with the article he is submitting to a journal on Acoustic Properties of Salpa thompsoni. What steps does he need to take, and what information does he have to collect about this data in order to submit this information to the publisher?
NOTE: This is a real use case. We have an example of the steps Peter Wiebe took to do this, and the products will be available for workshop participants: Peter Wiebe's article, Acoustic properties of Salpa thompsoni, for which Neil Sarkar and Holly created Dublin Core metadata, with separate metadata for the text and for each figure and table.
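
As a rough illustration of "separate metadata for the text and each figure and table", the sketch below builds two simple Dublin Core records in Python. It is not the metadata that was actually created for the article. The dc element names and namespace are standard Dublin Core, but the identifiers (example:...), the type values, and the choice to list only the author named on this page are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat <record> element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Record for the article text. The identifier is hypothetical.
article = dublin_core_record([
    ("title", "Acoustic properties of Salpa thompsoni"),
    ("creator", "Peter Wiebe"),  # only the author named on this page
    ("type", "Text"),
    ("identifier", "example:article"),
])

# Separate record for one figure, pointing back at the article via dc:relation.
figure7 = dublin_core_record([
    ("title", "Figure 7: TS-distributions for targets ascribed to salps"),
    ("type", "Image"),
    ("relation", "example:article"),
    ("identifier", "example:article/figure7"),
])

print(ET.tostring(figure7, encoding="unicode"))
```

Keeping one record per figure or table, each with a relation back to the article, is one possible way to let a figure be cited and found on its own.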

template

Template for data review

Generic Data Pipeline

An example of a general data pipeline

data

A link to backbone data for Table 2

A link to backbone data for Table 2

A link to backbone data for Table 6

A link to backbone data for Figure 3

A link to backbone data for Figure 7


summary

Category Description Download
File:CTD085.txt Category:DataFile
Category:Thing
Table 2 backbone data for CTD cast 85 (one of four) used to compute the mean and max/min water properties where salps were collected for experimental work as presented in Table 2.
File:CTD087.txt Category:DataFile
Category:Thing
Table 2 backbone data for CTD cast 87 (one of four) used to compute the mean and max/min water properties where salps were collected for experimental work as presented in Table 2.
DSC02225.JPG Category:ImageFile
Category:Thing
Figure 3. Backbone data consist of a series of JPEG images that were used in the analysis.
JMBL20090410 Example Figure 3 Category:Dataset
Category:Thing
the dataset represented by Figure 3
JMBL20090410 Example Step 21 Category:Step backbone data used as input, to compute the mean and max/min water properties where salps were collected for experimental work as presented in Table 2 which is the output
JMBL20090410 Example Step 31 Category:Step part of Figure 3
JMBL20090410 Example Table 2 Category:Dataset
Category:Thing
the dataset represented by table 2
File:Salp200-1 selection inner part 2.xls Category:DataFile
Category:Thing
Figure 7: TS-distributions for targets ascribed to salps. Plot for 200 kHz from data in this file.
File:Salp38 1-selection inner part 2.xls Category:DataFile
Category:Thing
Table 6: Summary statistics of mean TS, confidence interval for the mean, 25 and 75% quartiles and Q75 – Q25 as a measure of spread derived for 38 kHz upper data portion from this file.
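
The summary table links data files to steps and steps to datasets. A minimal Python sketch of that chain follows, using the Step 21 row as the example. The Step class and the source_files helper are hypothetical names introduced for illustration, not part of the wiki. Only the two CTD casts listed on this page are included, although the descriptions say four casts were used, and the step graph is assumed to contain no cycles.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One processing step: named inputs go in, one named output comes out."""
    name: str
    inputs: list
    output: str


# Step 21 from the summary table: CTD backbone data in, Table 2 dataset out.
# Only the two casts listed on this page appear here (the table mentions four).
steps = [
    Step(
        name="JMBL20090410 Example Step 21",
        inputs=["File:CTD085.txt", "File:CTD087.txt"],
        output="JMBL20090410 Example Table 2",
    ),
]


def source_files(dataset, steps):
    """Walk back from a dataset to the items that no recorded step produced."""
    found = []
    for step in steps:
        if step.output == dataset:
            for item in step.inputs:
                upstream = source_files(item, steps)
                # An input with no producing step is treated as raw source data.
                found.extend(upstream if upstream else [item])
    return found


print(source_files("JMBL20090410 Example Table 2", steps))
```

Recording each step with explicit inputs and an output is what makes it possible to go from a published table back to its backbone data.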


Supplementary Use Cases

UseCase A

A paper is to be published in DSR II and the author needs to know how to reference the data that are available online. As the data manager, I need to know whether I need to do anything differently in how the source data are documented and served (additional metadata?, persistent identifiers?).

The paper (published 2008 in DSR II): Qian P. Li, Dennis A. Hansell, Nutrient distributions in baroclinic eddies of the oligotrophic North Atlantic and inferred impacts on biology, Deep Sea Research Part II: Topical Studies in Oceanography, Volume 55, Issues 10-13, Mesoscale Physical-Biological-Biogeochemical Linkages in the Open Ocean: Results from the E-FLUX and EDDIES Programs, May-June 2008, Pages 1291-1299, ISSN 0967-0645 DOI: http://dx.doi.org/10.1016/j.dsr2.2008.01.009 URL: http://www.sciencedirect.com/science/article/B6VGC-4SFR7MF-5/2/b08137059737fef3a654b2fd7897d4fb

that references data that are available online from BCO-DMO: http://osprey.bco-dmo.org/project.cfm?flag=viewd&id=13

the likely source data objects for the paper are listed below: http://ocb.whoi.edu/jg/serv/OCB/EDDIES/INVENTORY.html1

Measurement | PI_name | Data object URL

OC404-1
bottle (merged) | OCB_DMO | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S1/bottle_OC404_S1.html0
bottle oxygen | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S1/oxygen.html1
nM NO3/PO4 | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S1/nuts_low.html0
DOC; DON; DOP | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S1/organic_matter.html0
del15N (PON) | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S1/del15N-PON.html0

WB0409
Niskin bottle samples | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0409/bottle.html0
bottle oxygen | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0409/bottle.html0
DOC; DON; DOP | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0409/organic_matter.html0
del15N (PON) | Hansell | data not contributed

OC404-4
bottle file (base) | McGillicuddy | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S2/bottle.html0
bottle oxygen | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S2/oxygen.html1
nM NO3/PO4 | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S2/nuts_low.html0
DOC; DON; DOP | Hansell | data not contributed
del15N (PON) | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC404_S2/del15N-PON.html0

WB0413
Niskin bottle samples | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0413/bottle.html0
bottle oxygen | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0413/bottle.html0
DOC; DON; DOP | Hansell | data not contributed
del15N (PON) | Hansell | data not contributed

OC415-1
bottle file (base) | McGillicuddy | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC415_S1/bottle.html1
nM NO3/PO4 | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC415_S1/nanoNutrients.html0
DOC; DON; DOP | Hansell | data not contributed
del15N (PON) | Hansell | data not contributed

WB0506
Niskin bottle samples | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0506/bottle.html0
bottle oxygen | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0506/bottle.html0
DOC; DON; DOP | Hansell | data not contributed
del15N (PON) | Hansell | data not contributed

OC415-2
bottle file (base) | Ledwell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC415_T1/bottle.html1

OC415-3
bottle file (base) | McGillicuddy | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC415_S2/bottle.html1
nM NO3/PO4 | Hansell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC415_S2/nanoNutrients.html0
DOC; DON; DOP | Hansell | data not contributed
del15N (PON) | Hansell | data not contributed

WB0508
Niskin bottle samples | Bates | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/WB0508/bottle.html0
DOC; DON; DOP | Hansell | data not contributed
del15N (PON) | Hansell | data not contributed

OC415-4
bottle file (base) | Ledwell | http://ocb.whoi.edu/jg/serv/OCB/EDDIES/OC415_T2/bottle.html1

UseCase B

A scientist has found a sound profile represented as a graph in a paper that he feels justifies a hypothesis he has put forward. He wants to get access to the original sensor data related to that sound profile. How does he do this?


UseCase C

A scientist has 10,000 images on slides sitting on his shelf, representing 10 years of work, that he wants to digitize. How does he get the metadata for data collected in the past, before best practices for metadata were considered? Is it even worth the effort?


UseCase D

A scientist has written a paper with data that s/he would like to publish but access to the data is restricted or the use of the data is restricted, for some period of time. The publisher has requested that all data represented as figures or tables in this journal be "properly cited" with repository access.


UseCase E

A scientist wants to find all the data associated with a specific harmful algal bloom. He is interested both in original data and derived data that has been published in articles and deposited. He wants to be able to determine who collected the original data and who analyzed and processed the data. He will then publish a review article that will contain a synthesis of this information. How will he find everything (what metadata, connections, organization will be needed in an 'ideal world')? How will he know who should receive attribution? How will he publish and maintain attribution on his data synthesis product once he publishes it?

Suggested Preparation Materials for the Meeting

  • National Science and Technology Council Releases Strategy for Digital Scientific Data.

The National Science and Technology Council (NSTC) released a report describing a strategy to promote preservation and access to digital scientific data. The report, Harnessing the Power of Digital Data for Science and Society, was produced by the NSTC's Committee on Science under the auspices of the Office of Science and Technology Policy (OSTP) in the Executive Office of the President.

  • Survey of data provenance techniques. Technical Report IUB-CS-TR618

http://www.cs.usask.ca/faculty/sal426/Provenance/docs/Literature%20Review/TR618.pdf

  • ICSU Ad Hoc Strategic Committee on Information and Data

http://www.icsu.org/Gestion/img/ICSU_DOC_DOWNLOAD/2123_DD_FILE_SCID_Report.pdf

  • Sudha Ram, Jun Liu. Understanding the Semantics of Data Provenance to Support Active Conceptual Modeling

http://en.scientificcommons.org/41046974

  • Fox, McGuinness, Pinheiro da Silva. Knowledge Provenance in Virtual Observatories: Applications to Image Data Pipelines, 2008.

http://data.semanticweb.org/conference/iswc/2008/paper/poster_demo/70/html

  • Pinheiro da Silva, McGuinness, McCool. Knowledge Provenance Infrastructure.

http://en.scientificcommons.org/685801

  • Clifford Lynch. “The Shape of the Scientific Article in the Developing Cyberinfrastructure,” CTWatch Quarterly (August 2007)

http://www.ctwatch.org/quarterly/articles/2007/08/the-shape-of-the-scientific-article-in-the-developing-cyberinfrastructure/

  • Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practices & Future Needs. JISC Report 2008.

http://www.jisc.ac.uk/publications/publications/dataskillscareersfinalreport.aspx

  • Baker, Barton, Peterson, Fox. Informatics and the 2007-2008 Electronic Geophysical Year. EOS, Transactions, American Geophysical Union 89(48) 2008.

http://www.agu.org/pubs/crossref/2008/2008EO480001.shtml (subscription)

  • Gomes, Graybeal and O'Reilly. Data Management Issues in Operational Ocean Observatories.

Sea Technology 48(5) p.17-20, 2007 http://www.highbeam.com/doc/1P3-1284688471.html (subscription)

  • Altman and King. A Proposed Standard for the Scholarly Citation of Quantitative Data

http://gking.harvard.edu/files/cite.pdf

  • Trustworthy Repositories Audit & Certification: Criteria and Checklist

http://www.crl.edu/PDF/trac.pdf

  • SCOR/IODE Workshop on Data Publishing, Oostende, Belgium, 17-19 June 2008. UNESCO, 2008. IOC Workshop Report No. 207.

http://www.iode.org/index.php?option=com_oe&task=viewDocumentRecord&docID=2457

  • Standards for DATA

A Proposed Standard for the Scholarly Citation of Quantitative Data http://gking.harvard.edu/files/cite.pdf

  • ISO 8000 under development

http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50801

ISO 8000 - A Standard for Data Quality by Grantner, Emily

Solving Data Quality Problems Using Data Standards by de Jager, Salomon

  • ISO 19115

http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020

  • ISO 19115:2003 defines the conceptual model required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data
  • ISO/IEC 11179

http://metadata-standards.org/11179/

This standard addresses the semantics and representation of data, and the registration of descriptions of that data. The standard has strong international backing and is freely available.

ISO/IEC 11179 specifies the kind and quality of metadata necessary to describe data, and it specifies the management and administration of that metadata in a metadata registry (MDR). It applies to the formulation of data representations, concepts, meanings, and relationships between them to be shared among people and machines, independent of the organization that produces the data. It does not apply to the physical representation of data as bits and bytes at the machine level.

In ISO/IEC 11179, metadata refers to descriptions of data. ISO/IEC 11179 does not contain a general treatment of metadata. ISO/IEC 11179-1:2004 provides the means for understanding and associating the individual parts of ISO/IEC 11179 and is the foundation for a conceptual understanding of metadata and metadata registries.

Facts about Jewett Meeting at MBL
Description: Data Provenance and Attribution for Published Datasets
End date: 10 April 2009
Location: Jonsson Center, Woods Hole, MA
Name: Jewett Meeting at MBL
Sponsor: MBLWHOI Library and Jewett Foundation
Start date: 9 April 2009
Tag: Provenance, Dataset, and Library