Consensus on Best Practices
From Semantic Portal Wiki
These are the points transcribed from notes Andy was writing at the front of the room. They are better thought of as notes than as points of consensus. An additional meeting is likely required to reach consensus on them.
Thursday takeaways (1)
- We have the opportunity to do precise stacking of citations
Things to remember for later (2)
- We need to follow up on "data as evidence" and associated specifications in other disciplines
What next? (3)
- Figure out what we need to capture for attribution
- Attribution is there once a data citation with a DOI exists (PW)
- A data product is data with numbers. An information product combines and represents several data products. (PF)
- XRI stands for eXtensible Resource Identifier; however, it is not sufficient for our use here
What are the attributes necessary to support attribution and provenance? (4)
- Creator(s)
- Metadata creator
- Journal article (table) creator
- Data creator (e.g., chief scientist)
- This is the person taking credit for the data; can be multi-valued
- Where to get the data via provenance
- Possible attribution element names
- dc.contributor - should be the same for all figures/tables in the publication
- dc.creator
- wc.dataattribution (note that the wc prefix stands for Woods Hole core)
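As a minimal sketch, the attributes listed above could be gathered into one record per figure or table. The class name `AttributionRecord`, its field names, and the mapping of roles onto `dc.contributor`, `dc.creator`, and `wc.dataattribution` are assumptions made for illustration; the notes do not settle which role belongs to which element.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributionRecord:
    """Hypothetical attribution record for one figure or table."""

    metadata_creator: str
    article_creator: str  # journal article (table) creator
    # The person(s) taking credit for the data; multi-valued per the notes.
    data_creators: List[str] = field(default_factory=list)
    # Where to get the data (provenance pointer); free text in this sketch.
    data_location: str = ""

    def to_elements(self) -> Dict[str, List[str]]:
        """Map the record onto the candidate element names (assumed mapping)."""
        return {
            "dc.contributor": [self.metadata_creator],
            "dc.creator": [self.article_creator],
            "wc.dataattribution": list(self.data_creators),
        }
```

Keeping `data_creators` as a list reflects the note that the data creator can be multi-valued; the other roles are single-valued here only to keep the sketch short.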
What other attributes do we need for "attribution" and reuse? (5)
- Explain steps in processing
- How we name things is important
- Tooltips help
- Idea: Could the methods section of a journal publication be assumed to be all that is required for instructions to re-use the referenced data products?
Attribution trail (6)
- If using all data from a dataset then the original data creator should be asked to be an author on the journal article
- If only part of the data is used then there should be a negotiation step with the data author
- "Rules of the road" can be made to greet someone accessing a repository so that they are encouraged to follow the above 2 policies
- Watermarks can be used for images
- Repository tools can be built with audit trail functionality built in
- Also possible to build tools that extract metadata from the original dataset rather than keeping metadata in a separate file
Where should we store the data and metadata associated with publications? (7)
- One and same (together)
- Store data and metadata in the library?
- Need guidelines for repository characteristics (what is the required subset?)
- How is a data center certified as meeting the necessary criteria, and who certifies it?
- They are set by the ICSU
- ICSU is looking for others to sign up
- Suggestion was made that the MBL-WHOI Library set up a prototype for Wiebe-usecase-like data and metadata (WHOAS?)
What is the role of the library concerning metadata coming in? (8)
- Need to help support unintended use
- Trust factor - will the library do necessary QA/QC on the submitted metadata?
- yes, in phase 2 of a prototype
- need to define minimal/initial procedures for doing this QA/QC
- yes, in phase 2 of a prototype
- Suggestion was made that it is the role of the library to accept submitted data/metadata and do a minimal review (QA/QC) of it, and that the level of QA/QC will likely increase over time.
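A minimal first-pass QA/QC check on submitted metadata might look like the following sketch. The function name `minimal_qaqc` and the choice of required elements are assumptions for illustration; the notes leave the actual minimal procedures to be defined.

```python
from typing import Dict, List

# Assumed minimal set of required elements; the real subset is undecided.
REQUIRED_ELEMENTS = ("dc.creator", "dc.contributor", "wc.dataattribution")


def minimal_qaqc(metadata: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for name in REQUIRED_ELEMENTS:
        values = metadata.get(name, [])
        if not values:
            problems.append(f"missing element: {name}")
        elif any(not value.strip() for value in values):
            problems.append(f"blank value in element: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer send all findings back to the submitter at once, and stricter checks can be appended later as the level of QA/QC increases.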
IDEA: A World Publication-Data Center proposal (9)
- A data center only for backbone data sitting behind any journal article
- A goal would be to minimize the work required by a scientist to submit data related to their journal article
- Would develop QA/QC procedures for metadata submitted, file content types, etc.
- Would propose a pilot focusing on journal articles submitted by Woods Hole Scientific Community science organizations
- We would try for accreditation for WDC
- Low-level peer review tools would be made available
- Specialists with expertise in data related to journal articles would be trained
- Link-checking would be incorporated into site to validate OID, etc. references in the metadata
- Might fill current void created by policies of Science and Nature, for example
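The link-checking idea above could start with a purely syntactic pass before any network lookup. The sketch below checks only DOI-style identifiers, chosen because DOIs are mentioned earlier in these notes; the function name `find_malformed_dois` and the pattern are assumptions, and other identifier types would need their own rules. It does not confirm that an identifier actually resolves.

```python
import re
from typing import Iterable, List

# Assumed pattern: "10." + registrant code + "/" + a non-empty suffix.
# This is a common approximation, not a full statement of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def find_malformed_dois(references: Iterable[str]) -> List[str]:
    """Return the references that do not look like a DOI, in input order."""
    return [ref for ref in references if not DOI_PATTERN.match(ref.strip())]
```

A repository could run this at submission time and follow it with a periodic check that each well-formed identifier still resolves.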

