Provenance Meeting 100901
From Tetherless World Wiki
Provenance Agenda
- DimDim meeting room
- Meeting Etherpad
- Introductions of new people
- Provenance overview
- New SPCDIS work - Stephan and James
- CSIRO update
- MDSA update
- DQSS update
- AeroStat update
Attendance
- Patrick (call in)
- Stephan (call in)
- Peter
- Mandeep Singh
- Arun (graduate student taking classes with Peter Fox)
- Jim McCusker
- Joanne Luciano
- Deborah McGuinness
Past Action Items
- James to post current slides (pdf), or link to (UNKNOWN)
New Action Items
- SZ: Add new use cases for SPCDIS to the Use Case page on the wiki
- SZ: Also check whether the use cases on that page still make sense or need to be removed
- SZ & PW: new vsto sparql use cases (or use cases that otherwise require a persistent triple-store)
- SZ & DLM: discuss the observer log comment hierarchy
- SZ: organize new observer comment query use cases and requirements for a student activity
- SZ: put all new observer comment PML on eScience
- James Michaelis: to post current slides (pdf), or link to
- SZ, DLM, James: Review comment hierarchy
- Patrick needs to push new version of VSTO to incorporate Cynthia's changes
- Patrick and Cynthia will work to open up the port on tw1 for at least access within VPN
Meeting Notes
Fun with technology
- MediaWiki for meeting information
- Dimdim for telecon
- titanpad
Introductions
- Mandeep - Masters Student
- Arun - Working with Professor Peter Fox
- Joanne - Research Assoc Professor - http://tw.rpi.edu/wiki/Joanne_S._Luciano
- Deborah McGuinness
- Peter Fox - Lead on 4 science provenance projects (on wiki page)
- Jim McCusker - 2nd year, bioinformatics, doing cancer research and National Cancer Institute work; trying to get a consistent representation of provenance, experimental artifacts, and history of data
- Cynthia Chang - Research Staff, Inference Web, PML, Provenance
This is our bi-weekly meeting going over science provenance activity
Transparency and Trust
- Also fitness of use
- expose assumptions, caveats, etc. in processing
Fragmentation, Disconnection, Encapsulation bad for transparency
information/data models describe a focused domain
- used in isolation, a model's expressivity is limited
- information models often have some overlap
- construct an integrated information model that utilizes multiple specialized domain models
- an integrated information model is more expressive than the sum of its parts
Spectrum of a provenance ecosystem
- explanation
- justification
- verifiability
- proof
- trust
All can be considered part of the all encompassing notion of 'Transparency'
SPCDIS Update
- Stephan and James met with MLSO scientists
- James went over work on role based presentation of information
- visualization is based on the expertise of the user and how much they want to see
- Stephan has scripts that generate PML from 5 years of observer logs
- Two different justifications for each
- this information was taken from the observer log
- the user made this comment at a certain time
- mockup at http://tw.rpi.edu/portal/SPCDIS_Workgroup_-_Visualization_and_User_Interaction on the bottom
- Stephan has comments encoded in PML.
- Looking at putting (the PMLJ) in a triple store
- A few new use cases: find all comments that mention a particular text string
- e.g. give me all the PICS comments that mention a particular text string
- todo - Deborah might review the comment hierarchy with Stephan / James
- Two use cases
- - "Give me all the MkIV problem comments in 2001 which mention 'tophat'"
- - "Give me all the PICS comments in 2001 that mention 'occulter'"
- Couple more use cases to think about:
- - When did the occulter change? (Stephan calls this a "base" use case)
- - What occulter was used during this image's observation? (When was the last occulter change event? What was the occulter changed to?)
- Nice example in the comment log that shows a correction comment (one that is disconnected from the reports it corrects), e.g. "The two reports above shold be north WEST not north east as reported"
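The query use cases above can be sketched in a few lines of Python. This is only an illustration, not the SPCDIS implementation: the `Comment` record, its fields, the `find_comments` and `occulter_at` helpers, and all sample entries are hypothetical stand-ins for whatever the observer log PML actually encodes.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Comment:
    # Hypothetical record; the real observer log PML may be structured differently.
    instrument: str   # e.g. "MkIV" or "PICS"
    category: str     # e.g. "problem"
    time: datetime
    text: str


def find_comments(comments: List[Comment], instrument: str, year: int,
                  needle: str, category: Optional[str] = None) -> List[Comment]:
    """Comments for one instrument and year whose text mentions `needle` (case-insensitive)."""
    needle = needle.lower()
    return [
        c for c in comments
        if c.instrument == instrument
        and c.time.year == year
        and (category is None or c.category == category)
        and needle in c.text.lower()
    ]


def occulter_at(changes: List[Tuple[datetime, str]], when: datetime) -> Optional[str]:
    """Occulter in place at `when`, given (time, occulter) change events sorted by time."""
    times = [t for t, _ in changes]
    i = bisect_right(times, when)  # number of change events at or before `when`
    return changes[i - 1][1] if i else None


# Made-up sample data, for illustration only.
comments = [
    Comment("MkIV", "problem", datetime(2001, 3, 4, 18, 0), "Tophat misaligned"),
    Comment("PICS", "general", datetime(2001, 5, 1, 19, 30), "Occulter replaced"),
    Comment("PICS", "general", datetime(2002, 1, 9, 17, 0), "occulter cleaned"),
]
changes = [(datetime(2001, 5, 1, 19, 30), "occulter-B")]

mkiv_hits = find_comments(comments, "MkIV", 2001, "tophat", category="problem")
pics_hits = find_comments(comments, "PICS", 2001, "occulter")
```

In this sketch the "base" use case (when did the occulter change?) is just the `changes` list itself, and the image-level question is derived from it by looking up the most recent change at or before the observation time.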
CSIRO
- W3C Sensor ontology (http://www.w3.org/2005/Incubator/ssn/)
- Developing a water sensor network in Tasmania
- 6-7 people
- 1 year intensive project
- Patrick and Stephan to go there for 2 weeks sometime in October
- Currently missing the provenance information
- They have WaterML and SensorML
New work with MDSA
- Introducing more science terminology
- New types of information that can be exposed to the user
- More information that can be expressed in the information model
DQSS
- Went through a release of the data quality screening mechanism
- Last couple weeks we've been trying out the web interface
- Users want data screened to a particular level
- Wanting to bring in more domain knowledge in the provenance information. Group information based on certain concepts or parameters in the science domain
- Looking into using SWEET
- http://mirador-ts1.gsfc.nasa.gov/
- Only a few data products utilize the semantic information (AIRX2RET, AIRH2RET and AIRS2RET), and only for Version 5 of those products (not Version 3).
- Will be expanding to include more data products
- Not yet using PML
- Chance to generate a ton of provenance information
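Screening data "to a particular level" can be pictured as masking every value whose quality flag is worse than a user-chosen threshold. The following is a minimal sketch under stated assumptions, not the DQSS implementation: the `screen` function, the sample values, and the convention that a lower flag means better quality are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def screen(values: Sequence[float], quality: Sequence[int],
           worst_allowed: int) -> List[Optional[float]]:
    """Keep values whose quality flag is at or better than `worst_allowed`; mask the rest with None."""
    if len(values) != len(quality):
        raise ValueError("values and quality must have the same length")
    # Assumed convention: a lower flag means better quality.
    return [v if q <= worst_allowed else None for v, q in zip(values, quality)]


temps = [251.2, 249.8, 260.1, 255.0]  # made-up sample values
flags = [0, 1, 2, 0]                  # made-up quality flags

best_only = screen(temps, flags, worst_allowed=0)
good_or_better = screen(temps, flags, worst_allowed=1)
```

Masking rather than dropping values keeps the screened array aligned with the original, which would make it easier to record in provenance which points were removed and at what level.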
Triple Stores
- Provenance
- PMLp triple stores upgraded to new version of ???, loaded in, and can do a search just fine
- Looking to get PMLp that Stephan is generating to load into triple store
- Modifying the PML API to work against the triple stores
- Yes ... eventually
- And will be able to
- Might want to get a student to work on that, to put all of the PML into the triple store
- AllegroGraph has very poor documentation
- Two students coming back from AllegroGraph, Shangguan and Josh
- Action Item: SZ - generate background information on observer log, observer log RDF/PML, and SPCDIS use cases involving queries on observer comments.
- eScience
- Putting VSTO knowledge base into triple store as well and testing
- Action Item: SZ - generate research VSTO use cases
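To make the idea of searching PML in a triple store concrete, here is a toy in-memory version of triple pattern matching. It is a sketch only: it does not use the PML API or any real triple store, and the `ex:` identifiers and predicate names are hypothetical, not actual PML terms.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


# Made-up triples with hypothetical predicates, for illustration only.
store = [
    ("ex:comment1", "ex:hasText", "tophat misaligned"),
    ("ex:comment1", "ex:fromLog", "ex:observerLog2001"),
    ("ex:comment2", "ex:hasText", "occulter replaced"),
    ("ex:comment2", "ex:fromLog", "ex:observerLog2001"),
]

# Comments that come from the 2001 log and whose text mentions "occulter".
from_2001 = {s for s, _, _ in match(store, p="ex:fromLog", o="ex:observerLog2001")}
mentions = sorted(
    s for s, _, text in match(store, p="ex:hasText")
    if "occulter" in text and s in from_2001
)
```

A persistent store would answer the same kind of pattern query without reloading and rescanning every PML document, which is the motivation for the use cases that require one.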
AeroStat
- Sharing mindmap from Greg L. of GSFC
- Categorize the information and match it up with elements in the ontology
- Some might be internal provenance, other information external provenance
AGU
- go ahead and work on abstract presented at IPAW
- James can write abstracts for both of his ideas and submit both; we'll find another first author
- Abstract information should go to http://tw.rpi.edu/portal/AGUFall10_Abstracts
