Doug Fils visit from Integrated Ocean Drilling Program 2011 May 25


May 25, 2011 - Winslow (Doug, Deborah, Xian, Li, Tim, Dominic, Peter, John (on Skype))
Doug Fils (Ocean Leadership), IODP - Integrated Ocean Drilling Program - esp. US Implementing Organizations (IOs)
Two drilling vessels.
They drill cores in the ocean.
Data on cores -> owned by Texas A&M.
Data on boreholes -> owned by the observatory.
Janus, a database.
LIMS, a database for the new environment.
Launch data is log files.
Data is depth-calibrated (they use about a dozen depth scales; 1 or 2 are common).
1-year moratorium.
After 1 year, they aren't getting data out to the broader community.
Discovery: stuck behind web interfaces.
Every cruise can begin to define its own terms.
Top and bottom age ranges (or rocks?).
Different teams go out wanting to record using their own terms.
Only someone who was on that cruise can figure out the terms.
"Show me all terms that span Jurassic"
age models for each hole
Program has legs
"Leg" is an expedition that a Cruise goes on
Sites on that Leg are drilled.
At each site, they can drill multiple holes (the drill string might break).
Cores are recovered from each hole.
Sections of a Core are 1.5 meters long.
Samples are taken off of each Section.
Ages at depths.
different depths at different ages as you go across floor.
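The Leg > Site > Hole > Core > Section > Sample hierarchy above can be sketched as nested types. A minimal sketch; the field names and example identifiers are assumptions for illustration, not IODP's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Sample:
    top_cm: float                   # offset within its section
    age_ma: Optional[float] = None  # age assigned later by some age model

@dataclass
class Section:
    number: int                     # sections of a core are 1.5 meters long
    samples: List[Sample] = field(default_factory=list)

@dataclass
class Core:
    number: int
    sections: List[Section] = field(default_factory=list)

@dataclass
class Hole:
    letter: str                     # a site can have several holes (string might break)
    cores: List[Core] = field(default_factory=list)

@dataclass
class Site:
    number: int
    holes: List[Hole] = field(default_factory=list)

@dataclass
class Leg:
    number: int                     # a "Leg" is an expedition that a cruise goes on
    sites: List[Site] = field(default_factory=list)

# One leg with one site, one hole, one core, one sampled section:
leg = Leg(208, [Site(1262, [Hole("A", [Core(5, [Section(2, [Sample(top_cm=75.0)])])])])])
```

The nesting makes the later points concrete: a sample is only locatable via its whole chain of keys (leg, site, hole, core, section), which is exactly what makes those keys natural URI components.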
They want to improve the discoverability and usefulness of their datasets.
Have tried: exposing REST services to feed parameters; now just making it easier to feed unknown parameters in.
Doug likes linked data as a federation solution.
The move is from SOA to ROA (Resource-Oriented Architecture).
Realized that they could start making derivative data based on their original data.
Searching by depth isn't useful.
Searching by AGE is what matters for the science.
They converted depth below the seafloor to age.
They took an Age Model and are exposing data tagged with those ages; they need to MAKE SURE it is presented as a derivative product that users may or may not agree with.
Author of the age model (Neptune database).
They have been looking at SKOS because it's friendlier to them, but they don't want to preclude more later; they're not ready to adopt reasoning.
If they say data is from an age, users will know there was an age model and will want to see that age model.
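One way to read the depth-to-age conversion above: an age model is a set of (depth, age) control points for a hole, and data at intermediate depths gets an interpolated age. A minimal sketch assuming linear interpolation; the control points are invented, and a real model (e.g. one drawn from Neptune) is an author's interpretation:

```python
def depth_to_age(depth_m, age_model):
    """Interpolate an age (Ma) at a depth (m) from (depth, age) control points."""
    pts = sorted(age_model)
    if depth_m <= pts[0][0]:
        return pts[0][1]
    for (d0, a0), (d1, a1) in zip(pts, pts[1:]):
        if d0 <= depth_m <= d1:
            return a0 + (a1 - a0) * (depth_m - d0) / (d1 - d0)
    return pts[-1][1]  # deeper than the last control point: clamp

# Invented control points for one hole:
model = [(0.0, 0.0), (50.0, 10.0), (120.0, 33.0)]
depth_to_age(25.0, model)  # -> 5.0
```

This also shows why the age model must travel with the derived ages: swap in a different model list and the same depth yields a different age.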
They have given age models URIs and are doing 303s with conneg.
Users might want to make their own age model.
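The 303-with-conneg pattern can be sketched as: the age-model URI names a non-information resource, and a GET on it returns 303 See Other with a Location chosen from the Accept header. The URLs below are hypothetical, not IODP's actual scheme:

```python
# Hypothetical representation URLs for one age-model URI:
REPRESENTATIONS = {
    "text/html": "http://example.org/page/age-model/42",
    "application/rdf+xml": "http://example.org/data/age-model/42.rdf",
}

def negotiate(accept_header):
    """Pick the Location for a 303 See Other based on the Accept header."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type]
    return 303, REPRESENTATIONS["text/html"]  # default to the HTML page

negotiate("application/rdf+xml")  # -> (303, 'http://example.org/data/age-model/42.rdf')
```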
extracting from relational databases into a Virtuoso endpoint.
NGDC is creating SKOS vocabularies.
The National Geophysical Data Center is their long-term archive.
R2R at Lamont has vocabularies.
They want to build from, or at least cross-link to, those vocabs too.
Doug has the flexibility to explore the right way to expose their data using linked data principles.


Linking Open Government Data
Doug fed their data into the Exhibit UI.
S2S, both for eScience and for LOGD's international dataset.


From the data, get to the publications (and vice versa).
Take leg, site, hole - use them as a key into the graph.
(location, water depth, resources)
From a URI to a bucket of related URIs for related things.


Elsevier Strategic Partner
SciVerse hub
An OpenSocial-based portal.
They implemented the OpenSocial API, which lets third-party developers create applets that plug into SciVerse and enhance the SciVerse user experience.
As a user is looking for literature in SciVerse, the TWC app searches LOGD datasets to include in the results.
Result: relevant datasets alongside their literature search (our data in their environment).
e.g. "ozone"
Find data related to an article; recommendations based on the topic and its data.
Doug: vocabulary like age slice, bio taxon, fossils-in-a-section -- these become facets through the data.
Example: took a vocab of first occurrence and last occurrence (time zones: Jurassic) and resolved them against URIs in Freebase.
John: DOIs; CrossRef is making them conneg-savvy (article metadata). DOIs have always provided "multiple resolution" functionality to registered items related to the article of interest. New possibilities include links to data that accompanies the research in the paper, and to the vocab used in the datasets.
Li: unique identifiers.
Doug has list of 10 driving use cases.
discoverability based on concepts.
right now, they have a web form that requires knowledge of data.
terms are "close to data, not close to science."
Come in with the time stage "Jurassic":
what samples do you have from that age period?
"I want all Jurassic sandy silt."
available sources are known,
3 measures: percentages of sand, gravel, and silt.
Define "sandy silt" to be percentage ranges of each.
They store the actual values.
Multiple people can store different interpretations (e.g., not 20%, it's 15% silt).
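The range-based definition can be sketched as a function over the three stored measures. The cutoffs below are invented, not a real lithology standard; the point is that a different interpretation is just a different set of ranges over the same stored values:

```python
def classify(sand_pct, gravel_pct, silt_pct):
    """Map stored percentages to a lithology term via percentage ranges.
    All thresholds are hypothetical, for illustration only."""
    if gravel_pct >= 30:
        return "gravel"
    if silt_pct >= 50 and sand_pct >= 20:
        return "sandy silt"
    if sand_pct >= 50:
        return "sand"
    if silt_pct >= 50:
        return "silt"
    return "mixed sediment"

classify(sand_pct=25, gravel_pct=2, silt_pct=60)  # -> 'sandy silt' under these cutoffs
```

Because the raw percentages are what is stored, the "I want all Jurassic sandy silt" query becomes: apply an age model for the age facet, then apply some chosen classification like this one for the lithology facet.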
(This is OWL 2, no?)
Doug: looking to define vocab to enable discovery in the data.
Want to just get something out to get feedback.
They're already writing things; they want to make sure they are writing them correctly.
Peter: Doug is in charge of the data and wants advice on directions on what to do and what not to do.
From application to source. We should tell him what to do.


URI naming design

Tim's approach is to name by scope, THEN by prettiness, and declaratively link the two.
backward compatibility! dumping their graphs! (the not(Tim) approach ;) )
The URI design is a function of the use cases, it's qualitative.
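A sketch of the name-by-scope-then-prettiness idea: mint an opaque URI from the data's own keys (leg, site, hole), mint a human-friendly alias separately, and link the two declaratively. The base URL, path layout, and slug are invented for illustration:

```python
BASE = "http://example.org/iodp"  # hypothetical base URL

def scoped_uri(leg, site, hole):
    """Opaque URI derived from the data's own keys; stable by construction."""
    return f"{BASE}/id/leg/{leg}/site/{site}/hole/{hole}"

def pretty_uri(slug):
    """Human-friendly alias; free to change without breaking the scoped name."""
    return f"{BASE}/nice/{slug}"

def same_as(a, b):
    """The declarative link between the two names, as one N-Triples line."""
    return f"<{a}> <http://www.w3.org/2002/07/owl#sameAs> <{b}> ."

same_as(scoped_uri(208, 1262, "A"), pretty_uri("walvis-ridge-hole-a"))
```

Backward compatibility then reduces to keeping the scoped URIs stable and re-pointing or re-dumping only the pretty aliases.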
linking publications
Lamont vs. A&M