
DCO-DS participation at Research Data Alliance Plenary 5 meeting

April 30th, 2015

In early March I attended the Research Data Alliance (RDA) Fifth Plenary and “Adoption Day” event to present our plans for adopting DataTypes and Persistent Identifier Types in the DCO Data Portal. This was the first plenary since the publication of the data type and persistent identifier type outputs, and the RDA community was interested in seeing how early adopters were faring.

At the Adoption Day event I gave a short presentation on our plan for representing DataTypes in the DCO Data Portal knowledge base. Most of the other adopter presentations were limited to organizational requirements or high-level architecture around data types or persistent identifiers; ours stood out because we presented details on ‘how’ we intended to implement RDA outputs rather than just ‘why’. I think our attention to technical details was appreciated; from listening to the presentations, it did not sound like many other groups were very far into their adoption process.

My main takeaways from the conference were the following:
– we are ahead of the curve on adopting the RDA data type and persistent identifier outputs
– we are viewed as leaders on how to implement data types; people are paying attention to what we are doing
– the chair of the DataType WG was very happy that we were thinking about how data types made sense within the context of our existing infrastructure, rather than treating the WG's reference implementation as the sole way to implement the output
– the DataType WG reference repository is more proof-of-concept than production system
– the data type community is interested in federating repositories but is not ready to do much on that yet

Overall, I think we are well positioned to be a leader on data types. Our work to date was very well received, and many members of the DataType WG will be very interested in what we have to show next September at the Sixth Plenary.

Good work, team; let’s keep it up!

Categories: Blog, tetherless world

Geodata 2014

June 30th, 2014

A few weeks ago I attended the 2014 Geodata Workshop. Like the previous Geodata workshop in 2011, this workshop focused on policies and techniques to improve inter-agency geographic data integration and data citation. While recommendations for data citation and geodata integration have advanced since the last workshop, the mood of the attendees suggested that we are in much the same place we were in 2011. There was strong consensus on the importance of data citation and integration, but also a feeling that no one is really doing it at scale, the tools aren’t where we need them to be, and agency policies are not yet mature enough to drive widespread adoption. Despite these hurdles, this is a community that is clearly excited and willing to take the first steps towards making widespread data integration and data citation a reality.

Meanwhile, in the trenches…

I had several conversations with attendees who represent publishers of oceanographic vocabularies. Many of these vocabularies have been publicly available for several years but have traditionally been 3-star open data (publicly available in a non-proprietary, machine-readable format, with no links to external vocabularies). These publishers are excited about upgrading their vocabulary services to 5-star open data (using open W3C standards such as RDF and SPARQL, identifying things with resolvable URIs, and linking to other people’s data) because they see a major benefit in being able to refer to the authoritative source for a term or resource that is related to their vocabulary but for which they are not the authoritative source. This is a great example of a group that has already identified a specific real-world need for, and benefit from, integration and is actively laying the groundwork that will make that integration successful. This group was enthusiastic about cross-linking their vocabularies, and I have no doubt their efforts will be viewed as a data integration success at the next Geodata workshop.

Where we can help…

As a result of these discussions, our lab is starting a Linked Vocabulary API effort whose aim is to provide a Linked Data API configuration specialized for publishing SKOS vocabularies. Our goal is a configuration that makes bootstrapping a RESTful linked data API for a SKOS vocabulary simple and accessible to the broad scientific community. This effort builds on work we previously did for the CMSPV project.
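The Linked Data API configuration itself is not reproduced here. As a rough illustration of the behaviour such an endpoint provides, the Python sketch below stores a tiny SKOS vocabulary as triples and answers a RESTful lookup by concept URI with a JSON description, including `skos:exactMatch` links to an external, authoritative vocabulary (the 5-star step described above). All concept URIs, labels, the `/concept?uri=...` route, and the function names are hypothetical; only the SKOS namespace is real.

```python
import json
from urllib.parse import urlparse, parse_qs

# Real SKOS namespace; everything else below is a hypothetical example.
SKOS = "http://www.w3.org/2004/02/skos/core#"
LOCAL_BASE = "http://example.org/vocab/"  # hypothetical base URI of "our" vocabulary

# A toy vocabulary as (subject, predicate, object) triples.
TRIPLES = [
    (LOCAL_BASE + "salinity", SKOS + "prefLabel", "Sea water salinity"),
    (LOCAL_BASE + "salinity", SKOS + "broader", LOCAL_BASE + "physical-property"),
    # Link to a term owned by another (hypothetical) authoritative vocabulary.
    (LOCAL_BASE + "salinity", SKOS + "exactMatch", "http://other.example.net/terms/PSAL"),
    (LOCAL_BASE + "physical-property", SKOS + "prefLabel", "Physical property"),
]


def describe(uri, triples=TRIPLES):
    """Collect the SKOS properties of one concept, keyed by local name.

    Every value is kept in a list because SKOS properties may repeat.
    """
    description = {"uri": uri}
    for subject, predicate, obj in triples:
        if subject == uri and predicate.startswith(SKOS):
            description.setdefault(predicate[len(SKOS):], []).append(obj)
    return description


def external_matches(uri, triples=TRIPLES):
    """Return skos:exactMatch targets that live outside the local vocabulary."""
    return [target for target in describe(uri, triples).get("exactMatch", [])
            if not target.startswith(LOCAL_BASE)]


def handle_get(path):
    """Map a hypothetical 'GET /concept?uri=...' request to (status, JSON body)."""
    uri = parse_qs(urlparse(path).query).get("uri", [None])[0]
    if uri is None:
        return 400, json.dumps({"error": "missing uri parameter"})
    description = describe(uri)
    if len(description) == 1:  # only the "uri" key: nothing is known about it
        return 404, json.dumps({"error": "unknown concept"})
    return 200, json.dumps(description)


status, body = handle_get("/concept?uri=http://example.org/vocab/salinity")
print(status, json.loads(body)["exactMatch"])
# prints: 200 ['http://other.example.net/terms/PSAL']
```

Keeping values as lists mirrors the fact that a concept can carry several `skos:broader` or `skos:exactMatch` links; a real service might also use content negotiation to serve RDF alongside JSON.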

In conclusion

What I will remember most from Geodata 2014 is the excitement members of the community had about adopting new technologies and techniques and making widespread data integration and citation a reality. Where conventions have yet to be established, the community is willing to take the first steps and establish best practices. Where policies have yet to be formalized, the community is ready to work with policy makers to ensure that clear and helpful policies are established. Whenever the next Geodata workshop is held, I am confident that its narrative will be full of success stories that began at the 2014 workshop.

Categories: tetherless world


January 14th, 2012

It was with great sadness that I learned yesterday of the passing of my friend and colleague, Greg Leptoukh. Greg was a physical scientist at the NASA Goddard Space Flight Center and a coordinator of the Earth Science Information Partners (ESIP) Federation Information Quality cluster. Greg was dedicated to leveraging information systems to improve the usability of data for scientists: reducing technical barriers to data use and improving users’ comprehension of how data are generated and used.

I had the pleasure of working with Greg on the Multi-Sensor Data Synergy Advisor (MDSA), a prototype semantic extension to the already successful Giovanni online data analysis tool. Giovanni has proven to be a successful tool for reducing the technical barriers in science data processing, analysis, and visualization, and information provided through Giovanni has played a role in over 400 science publications to date. With MDSA, Greg intended to show how Giovanni could be instrumented to provide provenance, quality, and expert knowledge about data to interested users. Greg was extremely enthusiastic about the potential of semantic technologies to power these enhancements: ontologies to describe concepts important to data generation and use, and rules to expose and explain scenarios that may lead to misunderstood analysis results. I will always admire Greg’s enthusiasm for what we had been able to accomplish, and for what we would be able to accomplish in future projects. Greg clearly saw what we were doing as a means to empower scientists, a noble goal if ever there was one.

I am thankful to have had the opportunity to work with Greg, and incredibly sad that he was not able to see this work come to fruition.

You will be missed, friend.

Categories: tetherless world

Characterizing quality for science data products

December 30th, 2011

Characterizing quality for a science data product is hard. We have been working on this issue in our Multi-Sensor Data Synergy Advisor (MDSA) project with Greg Leptoukh and Chris Lynnes from the NASA Goddard Space Flight Center (GSFC). The following is my opinion on what product quality means and how it can be characterized. This work was presented as a poster at the AGU Fall Meeting 2011.

Science product quality is hard to define, characterize, and act upon. Product quality reflects a comparison against standard products of a similar kind, but it also reflects the product’s fitness-for-use for the end user. Users weigh quality characteristics (e.g. accuracy, completeness, coverage, consistency, representativeness) according to their intended use of the data, so the quality of a product can differ with different users’ needs and interests. Despite the subjective nature of quality assertions and their sensitivity to users’ fitness-for-use, most quality information is provided by the product producer, and the subjective criteria used to determine quality are opaque, if available at all.

If users are given product quality information at all, this information usually comes in one of two forms:

  • tech reports in which extensive statistical analysis is reported on very specific characteristics of the product
  • subjective and unexplained statements such as ‘good’, ‘marginal’, ‘bad’.

The first is information overload that the user cannot easily assess; the second is a near absence of the information a user needs to make their own subjective quality assessment.

Is there a similar scenario in everyday life where people are presented with quality information that they can readily understand and act upon?

There is, and you see it every day in the supermarket.

a common application of information used to make subjective quality assessments

Nutrition Facts labels provide per-serving nutrition information (e.g. amounts of Total Fat, Total Carbohydrates, Protein) and show how the listed amounts per serving compare to a reference daily diet.

The comparison to a standard 2,000-calorie diet gives the consumer a simple tool for assessing the usefulness of a food item in their own diet. Quality assertions, such as whether this food is ‘good’ or ‘bad’ for the consumer’s diet, are left to the consumer, but they are relatively easy to make with the available information.

A ‘quality facts’ label for a scientific data product, showing computed values for community-recognized quality indicators, would go a long way towards enabling a nutrition label-like presentation of quality that is easy for science users to consume and act upon.
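To make the analogy concrete, here is a minimal Python sketch of how such a label could be rendered, assuming each indicator has a computed product value and a community reference value that plays the role of the 2,000-calorie diet. The `Indicator` class, the indicator names, and all numbers are hypothetical. As on a Nutrition Facts label, the sketch only reports each value as a percentage of its reference and leaves the ‘good’ or ‘bad’ judgment to the user.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    """One hypothetical quality indicator on a 'quality facts' label."""
    name: str
    value: float      # value computed for this data product
    reference: float  # hypothetical community reference, like a 2,000-calorie diet
    unit: str = ""

    def __post_init__(self):
        if self.reference == 0:
            raise ValueError(f"reference for {self.name!r} must be non-zero")

    def percent_of_reference(self) -> float:
        """Analogue of '% Daily Value': the product value relative to the reference."""
        return 100.0 * self.value / self.reference


def quality_facts(product, indicators):
    """Render a plain-text label; interpreting it is left to the reader."""
    lines = [f"Quality Facts: {product}", "-" * 42,
             f"{'Indicator':<22}{'Value':>12}{'% ref':>8}"]
    for ind in indicators:
        value = f"{ind.value:g} {ind.unit}".strip()
        lines.append(f"{ind.name:<22}{value:>12}{ind.percent_of_reference():>7.0f}%")
    return "\n".join(lines)


print(quality_facts("Example aerosol product", [
    Indicator("Completeness", 92.0, 100.0, "%"),
    Indicator("Spatial coverage", 0.6, 0.8, "frac"),
]))
```

Which indicators appear, and which reference values they are compared against, would have to come from community agreement; the code only fixes the presentation.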

an early mockup of a presentation of quality information for a science data product

We have begun working on mockups of what such a presentation of quality could look like, and have constructed a basic quality model that would allow us to express in RDF the information that would be used to construct a quality facts label.
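Our quality model itself is not reproduced here. As a hedged illustration only, the Python sketch below serializes a product’s indicator values as N-Triples, so that a label generator could read them back from RDF. The `QM` namespace, the `DataProduct` class, and the `hasMeasurement`, `ofIndicator`, and `value` properties are hypothetical placeholders, not our model’s actual vocabulary; only the RDF and XSD namespaces are standard.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
QM = "http://example.org/quality#"  # hypothetical quality-model namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def double(value):
    """Typed xsd:double literal; repr() gives a valid lexical form for finite floats."""
    return f'"{float(value)!r}"^^{iri(XSD_DOUBLE)}'


def quality_triples(product, measurements):
    """Serialize {indicator name: computed value} for one product as N-Triples.

    Indicator names are assumed to be IRI-safe local names. Each measurement
    gets its own IRI minted under the product IRI (a hypothetical convention)
    so it can carry both the indicator and the value.
    """
    lines = [f"{iri(product)} {iri(RDF_TYPE)} {iri(QM + 'DataProduct')} ."]
    for index, (indicator, value) in enumerate(sorted(measurements.items())):
        m = f"{product}/measurement/{index}"
        lines += [
            f"{iri(product)} {iri(QM + 'hasMeasurement')} {iri(m)} .",
            f"{iri(m)} {iri(QM + 'ofIndicator')} {iri(QM + indicator)} .",
            f"{iri(m)} {iri(QM + 'value')} {double(value)} .",
        ]
    return "\n".join(lines)


print(quality_triples("http://example.org/product/aod", {"Completeness": 92.0}))
```

N-Triples is used only because it is easy to emit without an RDF library; any RDF serialization would carry the same statements.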

Our quality model primer presents our high-level quality model and its application to an aerosol satellite data product in detail.

Our poster was a hit at AGU, where we received a great deal of positive feedback. This nutrition label-like presentation is immediately familiar and supports the metaphor of science users ‘shopping’ for the data product that best fits their needs.

We still have a long way to go on developing our presentation, but the feedback from discussions at AGU tells me that our message resonated with our intended audience.


Thoughts on EGU Earth and Space Science Informatics

April 12th, 2011

The European Geosciences Union (EGU) General Assembly 2011, held from April 3-8 in Vienna, Austria, brought together 10,725 researchers from approximately 96 countries to discuss advances and trends within the Earth, Planetary, and Space Sciences. Attendees presented over four thousand talks and nearly eight and a half thousand posters during the week-long conference. The volume of information presented and ideas exchanged at EGU is truly staggering, and the meeting is a fundamental part of this community’s social network. It is the seed from which many new collaborations form, and many grant proposals can trace their genesis to discussions at and around EGU sessions. EGU is a loud, busy, hectic event, and a critical one for the Earth and Space Science research communities.

This year I had the pleasure of attending the EGU General Assembly and presenting two talks for our lab in the Earth and Space Science Informatics (ESSI) disciplinary session. This trip was particularly interesting for me because it was my first EGU General Assembly (I have attended the ESSI sessions at the American Geophysical Union Fall Meeting for the last four years), and it gave me the opportunity to talk with a large number of colleagues in the field from Eurasia with whom I had not previously interacted.

Over the course of the meeting I came to realize that the European informatics community has a slightly different feel and focus from American informatics groups. For the European informatics evangelists, the driving focus is geoscience standards. The session discourse revolved almost entirely around interoperability in systems and in data models. Presentations on service-oriented architectures (SOA) that interoperate using Open Geospatial Consortium (OGC) web services and data models were the order of the day. I believe this focus on interoperability and on developing consensus around standards and services is deeply rooted in European culture and in the inherent complexity of interacting with neighbors who are culturally and linguistically different.

Both presentations I gave for the lab were well received. Eric’s presentation on the S2S architecture drew particular interest because of its potential to interoperate with OGC web services and its compatibility with many of the SOA initiatives our European colleagues are currently pursuing. My presentation on the Multi-Sensor Data Synergy Advisor (MDSA), one of our collaborations with Greg Leptoukh’s group at the NASA Goddard Space Flight Center, was also well received, as it touched on the complexities of representing, characterizing, and comparing concepts such as product quality and fitness-for-use, and of expressing this information in a way a researcher can consume. Many in the audience who develop data model standards were well acquainted with the difficulty of characterizing these concepts. Our work elicited particular interest from David Arctur, Director of Interoperability Programs for OGC, and this is a relationship we should definitely pursue.

Overall, attending EGU 2011 was a great experience, and I believe our lab should increase its presence at future meetings. There is a great deal we can learn from our European colleagues about building consensus and establishing interoperable web service and data model standards. The ESSI community is investing great effort in deploying production-level interagency SOA systems, building experience from which we can benefit. What our lab can offer is leadership in future web technologies. We can help the EGU ESSI community adopt linked data and RESTful principles, and add support for semantic web standards and methodologies to their data model and service standards and best practices. We can provide guidance on how semantic technologies can help further their current goals, as well as how they can leverage these technologies going forward.

ESSI stands to gain a great deal from leveraging the Semantic Web. We should be there to share our experiences and provide guidance. A win for the Earth Sciences is a win for all.

Categories: Data Science