Author Archive

Greg

January 14th, 2012

It was with great sadness that I learned yesterday of the passing of my friend and colleague, Greg Leptoukh.  Greg was a physical scientist at the NASA Goddard Space Flight Center and a coordinator of the Earth Science Information Partners (ESIP) Federation Information Quality cluster.  Greg was dedicated to leveraging information systems to improve the usability of data for scientists: reducing technical barriers to data use and improving user comprehension of data generation and use.

I had the pleasure of working with Greg on the Multi-Sensor Data Synergy Advisor (MDSA), a prototype semantic extension to the already successful Giovanni online data analysis tool.  Giovanni has proven to be a successful tool for reducing the technical barriers in science data processing, analysis, and visualization, and information provided through Giovanni has played a role in over 400 science publications to date.  With MDSA, Greg intended to show how Giovanni could be instrumented to provide provenance, quality, and expert knowledge about data to interested users.  Greg was extremely enthusiastic about the potential of semantic technologies to power these enhancements: ontologies to describe concepts important to data generation and use, and rules to expose and explain scenarios that may lead to misunderstood analysis results.  I will always admire Greg’s enthusiasm for what we had been able to accomplish, and what we would be able to accomplish in future projects.  Greg clearly saw what we were doing as a means to empower scientists, a noble goal if ever there was one.

I am thankful to have had the opportunity to work with Greg, and incredibly sad that he was not able to see this work come to fruition.

You will be missed, friend.

Categories: tetherless world

Characterizing quality for science data products

December 30th, 2011

Characterizing quality for a science data product is hard. We have been working on this issue in our Multi-Sensor Data Synergy Advisor (MDSA) project with Greg Leptoukh and Chris Lynnes from the NASA Goddard Space Flight Center (GSFC). The following is my opinion on what product quality means and how it can be characterized. This work was presented as a poster at the AGU Fall Meeting 2011.

Science product quality is hard to define, characterize, and act upon. Product quality reflects a comparison against standard products of a similar kind, but it also reflects the fitness-for-use of the product for the end-user. Users weigh quality characteristics (e.g. accuracy, completeness, coverage, consistency, representativeness) based on their intended use for the data, and therefore the quality of a product can differ depending on each user’s needs and interests.  Despite the subjective nature of quality assertions, and their sensitivity to users’ fitness-for-use, most quality information is provided by the product producer, and the subjective criteria used to determine quality are opaque, if available at all.

If users are given product quality information at all, this information usually comes in one of two forms:

  • technical reports in which extensive statistical analysis is reported on very specific characteristics of the product
  • subjective and unexplained statements such as ‘good’, ‘marginal’, or ‘bad’.

The first is information overload that the user cannot quickly assess; the second gives the user almost none of the information needed to make their own subjective quality assessment.

Is there a similar scenario in everyday life where users are presented with quality information that they can readily understand and act upon?

There is, and you see it every day in the supermarket.

[Figure: a common application of information used to make subjective quality assessments]

Nutrition Facts labels provide nutrition information per serving (e.g. amount of Total Fat, Total Carbohydrates, Protein) and show how the listed amounts per serving compare to a prospective daily diet.

The comparison to a standard 2,000 calorie diet provides the user with a simple tool for assessing the usefulness of a food item in their unique diet. Quality assertions, such as whether this food is ‘good’ or ‘bad’ for the consumer’s diet, are left to the consumer, but are relatively easy to make with the available information.

A ‘quality facts’ label for a scientific data product, showing computed values for community-recognized quality indicators, would go a long way towards enabling a nutrition label-like presentation of quality that is easy for science users to consume and act upon.

[Figure: an early mockup of a presentation of quality information for a science data product]

We have begun working on mockups of what such a presentation of quality could look like, and have constructed a basic quality model that would allow us to express in RDF the information that would be used to construct a quality facts label.

Our quality model primer presents our high-level quality model and its application to an aerosol satellite data product in detail.
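
To make the label idea a little more concrete, here is a minimal sketch in Python of how a plain-text quality facts label might be assembled. It is only an illustration of the nutrition label analogy, not our quality model or our mockup: the `QualityIndicator` and `render_label` names, the idea of comparing each computed value to a reference level (standing in for the 2,000 calorie diet), and every number shown are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityIndicator:
    """One row of a hypothetical 'quality facts' label."""
    name: str         # a quality indicator, e.g. "Completeness"
    value: float      # computed value for the product (placeholder numbers below)
    reference: float  # assumed reference level, analogous to the standard daily diet

    def percent_of_reference(self) -> int:
        """Return the value as a rounded percentage of the reference level."""
        return round(100 * self.value / self.reference)


def render_label(product: str, indicators: list[QualityIndicator]) -> str:
    """Build a plain-text label: a title, a rule, then one line per indicator."""
    lines = [f"Quality Facts: {product}", "-" * 40]
    for ind in indicators:
        lines.append(
            f"{ind.name:<20}{ind.value:>10g}{ind.percent_of_reference():>9}%"
        )
    return "\n".join(lines)


# Placeholder values for illustration only; these are not real measurements.
example = [
    QualityIndicator("Completeness", 90.0, 100.0),
    QualityIndicator("Coverage", 30.0, 40.0),
]

print(render_label("Example aerosol product", example))
```

As with a Nutrition Facts label, the sketch reports values and leaves the judgment of ‘good’ or ‘bad’ to the reader.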

Our poster presentation was a hit at AGU, where we received a great deal of positive feedback on it.  This nutrition label-like presentation is immediately familiar, and supports the metaphor of science users ‘shopping’ for the best data product to fit their needs.

We still have a long way to go on developing our presentation, but the feedback from discussions at AGU tells me that our message resonated with our intended audience.


Thoughts on EGU Earth and Space Science Informatics

April 12th, 2011

The European Geosciences Union (EGU) General Assembly 2011, held from April 3-8 in Vienna, Austria, brought together 10,725 researchers from approximately 96 countries to discuss advances and trends within the Earth, Planetary, and Space Sciences.  Attendees presented over four thousand talks and nearly eight and a half thousand posters during the week-long conference.  The volume of information presented and ideas exchanged at EGU is truly staggering, and this meeting is a fundamental part of the social networking for this community.  It is the seed from which many new collaborations form, and many grant proposals can trace their genesis to discussions at and around EGU sessions.  EGU is a loud, busy, hectic, and critical event for the Earth and Space Science research communities.

This year I had the pleasure of attending the EGU General Assembly and presenting two talks for our lab in the Earth and Space Science Informatics (ESSI) disciplinary session.  This trip was particularly interesting for me as it was the first time I have attended the EGU General Assembly (I have attended the ESSI session at the American Geophysical Union Fall Meeting for the last four years).  It afforded me the opportunity to talk with a large number of colleagues in the field from Eurasia with whom I had not previously interacted.

Over the course of the meeting I came to realize that the European informatics community has a slightly different feel and a different focus from the American informatics groups.  For the European informatics evangelists, the driving focus in informatics is geoscience standards.  The session discourse revolved almost entirely around interoperability in systems and in data models.  Presentations on service-oriented architectures (SOA) that interoperate using Open Geospatial Consortium (OGC) defined web services and data models were the rule of the day.  I believe this focus on interoperability and developing consensus on standards and services is deeply rooted in the culture of Europe and the inherent complexities of interacting with neighbors that are culturally and linguistically different.

The presentations I gave for the lab were quite well received.  Eric’s presentation on the S2S architecture was extremely well received because of its potential to interoperate with OGC web services, and its compatibility with many of the SOA initiatives currently being pursued by our European colleagues.  My presentation on the Multi-Sensor Data Synergy Advisor (MDSA), one of our collaborations with Greg Leptoukh’s group at the NASA Goddard Space Flight Center, was also well received, as it touched on the complexities of representing/characterizing and comparing concepts such as product quality and fitness-for-use and expressing this information in a way that is consumable by a researcher.  Difficulties in characterizing these concepts were well known to many in the audience who work to develop data model standards. Our work elicited particular interest from David Arctur, Director of Interoperability Programs for OGC, and this is a relationship we should most definitely pursue.

Overall, attending EGU 2011 was a great experience and I believe our lab should increase our presence at future meetings.  There is a great deal we can learn from our European colleagues on building consensus and establishing interoperable web service and data model standards.  The ESSI community is investing great effort in deploying production-level interagency SOA systems, and developing experience from which we can benefit.  What our lab can offer is leadership in future web technologies.  We can help the EGU ESSI community adopt linked data / RESTful principles, and add support for semantic web standards and methodologies to their data model and service standards and best practices.  We can provide guidance on how semantic technologies can help further their current goals as well as how they can leverage semantic technologies going forward.

There is a great deal of benefit ESSI can gain from moving to leverage the Semantic Web.  We should be there to share our experiences and provide guidance.  A win for the Earth Sciences is a win for all.

Categories: Data Science