Archive for the ‘tetherless world’ Category

Getting comments on the Water Quality Portal at AGU 2011

February 1st, 2012

AGU 2011 was the first conference I attended after coming to RPI. There were many interesting activities at the conference, and I found it a very rewarding experience. I presented the poster for the semantic water quality portal and also helped a little with the RPI table in the academic section (mostly by just being there). I went to several talks and visited some exhibit booths. We had a group lunch at Chevy’s, which was a lot of fun!

Preparing the poster helped me rethink the water quality portal. Thanks to Evan for his ISWC 2011 poster, which was a very good starting point for my AGU poster! The poster session itself was even more fun and rewarding. I presented the portal to researchers from various fields and countries. Most of the people I talked to said that the portal is a nice and interesting project, and some researchers gave me very helpful comments:
1. bring in crowdsourcing, e.g. let users report problems
2. help farmers identify polluted wells
3. add a way to pull new data from USGS and EPA, e.g. via some kind of subscription
4. manage user permissions (insert/upload/delete)
5. consider allergy sufferers as a use case; possible conditions for an allergy alert include wind + time and a combination of pollutants

I went to several talks during AGU and learned about the cool projects that researchers from different organizations (EPA, Stanford, UMD, Google, NASA) have been working on. It was impressive to see how widely and deeply computer science is used in geophysical research. I also felt that scientists in geophysical fields are looking for more collaboration with computer scientists.

I went to the exhibits twice and spent quite some time there. I used the wired network provided by Google to do my assignments for the AI course. I also listened to the talk about Google Earth Engine, a very cool platform for geophysical scientists!

Attending a conference as large as AGU takes a lot of energy, but in the end it is worthwhile.

Tips about travel reimbursement that Carol gave me today:

1. Keep your boarding pass to show that you flew economy class
2. Check out at the hotel and get the folio, which shows how many nights you actually stayed there
3. Get itemized receipts at restaurants

Thanks, Carol!

Author: Categories: tetherless world Tags:

My First AGU Experience

January 30th, 2012

(This post should have gone up a month ago, but I had trouble accessing the TW weblog while I was in China, so I am posting it now that I am back in Troy.)

The AGU 2011 Fall Meeting was my first academic conference. I was very excited when I learned I had this opportunity. My goals were to present our poster, to see what such a conference is like, and to get an idea of what other people are doing in the Informatics area.

My poster was about my work with Eric Rozell on temporal metadata modeling in VSTO. I presented its motivation and methodology to several people, and it drew some interest. Our approach was seen as an effective way to handle large amounts of data and to improve reasoning and search capabilities. It was suggested that a similar technique (in the sense of recording a dataset's temporal range at day granularity using time:DateTimeInterval) has been used at NASA for data indexing in relational databases. As for the presentation itself, I think putting our posters, publications, and demos onto flash drives and handing them out was a very good idea. It helped interested visitors understand our work better afterwards.
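The day-granularity idea can be illustrated with a minimal Python sketch. It is not the VSTO implementation (which models this in RDF with time:DateTimeInterval); it simply assumes each dataset's coverage is stored as a closed interval of whole days and shows how a temporal query then reduces to a cheap overlap test. The dataset names and dates are hypothetical.

```python
from datetime import date, datetime


def to_day(ts: datetime) -> date:
    """Truncate a timestamp to day granularity (drops the time of day)."""
    return ts.date()


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """True if two closed day intervals share at least one day."""
    return a_start <= b_end and b_start <= a_end


def find_datasets(index: dict, q_start: date, q_end: date) -> list:
    """Return the names of datasets whose day interval overlaps the query."""
    return sorted(
        name for name, (start, end) in index.items()
        if overlaps(start, end, q_start, q_end)
    )


# Hypothetical catalog: dataset name -> (first day, last day) of coverage.
index = {
    "dataset-A": (to_day(datetime(2009, 3, 1, 14, 5)),
                  to_day(datetime(2009, 3, 31, 23, 59))),
    "dataset-B": (date(2010, 1, 1), date(2010, 12, 31)),
}

print(find_datasets(index, date(2009, 3, 15), date(2009, 4, 15)))  # ['dataset-A']
```

Storing whole days instead of exact timestamps keeps the index coarse and easy to compare; the trade-off is that sub-day precision is lost at the interval edges.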

There was a lot of other interesting work across several sessions. For example, Nicholas Del Rios et al. from the University of Texas at El Paso presented VisKo, a semantic and provenance-aware visualization framework that links data with visualization processes. It has been used to visualize data on behalf of Giovanni, and it captures data processing provenance and visualization provenance in PML. Besides the posters, I also went to several talks in different sessions. Though I could not connect most of them to my own research, it was nice to hear about what other people have been working on.

Another benefit was meeting people in the Earth Science and Informatics areas. Although I could remember only a few names, what I saw was a group of people who are enthusiastic about their work. They believe in what they are doing and are confident in what their work will accomplish. I really look forward to working with many of them.

To sum up, this was a great experience for me at the beginning of my Ph.D. Next time, I will try to meet and talk to more people and get more feedback on my own work.

Greg

January 14th, 2012

It was with great sadness that I learned yesterday of the passing of my friend and colleague, Greg Leptoukh. Greg was a physical scientist at the NASA Goddard Space Flight Center and a coordinator of the Earth Science Information Partners (ESIP) Federation Information Quality cluster. Greg was dedicated to leveraging information systems to improve the usability of data for scientists: reducing technical barriers to data use and improving users' comprehension of how data are generated and used.

I had the pleasure of working with Greg on the Multi-Sensor Data Synergy Advisor (MDSA), a prototype semantic extension to the already successful Giovanni online data analysis tool. Giovanni has proven to be a successful tool for reducing the technical barriers in science data processing, analysis, and visualization, and information provided through Giovanni has played a role in over 400 science publications to date. With MDSA, Greg intended to show how Giovanni could be instrumented to provide provenance, quality, and expert knowledge about data to interested users. Greg was extremely enthusiastic about the potential of semantic technologies to power these enhancements: ontologies to describe concepts important to data generation and use, and rules to expose and explain scenarios that may lead to misunderstood analysis results. I will always admire Greg’s enthusiasm for what we had been able to accomplish, and for what we would be able to accomplish in future projects. Greg clearly saw what we were doing as a means to empower scientists, a noble goal if ever there was one.

I am thankful to have had the opportunity to work with Greg, and incredibly sad that he was not able to see this work come to fruition.

You will be missed, friend.

Characterizing quality for science data products

December 30th, 2011

Characterizing quality for a science data product is hard. We have been working on this issue in our Multi-Sensor Data Synergy Advisor (MDSA) project with Greg Leptoukh and Chris Lynnes from the NASA Goddard Space Flight Center (GSFC). The following is my opinion on what product quality means and how it can be characterized. This work was presented as a poster at the AGU FM 2011 meeting.

Science product quality is hard to define, characterize, and act upon. Product quality reflects a comparison against standard products of a similar kind, but it also reflects the product's fitness-for-use for the end user. Users weigh quality characteristics (e.g. accuracy, completeness, coverage, consistency, representativeness) according to how they intend to use the data, so the quality of a product can differ depending on each user's needs and interests. Despite the subjective nature of quality assertions and their sensitivity to each user's fitness-for-use, most quality information is provided by the product producer, and the subjective criteria used to determine quality are opaque, if available at all.
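The point that the same product can score differently for different users can be shown with a small Python sketch. The weighted average used here is an assumption chosen for illustration, not a formula from our quality model, and the indicator values and user weights are hypothetical.

```python
# Hypothetical indicator values for one product, each normalized to 0..1.
product = {"accuracy": 0.9, "completeness": 0.6, "coverage": 0.8}


def fitness_for_use(indicators: dict, weights: dict) -> float:
    """Weighted average of indicator values using one user's weights."""
    total = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total


# Two hypothetical users who care about different characteristics.
coverage_focused = {"accuracy": 1, "completeness": 1, "coverage": 3}
accuracy_focused = {"accuracy": 3, "completeness": 1, "coverage": 1}

print(round(fitness_for_use(product, coverage_focused), 2))  # 0.78
print(round(fitness_for_use(product, accuracy_focused), 2))  # 0.82
```

The product's indicator values never change; only the user's weights do, which is why a single producer-assigned "good" or "bad" rating cannot capture fitness-for-use.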

If users are given product quality information at all, this information usually comes in one of two forms:

  • tech reports in which extensive statistical analysis is reported on very specific characteristics of the product
  • subjective, unexplained statements such as ‘good’, ‘marginal’, or ‘bad’.

The first is information overload that the user cannot quickly assess; the second offers almost none of the information a user needs to make their own subjective quality assessment.

Is there a similar scenario in everyday life where users are presented with quality information that they can readily understand and act upon?

There is, and you see it every day in the supermarket.

a common application of information used to make subjective quality assessments

Nutrition Facts labels provide per-serving nutrition information (e.g. the amount of Total Fat, Total Carbohydrate, and Protein) and show how the listed amounts per serving compare to a standard daily diet.

The comparison to a standard 2,000-calorie diet gives the user a simple tool for assessing the usefulness of a food item in their own diet. Quality assertions, such as whether the food is ‘good’ or ‘bad’ for the consumer’s diet, are left to the consumer, but they are relatively easy to make with the available information.
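The arithmetic behind a label line is just the amount per serving divided by a reference amount. In the minimal Python sketch below, the reference amounts (65 g total fat and 300 g total carbohydrate for a 2,000-calorie diet, as printed on US labels of that era) should be treated as illustrative, and the serving is hypothetical.

```python
# Reference amounts for a 2,000-calorie diet (illustrative values).
DAILY_REFERENCE_G = {"Total Fat": 65, "Total Carbohydrate": 300}


def percent_daily_value(amount_g: float, reference_g: float) -> int:
    """Amount per serving as a whole-number percentage of the reference."""
    return round(100 * amount_g / reference_g)


# Hypothetical serving of some food item.
serving_g = {"Total Fat": 13, "Total Carbohydrate": 30}

for nutrient, amount in serving_g.items():
    pct = percent_daily_value(amount, DAILY_REFERENCE_G[nutrient])
    print(f"{nutrient} {amount}g  {pct}%")
# Total Fat 13g  20%
# Total Carbohydrate 30g  10%
```

The label reports the number; whether 20% of a day's fat in one serving is acceptable is left to the reader. That separation of computed values from judgment is the property a quality facts label would borrow.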

A ‘quality facts’ label for a scientific data product, showing computed values for community-recognized quality indicators, would go a long way towards enabling a nutrition label-like presentation of quality that is easy for science users to consume and act upon.

an early mockup of a presentation of quality information for a science data product

We have begun working on mockups of what such a presentation of quality could look like, and have constructed a basic quality model that would allow us to express in RDF the information that would be used to construct a quality facts label.
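To make the idea of expressing label data in RDF concrete, here is a minimal Python sketch that serializes a few indicator values as N-Triples using plain string formatting. The namespace `http://example.org/quality#`, the properties `hasQualityIndicator`, `indicatorType`, and `value`, the product URI, and the numbers are all hypothetical placeholders, not terms from our actual quality model.

```python
# Hypothetical namespace; NOT the vocabulary of the actual quality model.
QM = "http://example.org/quality#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def quality_triples(product_uri: str, indicators: dict) -> list:
    """Emit N-Triples lines linking a product to its indicator values.

    Assumes values format as plain decimals (e.g. 0.6, not 1e-05).
    """
    lines = []
    for i, (name, value) in enumerate(sorted(indicators.items())):
        node = f"_:indicator{i}"  # blank node for one indicator measurement
        lines.append(f"<{product_uri}> <{QM}hasQualityIndicator> {node} .")
        lines.append(f"{node} <{QM}indicatorType> <{QM}{name}> .")
        lines.append(f'{node} <{QM}value> "{value}"^^<{XSD}decimal> .')
    return lines


triples = quality_triples(
    "http://example.org/product/aerosol-example",
    {"completeness": 0.6, "coverage": 0.8},
)
print("\n".join(triples))
```

A real implementation would more likely use an RDF library and the model's own vocabulary; plain strings keep the sketch dependency-free while showing the shape of the data a label renderer would read.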

Our quality model primer presents our high-level quality model and its application to an aerosol satellite data product in detail.

Our poster presentation was a hit at AGU, where we received a great deal of positive feedback on it.  This nutrition label-like presentation is immediately familiar, and supports the metaphor of science users ‘shopping’ for the best data product to fit their needs.

We still have a long way to go on developing our presentation, but the feedback from discussions at AGU tells me that our message resonated with our intended audience.


S2S Feedback at AGU Fall Meeting 2011

December 19th, 2011

The AGU Fall Meeting 2011 was a busy meeting and, as usual, the Tetherless World Constellation (TWC) received quite a bit of attention for best practices and tool support for Semantic eScience. I presented two posters. The first, in the Semantic, Linked Data, and Drupal-based Solutions for Science (IN31B) session, was about creating linked data for AGU abstracts. The second was in IN31A, a session on the Real Use of Open Standards and Technologies, although it soon became apparent that I was more interested in talking about it as an IN31B poster. That poster was on S2S, and it drew a range of feedback, which I discuss in this post: enthusiasts who wanted to implement it, skeptics who felt it was not an “interoperable” solution, and faceted browse developers who wanted to know why S2S needed so much complexity.

Addressing the first type of feedback is not difficult. I want everyone to be able to deploy an S2S interface for their data. However, I often have to hold myself back, because I know that the software is not yet at a point where it can be easily reused without a significant amount of hand-holding on my part. The basic problems are documentation and the complexity of installation. While the documentation problem can be fixed easily, the installation problem will remain until the S2S back-end architecture is updated. The back end currently depends on a triplestore deployed on one of TWC’s machines for indexing metadata about S2S services. I plan to move the back end to a linked data crawler approach next spring, removing the dependency on TWC triplestores and enabling wider installation.
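A linked data crawler replaces a central index with discovery by following links. The sketch below is a generic breadth-first crawler, not the planned S2S back end; it assumes a hypothetical `fetch(uri)` function that returns a document as a dict with `services` and `links` lists (or `None` on failure), and it stubs the web with an in-memory dict so it runs offline.

```python
from collections import deque


def crawl(seeds, fetch, max_docs=100):
    """Breadth-first crawl from seed URIs, collecting service descriptions.

    fetch(uri) is assumed to return a dict with optional "services" and
    "links" lists, or None if the document cannot be retrieved.
    """
    seen = set(seeds)
    queue = deque(seeds)
    index = {}
    while queue and len(index) < max_docs:
        uri = queue.popleft()
        doc = fetch(uri)
        if doc is None:
            continue  # unreachable document: skip it and keep crawling
        index[uri] = doc.get("services", [])
        for link in doc.get("links", []):
            if link not in seen:  # visit each URI at most once
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical web of documents, stubbed as a dict instead of HTTP requests.
WEB = {
    "http://example.org/a": {"services": ["search-a"],
                             "links": ["http://example.org/b"]},
    "http://example.org/b": {"services": ["search-b"],
                             "links": ["http://example.org/a",
                                       "http://example.org/c"]},
}

print(crawl(["http://example.org/a"], WEB.get))
```

Because the index is rebuilt from whatever the crawler can reach, any installation could run its own crawl from its own seed URIs, rather than depending on one shared triplestore.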

The second type of feedback was more interesting to address. It’s always good to hear constructive criticism about a project. The argument was that because S2S uses its own vocabulary to describe Web services, “widgets”, and parameters, among other things, it is not interoperable: existing tools will not understand that vocabulary. I have two primary defenses against that. The first is that S2S allows you to define virtually any term, so that it can be used by both old and new tools. For instance, S2S allows you to define each of the OpenSearch vocabulary terms, including “results”, “searchTerms”, “startIndex”, and “count”. Each of these has in fact been implemented by our OpenSearch services for S2S, so when a traditional OpenSearch tool finds an S2S OpenSearch service, it should still be able to use it. The second defense is this: if you do not agree with the S2S vocabulary, find a vocabulary with as much tool support as S2S for developing faceted browse or advanced search interfaces. When the S2S project started, we found no vocabularies for defining the “extensibility” aspects of OpenSearch (i.e., the fact that URIs can be used in place of any of the OpenSearch terms). So we defined those vocabularies ourselves, and we designed them specifically for S2S’s purpose. I’d be happy to collaborate with anyone who has a broader or different purpose from S2S to extend the vocabulary to their needs, or to map S2S terms to their terms.
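The extensibility point can be sketched by filling an OpenSearch-style URL template in which every parameter is identified by a full URI: core terms such as `{searchTerms}` resolve to the OpenSearch 1.1 namespace, while prefixed terms such as `{ex:instrument?}` resolve to another namespace. This is a minimal Python illustration, not S2S code; the `ex` prefix, its namespace, the `instrument` parameter, and the template URL are hypothetical.

```python
import re
from urllib.parse import quote

# Namespace of the core OpenSearch 1.1 template parameters.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
# Hypothetical extension namespace standing in for S2S-style custom terms.
EX_NS = "http://example.org/s2s-params#"

# Matches {name}, {name?}, {prefix:name}, and {prefix:name?}.
PARAM = re.compile(r"\{(?:(\w+):)?(\w+)(\?)?\}")


def fill_template(template: str, values: dict, prefixes: dict) -> str:
    """Fill an OpenSearch-style URL template from values keyed by full URI."""
    def substitute(m):
        prefix, local, optional = m.group(1), m.group(2), m.group(3)
        ns = prefixes[prefix] if prefix else OS_NS
        key = ns + local  # the parameter's full URI, e.g. OS_NS + "count"
        if key in values:
            return quote(str(values[key]), safe="")
        if optional:
            return ""  # unfilled optional parameters become empty strings
        raise KeyError(f"missing required parameter: {key}")
    return PARAM.sub(substitute, template)


template = ("http://example.org/search?q={searchTerms}&start={startIndex?}"
            "&count={count?}&inst={ex:instrument?}")
url = fill_template(
    template,
    {OS_NS + "searchTerms": "aerosol optical depth",
     OS_NS + "count": 10,
     EX_NS + "instrument": "MODIS"},
    {"ex": EX_NS},
)
print(url)
# http://example.org/search?q=aerosol%20optical%20depth&start=&count=10&inst=MODIS
```

A client that only knows the core OpenSearch terms can still fill `searchTerms` and `count`; the URI-keyed extension parameter is simply left empty because it is optional.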

The last type of feedback asked why the S2S framework has so much complexity. I’m not sure there is one good response to that question, but I think the complexity is useful when you look at the big picture for S2S. For one, S2S was never explicitly designed to be a framework for faceted browsing interfaces. Rather, it was designed for building configurable user interfaces, with a heavy emphasis on reusability of user interface components. Faceted browsing became the focus because we had two use cases that were best implemented with faceted browse. Another complexity issue is the number of queries an S2S faceted browser makes compared to something like Apache Solr. For instance, a browser with 6 facets could require 7 queries to populate it with data in S2S (1 per facet plus 1 for the results). In Solr, a single query can return all facets and facet values. The design decision in S2S was that a data manager may need to query a remote source to determine a facet's values. Alternatively, the data manager may have a single input that they do not wish to facet (say, for performance reasons). In either case, we designed S2S to be as flexible as possible, which in some cases means it takes a little more effort to set up than something more rigid, such as Apache Solr.
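The query-count trade-off can be made concrete with a small Python sketch over an in-memory data source that counts the queries it serves. The `CountingSource` class, the facet names, and the records are hypothetical; the single-request method only mimics the behavior the post attributes to Solr and does not use Solr's API.

```python
from collections import Counter


class CountingSource:
    """Hypothetical in-memory data source that counts the queries it serves."""

    def __init__(self, records):
        self.records = records
        self.queries = 0

    def facet_values(self, facet):
        """One query: value counts for a single facet."""
        self.queries += 1
        return Counter(r[facet] for r in self.records)

    def results(self):
        """One query: the matching records."""
        self.queries += 1
        return list(self.records)

    def facets_and_results(self, facets):
        """One query returning every facet's counts plus the results."""
        self.queries += 1
        counts = {f: Counter(r[f] for r in self.records) for f in facets}
        return counts, list(self.records)


FACETS = ["instrument", "parameter", "year", "platform", "format", "provider"]
records = [
    dict(zip(FACETS, ["inst-1", "param-1", 2010, "plat-1", "hdf", "org-A"])),
    dict(zip(FACETS, ["inst-2", "param-1", 2011, "plat-1", "netcdf", "org-B"])),
]

# Per-facet style: each facet is populated independently, so each facet
# could in principle be backed by a different (e.g. remote) source.
per_facet = CountingSource(records)
facet_counts = {f: per_facet.facet_values(f) for f in FACETS}
rows = per_facet.results()
print(per_facet.queries)  # 7

# Single-request style: one query returns all facets and results.
single = CountingSource(records)
counts, rows = single.facets_and_results(FACETS)
print(single.queries)  # 1
```

Both styles produce the same facet counts here; the per-facet style pays n + 1 round trips in exchange for letting each facet come from wherever its values live.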
