S2S Feedback at AGU Fall Meeting 2011
The AGU Fall Meeting 2011 was a busy meeting and, as usual, the Tetherless World Constellation (TWC) received quite a bit of attention for its best practices and tool support for Semantic eScience. I gave two poster presentations during the Semantic, Linked Data, and Drupal-based Solutions for Science (IN31B) poster session. One was an IN31B poster about creating linked data for AGU abstracts. The second was listed under IN31A, a session about the Real Use of Open Standards and Technologies; however, it became apparent that I was more interested in talking about it as an IN31B poster. It was a poster on S2S, and it drew a range of feedback, which I discuss in this post: enthusiasts who wanted to implement it, skeptics who felt it was not an “interoperable” solution, and faceted browse developers who wanted to know why S2S needed so much complexity.
Addressing the first type of feedback is not difficult. I want everyone to be able to deploy an S2S interface for their data. However, I often have to hold myself back, because I know that the software is not yet at a point where it can be easily reused without a significant amount of hand-holding on my part. The basic problems are documentation and complexity of installation. While the documentation problem can be easily fixed, the installation problem will remain until the S2S back-end architecture is updated. The back-end currently depends on a triplestore, deployed on one of TWC’s machines, that indexes metadata about S2S services. I plan to move the back-end to a linked data crawler approach next spring, removing the dependency on TWC triplestores and enabling wider installation.
The second type of feedback was more interesting to address. It’s always good to hear constructive criticism about a project. The argument was that, because S2S uses its own vocabulary to describe, among other things, Web services, “widgets”, and parameters, it is not interoperable: existing tools will not understand that vocabulary. I have two primary defenses. The first is that S2S allows you to define virtually any term, so that a service can be used by both old and new tools. For instance, S2S allows you to define each of the OpenSearch vocabulary terms, including “results”, “searchTerms”, “startIndex”, and “count”. Each of these has in fact been implemented by our OpenSearch services for S2S, so when a traditional OpenSearch tool finds an S2S OpenSearch service, it should still be able to use it. The second defense is this: if you do not agree with the S2S vocabulary, find a vocabulary with as much tool support as S2S for developing faceted browse or advanced search interfaces. At the time the S2S project started, we found no vocabularies for defining the “extensibility” aspects of OpenSearch (i.e., the fact that URIs can be used in place of any of the OpenSearch terms). So we defined those vocabularies ourselves, and we designed them specifically for S2S’s purpose. I’d be happy to collaborate with anyone who has a broader or different purpose from S2S to extend the vocabulary to their needs, or to map S2S terms to their terms.
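To make the OpenSearch point a little more concrete, here is a minimal Python sketch of how a traditional OpenSearch client might fill in a URL template that uses the standard “searchTerms”, “startIndex”, and “count” terms. The endpoint and the helper name fill_template are hypothetical and are not part of S2S; only the curly-brace template syntax, with a trailing “?” marking an optional parameter, comes from OpenSearch itself.

```python
import re
from urllib.parse import quote

# Hypothetical endpoint, for illustration only. The {name} / {name?} syntax
# follows OpenSearch URL templates, where "?" marks an optional parameter.
TEMPLATE = "http://example.org/search?q={searchTerms}&start={startIndex?}&n={count?}"

_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template, values):
    """Substitute OpenSearch template parameters with URL-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters with no value become the empty string.
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


url = fill_template(TEMPLATE, {"searchTerms": "sea ice", "count": 10})
print(url)  # http://example.org/search?q=sea%20ice&start=&n=10
```

A client written this way only needs to recognize the core OpenSearch terms, which is why a service that declares them should remain usable by tools that know nothing about the S2S vocabulary.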
The last type of feedback asked why the S2S framework has so much complexity. I’m not sure there is one good response to that question, but I think the complexity is useful when you look at the big picture for S2S. For one, S2S was never explicitly designed to be a framework for faceted browsing interfaces. Rather, it was designed for building configurable user interfaces, with a heavy emphasis on reusability of user interface components. Faceted browsing became the focus because we had two use cases that were best implemented with faceted browse. Another complexity issue was the number of queries made by an S2S faceted browser compared to something like Apache Solr. For instance, a browser with 6 facets could require up to 7 queries to populate the browser with data in S2S (1 per facet plus 1 for the results). In Solr, a single query can return all facets and facet values. The design decision in S2S was that a data manager may need to query a remote source to determine what its facet values are. Alternatively, the data manager may have a single input that it does not wish to facet (say, for performance reasons). In either case, we designed S2S to be as flexible as possible, which in some cases means it takes a little more effort to set up than something more rigid, such as Apache Solr.
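The query-count comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the S2S implementation: the facet names and both function names are made up, and the only real identifiers are Solr’s “q”, “facet”, and “facet.field” request parameters.

```python
from urllib.parse import urlencode

# Hypothetical facet names, for illustration only.
FACETS = ["instrument", "platform", "parameter", "project", "region", "year"]


def s2s_style_requests(facets, constraints):
    """One request per facet plus one for the results (illustrative only)."""
    requests = [("facet:" + name, dict(constraints)) for name in facets]
    requests.append(("results", dict(constraints)))
    return requests


def solr_style_request(facets, query="*:*"):
    """A single Solr select URL that asks for every facet at once."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", name) for name in facets]
    return "select?" + urlencode(params)


per_facet = s2s_style_requests(FACETS, {"year": "2011"})
single = solr_style_request(FACETS)
print(len(per_facet))  # 7
print(single)
```

The per-facet layout costs more round trips, but each request can be routed to a different source or skipped entirely, which is the flexibility described above.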