IOGDC Presentation @ I-Semantics 2011, Triplification Challenge

September 26th, 2011

The I-Semantics 2011 Conference, co-located with I-KNOW, was held September 7–9, 2011, in Graz, Austria. The conference covered a wide range of topics, including Web-scale recommendation systems, information visualization, semantic content engineering, Web science, the social Web, SemWeb applications, and more. Given the conference's broad scope, I focused on the talks most relevant to our research agendas at TWC: work related to Linked Open Data, work applicable to our Semantic eScience Framework project, and some natural language processing and machine learning work that I am personally interested in.

Linked Data was a major theme of I-Semantics. I saw an interesting talk on a RESTful architecture for both reading and writing Linked Data. The architecture placed some restrictions on how the data could be structured and queried, defining the ontological concepts of “records” and “layers” to annotate the data; when aggregated, these essentially form named graphs. The authors also made an interesting use of the HTTP Range header to retrieve partial records. Another interesting presentation introduced a vocabulary for publishing calls for papers for scientific publications as Linked Data. I think we should look into this vocabulary at TWC, particularly for our website, as it may be a good way to keep people up to date on relevant submission deadlines.

Much of the other work on Linked Data was presented in the Triplification Challenge session. We presented our own work on the International Open Government Data Catalog (now IOGD Search, or IOGDS) at this session, which I discuss in the final paragraph. Other submissions included a “trip planner”, which used both LOD resources and the Open Provenance Model to annotate tourism-related information on the Web, and an application for annotating online media and performing semantic search over those annotations. Our primary competitor in the Triplification Challenge (in the Open Government Data Track) was Open Data Albania. The authors gave a demo of their website, which was based on CKAN. For me, the most interesting part of the presentation was that they automatically convert the datasets published in their catalog into Google Data Tables, which are compatible with various Google Visualization tools, such as the Google MotionChart.
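The Open Data Albania pipeline wasn't described in detail, so what follows is only a minimal sketch, assuming each dataset is a CSV file with a header row, of how a tabular dataset might be turned into the JSON literal that a Google Visualization DataTable can be built from (a `cols` list plus `rows` whose cells are `{"v": value}` objects). The function name `csv_to_datatable`, the column ids, and the simple number-or-string type inference are my own assumptions, not their implementation.

```python
import csv
import io
import json


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_datatable(csv_text):
    """Hypothetical helper: convert CSV text (header row first) into a
    DataTable-style JSON literal {"cols": [...], "rows": [{"c": [...]}]}.

    Assumption: a column is typed "number" only if every non-empty cell
    parses as a float; otherwise it is typed "string".
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    records = [row for row in reader if row]  # skip blank lines

    # Infer one type per column from its non-empty cells.
    types = []
    for i in range(len(header)):
        values = [r[i] for r in records if i < len(r) and r[i] != ""]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types.append("number" if is_numeric else "string")

    cols = [{"id": f"c{i}", "label": name, "type": t}
            for i, (name, t) in enumerate(zip(header, types))]

    rows = []
    for r in records:
        cells = []
        for i, t in enumerate(types):
            raw = r[i] if i < len(r) else ""
            if raw == "":
                cells.append({"v": None})  # missing value becomes JSON null
            elif t == "number":
                cells.append({"v": float(raw)})
            else:
                cells.append({"v": raw})
        rows.append({"c": cells})

    return {"cols": cols, "rows": rows}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = "district,year,value\nDistrict A,2010,12.5\nDistrict B,2010,\n"
    print(json.dumps(csv_to_datatable(sample), indent=2))
```

Serializing the result with `json.dumps` gives a payload that a chart page could fetch and hand to the DataTable constructor; a real converter would also need to handle dates and locale-specific number formats.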

Several presentations covered research relevant to the Semantic eScience Framework (SeSF) project; I discuss two of them here. The first was a talk on a knowledge federation framework for biomedical applications, called Coeus. What interested me most was how closely its design parallels that of S2S. Coeus uses a “connector” (referred to as an “adapter” in S2S) to attach data sources in various formats (e.g., CSV, XML, RDB, RDF) to the framework. It then simplifies application development by reducing the effort required to aggregate multiple “connected” resources. The other work I want to mention was presented in the poster session on Thursday afternoon. One of the posters was on ontology modularization, and the authors had an interesting view of the structure of modular ontologies. In the past, we have investigated “three-layer” modularization architectures for VSTO and SeSF. This work was a variation on the three-layer architecture in which the layers were separated not by level of expressivity but by level of abstraction. The more abstract ontologies were meant to provide a frame on which domain experts could quickly and easily build their applications. I have been in contact with the authors, and they would be interested to hear whether we apply this architecture in our SeSF ontology development. They will also be presenting this work at ISWC 2011.
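The talk did not go into the details of Coeus's connector API, so the following is a hypothetical Python sketch of the general connector/adapter pattern described above, as it also appears in S2S: each connector knows how to read one kind of source and emit records in a common shape, and the framework aggregates across whatever connectors are attached. The class names (`Connector`, `CsvConnector`, `SqlConnector`, `Federation`), the use of plain (subject, predicate, object) tuples as the common shape, and the example URIs are all illustrative assumptions.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Connector(ABC):
    """Hypothetical interface: adapts one data source to triples."""

    @abstractmethod
    def triples(self) -> Iterator[Triple]:
        ...


class CsvConnector(Connector):
    """Reads CSV text; each row becomes triples about one subject."""

    def __init__(self, csv_text: str, key_column: str, base: str):
        self.csv_text = csv_text
        self.key_column = key_column
        self.base = base

    def triples(self) -> Iterator[Triple]:
        for row in csv.DictReader(io.StringIO(self.csv_text)):
            subject = self.base + row[self.key_column]
            for column, value in row.items():
                if column != self.key_column and value:
                    yield (subject, column, value)


class SqlConnector(Connector):
    """Reads one table from a relational database via sqlite3."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 key_column: str, base: str):
        self.conn = conn
        self.table = table  # assumed trusted: interpolated into the SQL
        self.key_column = key_column
        self.base = base

    def triples(self) -> Iterator[Triple]:
        cursor = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [d[0] for d in cursor.description]
        for values in cursor:
            record = dict(zip(columns, values))
            subject = self.base + str(record[self.key_column])
            for column, value in record.items():
                if column != self.key_column and value is not None:
                    yield (subject, column, str(value))


class Federation:
    """Aggregates triples from every attached connector."""

    def __init__(self):
        self.connectors: List[Connector] = []

    def attach(self, connector: Connector) -> None:
        self.connectors.append(connector)

    def describe(self, subject: str) -> List[Triple]:
        """Collect, across all sources, every triple about `subject`."""
        return sorted(t for c in self.connectors
                      for t in c.triples() if t[0] == subject)


if __name__ == "__main__":
    base = "http://example.org/gene/"  # placeholder namespace
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE genes (id INTEGER, organism TEXT)")
    db.execute("INSERT INTO genes VALUES (1, 'human')")
    fed = Federation()
    fed.attach(CsvConnector("id,symbol\n1,GENE1\n", "id", base))
    fed.attach(SqlConnector(db, "genes", "id", base))
    print(fed.describe(base + "1"))
```

The point of the design is that application code talks only to `Federation`; adding a new source format means writing one more connector class rather than changing every application that consumes the data.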

The best paper award for the conference went to Pablo Mendes and the DBpedia Spotlight team. I was very interested in this work because I am working on a project for the Federation of Earth Science Information Partners that uses DBpedia Spotlight to extract entities from American Geophysical Union abstracts. The presentation covered the general functionality of Spotlight, changes planned for upcoming releases, and the future direction of the project.

Finally, I want to discuss our own presentation in the Triplification Challenge. The Challenge had two tracks, an Open Track and an Open Government Data Track; we competed in the latter. The talk went extremely well (we won), and it was followed by a number of interesting comments and questions. One person asked how we keep our data up to date, which is an extremely relevant question for IOGDS. I believe that at the time of the presentation, some of the catalogs had been converted more than three months earlier, which meant we were likely missing many updates. Another discussion concerned IOGDS involvement with the CKAN community, which would be a step toward keeping IOGDS up to date. Lastly, someone asked to what degree the project performs semantic search. While IOGDS does perform free-text search over most (possibly all) of the literal values in the catalog, I explained that building and demonstrating an open government ontology is an important topic for the project.

