AAAI 2011 Fall Symposium on Open Government Knowledge, this weekend (Nov 4-6), Washington, DC area

November 1st, 2011

————————————————————————————————
Title:  Open Government Knowledge: AI Opportunities and Challenges
When:  4-6 November 2011
Where:  Westin Arlington Gateway in Arlington, Virginia, USA
Homepage: http://tw.rpi.edu/ogk2011
Program (PDF): http://tw.rpi.edu/media/latest/ogk2011.pdf
————————————————————————————————

Please join us to meet governmental and business thought leaders in
US open government data activities and to discuss the challenges. The
symposium features Friday (Nov 4) as a government day, with speakers on
Data.gov, OpenEI.org, and open government data activities at NIH/NCI and
NASA, and Saturday (Nov 5) as an R&D day, with speakers from industry, such
as Google and Microsoft, as well as international researchers.

This symposium will explore how AI technologies, such as the Semantic Web,
information extraction, statistical analysis, and machine learning, can be used
to make the valuable knowledge embedded in open government data more
explicit, accessible, and reusable.

Co-Chairs
* Li Ding, Qualcomm (previously RPI)
* Tim Finin, UMBC
* Lalana Kagal, MIT
* Deborah McGuinness, RPI

IOGDC Presentation @ I-Semantics 2011, Triplification Challenge

September 26th, 2011

The I-Semantics 2011 Conference, co-located with I-KNOW, was held Sept. 7-9, 2011, in Graz, Austria. The conference covered a broad range of topics, including Web-scale recommendation systems, information visualization, semantic content engineering, Web science, the social Web, and SemWeb applications. Given this breadth, I focused on the talks most relevant to our research agendas at TWC: work related to Linked Open Data, work applicable to our Semantic eScience Framework project, and some natural language processing and machine learning work that I am personally interested in.

Linked Data was a major theme of I-Semantics. I saw an interesting talk on a RESTful architecture for both reading and writing Linked Data. The architecture placed some restrictions on how the data could be structured and queried, defining ontological concepts of "records" and "layers" used to annotate the data, which, when aggregated, essentially form named graphs. The authors also made an interesting use of the HTTP Range header to retrieve partial records. Another interesting presentation introduced a vocabulary for publishing calls for papers for scientific publications as Linked Data. I think we should look into this vocabulary at TWC, particularly for our website, as it may be a good way to keep people up to date on relevant submission deadlines.

Much of the other work on Linked Data was presented in the Triplification Challenge session. We presented our own work on the International Open Government Data Catalog (now IOGD Search, or IOGDS) in this session, which I discuss in the final paragraph. Other submissions included a "trip planner" that used both LOD resources and the Open Provenance Model to annotate tourism-related information on the Web, and an interesting application for annotating online media and performing semantic search over those annotations. Our primary competitor in the Triplification Challenge (Open Government Data Track) was Open Data Albania. The authors gave a demo of their website, which was based on CKAN. The most interesting part of the presentation, I thought, was that they automatically convert datasets published in their catalog into Google Data Tables, which are compatible with various Google Viz tools, such as the Google MotionChart.
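The Open Data Albania conversion was demonstrated rather than described, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It turns CSV text into the JSON object form accepted by the Google Visualization `google.visualization.DataTable` constructor (`cols` entries with `id`, `label`, `type`; `rows` entries whose `c` cells hold a `v` value or `null`). The column ids (`col0`, `col1`, ...) and the rule that a column is typed `number` when every non-empty value parses as a finite number are assumptions of this sketch, not the Albanian team's method.

```python
import csv
import io
import json
import math


def _as_number(value):
    """Return value as a float if it is a finite number, else None."""
    try:
        number = float(value)
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def csv_to_datatable(text):
    """Convert CSV text (header row first) into DataTable-style JSON."""
    header, *body = csv.reader(io.StringIO(text))
    # Infer a type per column: 'number' only if all non-empty cells are numeric.
    types = []
    for i in range(len(header)):
        values = [row[i] for row in body if i < len(row) and row[i] != ""]
        numeric = bool(values) and all(_as_number(v) is not None for v in values)
        types.append("number" if numeric else "string")
    cols = [{"id": f"col{i}", "label": name, "type": kind}
            for i, (name, kind) in enumerate(zip(header, types))]
    rows = []
    for row in body:
        cells = []
        for i, kind in enumerate(types):
            value = row[i] if i < len(row) else ""
            if value == "":
                cells.append(None)  # missing value becomes a null cell
            elif kind == "number":
                cells.append({"v": _as_number(value)})
            else:
                cells.append({"v": value})
        rows.append({"c": cells})
    return json.dumps({"cols": cols, "rows": rows})
```

Charts such as the MotionChart additionally expect a particular column layout (roughly an entity column, then a time column, then values), which this sketch does not enforce.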

Several presentations covered research relevant to the Semantic eScience Framework project; I discuss two of them here. The first was on a knowledge federation framework for biomedical applications, whose design closely parallels that of S2S. The framework, called Coeus, uses a "connector" (referred to as an "adapter" in S2S) to attach data sources in various formats (i.e., CSV, XML, RDB, RDF) to the framework. It then simplifies application development by reducing the effort required to aggregate multiple "connected" resources. The other interesting research I saw was in the poster session on Thursday afternoon. One of the posters was on ontology modularization, and the authors had an interesting view of the structure of modular ontologies. In the past, we have investigated "three-layer" modularization architectures, such as this one, for VSTO and SeSF. This work was a variation on the "three-layer" architecture in which the layers are separated not by level of expressivity but by level of abstraction. The purpose of the more abstract ontologies was to provide a frame on which domain experts can quickly and easily build their applications. I have been in contact with the authors, who are interested to hear whether we apply this architecture in our SeSF ontology development. They will also present this work at ISWC 2011.
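To make the connector/adapter idea concrete, here is a minimal Python sketch. The `Connector` protocol, the `CsvConnector` and `RecordConnector` classes, and `federate` are hypothetical names chosen for illustration; they are not the Coeus or S2S APIs. Each connector hides its source format behind a single `triples()` method, so aggregating several sources reduces to a set union.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Protocol, Set, Tuple

Triple = Tuple[str, str, str]


class Connector(Protocol):
    """Anything that can expose its source as (subject, predicate, object) triples."""

    def triples(self) -> Iterator[Triple]: ...


class CsvConnector:
    """Hypothetical connector: one resource per CSV row, one property per column."""

    def __init__(self, text: str, base: str, key: str) -> None:
        self.text, self.base, self.key = text, base, key

    def triples(self) -> Iterator[Triple]:
        for row in csv.DictReader(io.StringIO(self.text)):
            subject = self.base + row[self.key]
            for column, value in row.items():
                if column != self.key and value:
                    yield (subject, self.base + column, value)


class RecordConnector:
    """Hypothetical connector for records already parsed from, e.g., XML or an RDB."""

    def __init__(self, records: List[Dict[str, str]], base: str, key: str) -> None:
        self.records, self.base, self.key = records, base, key

    def triples(self) -> Iterator[Triple]:
        for record in self.records:
            subject = self.base + record[self.key]
            for field, value in record.items():
                if field != self.key:
                    yield (subject, self.base + field, value)


def federate(connectors: Iterable[Connector]) -> Set[Triple]:
    """Aggregate every connected source into one set of triples."""
    merged: Set[Triple] = set()
    for connector in connectors:
        merged.update(connector.triples())
    return merged
```

Because both connectors mint subjects from the same base and key, statements about the same resource from different formats end up attached to one node after federation.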

The best paper award for the conference went to Pablo Mendes and the DBpedia Spotlight team. I was very interested in this work because I am working on a project for the Federation of Earth Science Information Partners that extracts entities from American Geophysical Union abstracts using DBpedia Spotlight. The presentation discussed the general functionality of Spotlight, some changes planned for upcoming releases, and the future direction of the project.

Finally, I want to discuss our own presentation in the Triplification Challenge. The Challenge had two tracks, an Open Track and an Open Government Data Track; we competed in the latter. The talk went extremely well (we won), and it prompted a number of interesting comments and questions. One person asked how we keep our data up to date, which is highly relevant to IOGDS: I believe that at the time of the presentation, some of the catalogs had been converted more than three months earlier, which meant we were likely missing many updates. Another discussion concerned IOGDS involvement with the CKAN community, which would be a step toward keeping IOGDS up to date. Lastly, there was a question about the degree to which the project performs semantic search; while IOGDS does perform free-text search over most (all?) of the literal values in the catalog, I explained that building and demonstrating an open government ontology is an important topic for the project.

XML Schema Cannot Validate Semantic Correctness

August 21st, 2011

A discussion came up on the W3C Semantic Web Healthcare and Life Sciences (HCLS) SIG mailing list about models in OWL, validation, and XSD. I'm reproducing my response here because it's relevant to a larger audience and involves work by a TWCer (Jiao Tao) on Pellet ICV.

I feel I need to cut to the chase with this one: XML schema cannot validate semantic correctness.

It can validate that XML conforms to a particular schema, but that check is syntactic. An OWL validator is nothing like a schema validator: first, it produces the closure of all statements that can be inferred from the asserted information. This means that if a secondary ontology is used to describe some data, and that ontology integrates with the ontology you're validating against, you will get a valid result. An XML schema can only work with what's in front of it.
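A toy Python sketch of that difference, assuming the only inference is transitive rdfs:subClassOf and a closed-world constraint that every object of a hypothetical `ex:treatedBy` property must be an `ex:Physician`. All class and property names are invented, and this is not how Pellet ICV is implemented; it only shows why checking the inferred closure gives a different answer from checking just the asserted statements, which is all a schema validator sees.

```python
from typing import Dict, List, Set, Tuple


def superclasses(cls: str, subclass_of: Dict[str, List[str]]) -> Set[str]:
    """All classes reachable from cls via rdfs:subClassOf, including cls itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(types: Dict[str, Set[str]],
            subclass_of: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Add every rdf:type that follows from each individual's asserted types."""
    return {ind: set().union(*(superclasses(t, subclass_of) for t in ts))
            for ind, ts in types.items()}


def violations(edges: List[Tuple[str, str, str]], types: Dict[str, Set[str]],
               prop: str, required: str) -> List[Tuple[str, str]]:
    """Closed-world check: every object of prop must be typed as required."""
    return [(s, o) for s, p, o in edges
            if p == prop and required not in types.get(o, set())]


# Usage: bob is asserted only as other:Cardiologist, a class from a secondary
# ontology that declares itself a subclass of ex:Physician.
subclass_of = {"other:Cardiologist": ["ex:Physician"]}
asserted = {"ex:alice": {"ex:Patient"}, "ex:bob": {"other:Cardiologist"}}
edges = [("ex:alice", "ex:treatedBy", "ex:bob")]
print(violations(edges, asserted, "ex:treatedBy", "ex:Physician"))
# -> [('ex:alice', 'ex:bob')]   (asserted statements only)
print(violations(edges, closure(asserted, subclass_of), "ex:treatedBy", "ex:Physician"))
# -> []                          (after inference)
```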

Improving the DATA Act – open govt in the USA

July 8th, 2011

Beth Noveck, NYU law professor and previously Deputy Chief Technology Officer of the US, and I have written a blog post that discusses the DATA Act, a proposed US law to improve the transparency of US spending. I realized this may also be of interest to folks working on Linked Data, especially linked open government data, so I thought I'd mention it here as well.

Introducing Sterno, another RDF syntax (Really?)

July 3rd, 2011

In this blog post, I want to introduce an RDF syntax called Sterno. (Oh no… not another one… right? Please read on.) Sterno is an extension of the N-triples syntax and a subset of the Turtle syntax, aimed at improving compression over N-triples while preserving the simplicity of N-triples. But what could possibly warrant defining yet another RDF syntax?

After winning the 2009 Billion Triples Challenge, Greg and I realized that our system spent a fair amount of time transferring data from disk. At that time, our system read N-triples documents because their simple syntax was amenable to parallel I/O, but N-triples documents are often very verbose. Turtle, on the other hand, introduces many features that improve compression, and N-triples is a syntactic subset of Turtle. So the question arose: how much of Turtle (i.e., which features of Turtle) should we use to extend the N-triples syntax in order to improve parallel I/O? The details of our investigation can be found in our paper “Reducing I/O Load in Parallel RDF Systems via Data Compression,” published at the 1st Workshop on High-Performance Computing for the Semantic Web (HPCSW2011). (We also compare the use of Sterno “compression” of RDF data with LZO compression for parallel I/O. The HPCSW proceedings can be found here for those who are interested.)
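The reason a line-oriented syntax suits parallel I/O is that a file can be cut at arbitrary byte offsets, with each cut moved forward to the next newline, so no triple is ever split between readers. A minimal Python sketch, assuming N-triples input held in memory (a real system would have each worker seek into the file instead) and using triple counting as a stand-in for real parsing; `chunk_boundaries`, `count_triples`, and `parallel_count` are hypothetical names, not part of our system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def chunk_boundaries(data: bytes, workers: int) -> List[Tuple[int, int]]:
    """Split a line-oriented document into `workers` byte ranges that all
    start and end on line boundaries (some ranges may be empty)."""
    size = len(data)
    cuts = [0]
    for i in range(1, workers):
        guess = max(size * i // workers, cuts[-1])
        # Push the cut forward to just past the next newline, so no line
        # (and therefore no triple) is split between two workers.
        newline = data.find(b"\n", guess)
        cuts.append(size if newline == -1 else newline + 1)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def count_triples(chunk: bytes) -> int:
    """Stand-in for real parsing: in N-triples every non-blank,
    non-comment line holds exactly one triple."""
    return sum(1 for line in chunk.splitlines()
               if line.strip() and not line.lstrip().startswith(b"#"))


def parallel_count(data: bytes, workers: int = 4) -> int:
    """Process each chunk independently; threads stand in for the separate
    processes or machines a real parallel reader would use."""
    ranges = chunk_boundaries(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_triples(data[r[0]:r[1]]), ranges))
```

Turtle offers no such guarantee, since one statement can span many lines and prefix declarations can appear anywhere. Sterno keeps the one-triple-per-line property, and because it requires all prefix declarations at the start of the document, each worker can read the short header once and then process its chunk independently.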

Admittedly, an RDF syntax designed for parallel I/O would seem to have a limited audience, but it turns out that Sterno may be of more general use. Sterno’s simplicity may be desirable for many purposes simply because it is easier to support than Turtle (that is, easier to produce and parse), particularly on the command line. Note that Sterno is not meant to replace or compete with any other RDF syntax; it simply gives a name and definition to a useful middle ground between N-triples and Turtle.

Sterno is normatively described as an extension of the N-triples syntax. In other words, the Sterno syntax subsumes the N-triples syntax, and the Sterno syntax is defined as the N-triples syntax with the addition of the following Turtle features:

  • UTF-8 Encoding: A Sterno document is a Unicode character string encoded in UTF-8.
  • Prefix declarations and QNames: A Sterno document allows for prefix declarations and QNames, but all prefix declarations must occur at the beginning of the document before any actual triples.
  • Implicit datatypes for xsd:integer, xsd:double, xsd:decimal, and xsd:boolean. For example, "1"^^xsd:integer may simply appear in the document as 1.
  • The a keyword may be used to replace rdf:type whenever it occurs in the predicate position of a triple.
  • The empty collection () may be used to replace rdf:nil whenever it occurs in the subject or object position of a triple.
  • An anonymous blank node [] may be used, although its usefulness is severely limited in Sterno.
  • Blank node labels may be as complex as in Turtle. That is, we do not maintain the restriction in N-triples that blank node labels be only word characters. (E.g., _:blank-node is valid in Turtle and Sterno, but not in N-triples.)

An actual grammar for the Sterno syntax can be found in the extended version of our HPCSW paper. All this may be hard to keep in one's head, so here is a contrived example in N-triples, Sterno, and Turtle. (For a more realistic example, see my FOAF profile in N-triples, Sterno, and Turtle.)

N-triples:

<file:///foaf.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<file:///foaf.rdf#me> <http://xmlns.com/foaf/0.1/nick> "Andr\u00E9" .
<file:///foaf.rdf#me> <http://xmlns.com/foaf/0.1/age> "40"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:list <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#List> .
_:list <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "line1\n\tline2 \"quoted string\" " .
_:list <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:contrived <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
# What a contrived triple.

Sterno:

@prefix mine: <file:///foaf.rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
mine:me a foaf:Person .
mine:me foaf:nick "André" .
mine:me foaf:age 40 .
_:list a rdf:List .
_:list rdf:first "line1\n\tline2 \"quoted string\" " .
_:list rdf:rest () .
[] a <http://www.w3.org/2002/07/owl#Thing> .
# What a contrived triple.

Turtle (with base URI <file:///foaf.rdf>):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person ; foaf:nick "André" ; foaf:age 40 .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
( """line1
\tline2 "quoted string" """ ) a rdf:List .
[]a<http://www.w3.org/2002/07/owl#Thing> . # What a contrived triple.

Put roughly, the Sterno syntax keeps two simplifying properties of N-triples: each line contains at most one triple, and there must be whitespace between the RDF terms of a triple. Therefore, although it is not as concise as Turtle (e.g., property lists and object lists are not adopted), it is easier to parse and generate.
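As a rough illustration of the "easier to generate" point, here is a minimal Python sketch that rewrites N-triples lines into Sterno using only the features listed above that apply term by term: QNames, `a`, `()`, bare integers and booleans, and UTF-8 characters in place of \u escapes. The function names and the conservative rule for which local names may be abbreviated are assumptions of this sketch; implicit decimals and doubles and `[]` are left out, and trailing comments after a triple are not handled.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# One N-triples term: <IRI>, _:label, or a quoted literal with an optional
# @language tag or ^^<datatype>. Literals are matched as a unit, so spaces
# and '<' characters inside them do not confuse the scanner.
TERM = re.compile(r'<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?')
LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"(?:(@[A-Za-z0-9-]+)|\^\^<([^>]*)>)?')
LOCAL = re.compile(r'[A-Za-z_][A-Za-z0-9_-]*')  # conservative QName local part
ESCAPE = re.compile(r'\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)')


def decode_unicode_escapes(lexical):
    """Replace backslash-u escapes with the characters they denote, but keep
    escapes for quotes, backslashes and control characters."""
    def repl(m):
        esc = m.group(1)
        if len(esc) > 1:  # a \uXXXX or \UXXXXXXXX escape
            cp = int(esc[1:], 16)
            safe = 0x20 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF
            if safe and chr(cp) not in '"\\':
                return chr(cp)
        return m.group(0)  # \n, \t, \", \\ and friends stay escaped
    return ESCAPE.sub(repl, lexical)


def to_qname(iri, prefixes):
    """Abbreviate an IRI as prefix:local when a declared namespace matches."""
    for prefix, ns in prefixes.items():
        local = iri[len(ns):]
        if iri.startswith(ns) and LOCAL.fullmatch(local):
            return f"{prefix}:{local}"
    return f"<{iri}>"


def convert_term(term, position, prefixes):
    """Rewrite one N-triples term; position is 0, 1 or 2 (s, p, o)."""
    if term.startswith("<"):
        iri = term[1:-1]
        if position == 1 and iri == RDF + "type":
            return "a"
        if position != 1 and iri == RDF + "nil":
            return "()"
        return to_qname(iri, prefixes)
    m = LITERAL.fullmatch(term)
    if m is None:
        return term  # blank node labels are kept as-is
    lexical, lang, dtype = m.groups()
    if dtype == XSD + "integer" and re.fullmatch(r"[+-]?[0-9]+", lexical):
        return lexical
    if dtype == XSD + "boolean" and lexical in ("true", "false"):
        return lexical
    text = '"' + decode_unicode_escapes(lexical) + '"'
    if lang:
        return text + lang
    if dtype:
        return text + "^^" + to_qname(dtype, prefixes)
    return text


def ntriples_to_sterno(lines, prefixes):
    """Convert N-triples lines to Sterno lines, emitting every prefix
    declaration first, as Sterno requires."""
    out = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            out.append(line)
            continue
        terms = TERM.findall(line)
        if len(terms) != 3:
            raise ValueError(f"not a single N-triples statement: {line!r}")
        out.append(" ".join(convert_term(t, i, prefixes)
                            for i, t in enumerate(terms)) + " .")
    return out
```

Applied to the N-triples example above with the prefixes `mine`, `rdf`, and `foaf`, this should reproduce the Sterno example except for the last triple, where it keeps `_:contrived` rather than introducing `[]`; doing that safely requires knowing the label occurs only once in the document.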

Feedback welcome, even encouraged.

(Why the name “Sterno”? The name originated as an abbreviation of Sternotherus, a genus of aquatic turtles, the most common species of which typically grows to only 7.5-14 centimeters. The name was chosen to reflect that the Sterno syntax is a small syntactic subset of the Turtle syntax. Additionally, it is an acronym meaning “Simple, TErse Rdf… NOthing else.”)
