XML Schema Cannot Validate Semantic Correctness

August 21st, 2011

A discussion came up on the W3C Semantic Web Healthcare and Life Sciences (HCLS) SIG mailing list about models in OWL, validation, and XSD. I'm reproducing my response here because it's worthwhile for a larger audience and because it involves some work from a TWCer (Jiao Tao): Pellet ICV.

I feel I need to cut to the chase with this one: XML schema cannot validate semantic correctness.

It can validate that an XML document conforms to a particular schema, but that check is syntactic. An OWL validator is nothing like a schema validator. First, it computes the closure of all statements that can be inferred from the asserted information. This means that if some data is described with a secondary ontology, and that ontology integrates with the ontology you're validating against, the data can still validate. An XML schema can only work with what's in front of it.
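To make the closure point concrete, here is a minimal Python sketch. It is not OWL reasoning and not how Pellet ICV works internally; it applies only two RDFS entailment rules (rdfs9, typing through rdfs:subClassOf, and rdfs11, subclass transitivity), and every prefix and class name (`ex:`, `lab:`, `core:`) is hypothetical. The data says only that a sample is a `lab:BloodSample`; a secondary ontology says that class is a subclass of `core:Specimen`.

```python
# Minimal sketch: hypothetical prefixed names stand in for full URIs.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Forward-chain two RDFS rules until no new triples appear:
    type inheritance (rdfs9) and subclass transitivity (rdfs11)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p not in (TYPE, SUBCLASS):
                continue
            for s2, p2, o2 in inferred:
                if p2 == SUBCLASS and o == s2:
                    # (x p C) and (C subClassOf D) gives (x p D), for p in {type, subClassOf}
                    new.add((s, p, o2))
        if new <= inferred:
            return inferred
        inferred |= new


def has_type(triples, node, cls):
    """True if the triple (node rdf:type cls) is present in the given set."""
    return (node, TYPE, cls) in triples


# Instance data typed with a class from a secondary (hypothetical) ontology.
data = {("ex:sample1", TYPE, "lab:BloodSample")}
secondary = {
    ("lab:BloodSample", SUBCLASS, "core:Specimen"),
    ("core:Specimen", SUBCLASS, "core:Material"),
}

# Checking only what is asserted finds no core:Specimen.
print(has_type(data, "ex:sample1", "core:Specimen"))  # False
# Checking the inferred closure finds it, and the transitive superclass too.
graph = closure(data | secondary)
print(has_type(graph, "ex:sample1", "core:Specimen"))  # True
print(has_type(graph, "ex:sample1", "core:Material"))  # True
```

Checking the asserted triples alone, which is all a schema validator can do, misses the `core:Specimen` type; checking the closure finds it.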

Second, there are many representations of information beyond XML, and it should be possible to validate that information after nothing more than a mechanical, universal translation. For instance, there are a few mappings of RDF into JSON, including JSON-LD, which looks the most promising at the moment. Since RDF/XML and JSON-LD documents both parse to the same abstract graph model, there is a mechanical transformation between them. When dealing with semantic validity, you want to check the graph that is parsed from the document, not the document itself.
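A minimal Python sketch of checking the graph rather than the document: two toy parsers, one for a small subset of N-Triples and one for a hypothetical JSON layout (deliberately not JSON-LD, whose processing rules are much richer), produce the same set of triples from different text. Literal escapes, datatypes and language tags are not handled, and the function names are illustrative only.

```python
import json
import re

# Subset of N-Triples: <URI>, _:bnode, and plain "literals" without escapes.
TERM = r'<[^>]*>|_:\S+|"[^"\\]*"'
LINE = re.compile(rf'^\s*({TERM})\s+({TERM})\s+({TERM})\s*\.\s*$')


def term(token):
    """Turn one N-Triples token into a (kind, value) pair."""
    if token.startswith("<"):
        return ("uri", token[1:-1])
    if token.startswith("_:"):
        return ("bnode", token[2:])
    return ("literal", token[1:-1])


def parse_ntriples(text):
    triples = set()
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.add(tuple(term(t) for t in m.groups()))
    return triples


def parse_simple_json(text):
    """Hypothetical layout: {subject: {predicate: [object, ...]}}, where an
    object is {"uri": ...}, {"bnode": ...}, or a plain string literal."""
    triples = set()
    for s, props in json.loads(text).items():
        subject = ("bnode", s[2:]) if s.startswith("_:") else ("uri", s)
        for p, objects in props.items():
            for o in objects:
                if isinstance(o, str):
                    obj = ("literal", o)
                else:
                    [(kind, value)] = o.items()
                    obj = (kind, value)
                triples.add((subject, ("uri", p), obj))
    return triples


NT = """
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""
JS = json.dumps({
    "http://example.org/alice": {
        "http://xmlns.com/foaf/0.1/name": ["Alice"],
        "http://xmlns.com/foaf/0.1/knows": [{"uri": "http://example.org/bob"}],
    }
})

# Different documents, same graph.
print(parse_ntriples(NT) == parse_simple_json(JS))  # True
```

Anything that validates the resulting set of triples gives the same answer for both files, because the check depends on the graph, not on the syntax.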

The content matters, the format does not. For instance, let me define a new RDF format called RDF/CSV[1]:

The first column holds the subjects. The first row holds the predicates. Every other cell is an object. Relative URIs are relative to the document, as in RDF/XML.

I can write a parser for that in an hour and publish it. It's genuinely useful, and to read and write it, all you would have to do is use my parser or write your own. I could then pair the parser with Pellet ICV and validate the information in the file without any additional work from anyone.
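A parser for the format might look roughly like the following Python sketch; it is not the parser referred to in this post. The tweet-length definition leaves details open, so the sketch assumes: the top-left cell is ignored; subject and predicate cells are URIs, optionally wrapped in `<...>` and resolved against a base URI with `urljoin` (a subject may instead be a `_:` blank node); object cells in `<...>` are URIs, `_:` cells are blank nodes, and anything else is a plain literal; empty cells produce no triple. Datatypes and language tags are not supported. The names `parse_rdf_csv`, `subject_term`, `object_term` and `resolve` are hypothetical.

```python
import csv
import io
from urllib.parse import urljoin


def resolve(cell, base):
    """Treat a cell as a URI: strip optional <...> and resolve it against the document."""
    cell = cell.strip()
    if cell.startswith("<") and cell.endswith(">"):
        cell = cell[1:-1]
    return ("uri", urljoin(base, cell))


def subject_term(cell, base):
    """Subjects are blank nodes if they start with _:, otherwise (relative) URIs."""
    cell = cell.strip()
    if cell.startswith("_:"):
        return ("bnode", cell[2:])
    return resolve(cell, base)


def object_term(cell, base):
    """Interpret an object cell with N-Triples-like conventions (an assumption)."""
    cell = cell.strip()
    if cell.startswith("<") and cell.endswith(">"):
        return resolve(cell, base)
    if cell.startswith("_:"):
        return ("bnode", cell[2:])
    if len(cell) >= 2 and cell[0] == cell[-1] == '"':
        cell = cell[1:-1]  # an explicitly quoted literal
    return ("literal", cell)


def parse_rdf_csv(text, base):
    """First column = subjects, first row = predicates, other cells = objects."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return set()
    # The top-left cell is ignored; the rest of the header row names the predicates.
    predicates = [resolve(cell, base) for cell in rows[0][1:]]
    triples = set()
    for row in rows[1:]:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a subject
        subject = subject_term(row[0], base)
        for predicate, cell in zip(predicates, row[1:]):
            if cell.strip():  # empty cells produce no triple
                triples.add((subject, predicate, object_term(cell, base)))
    return triples


if __name__ == "__main__":
    sample = "\n".join([
        ",name,knows",
        "#alice,Alice,<#bob>",
        "#bob,Bob,",
    ])
    for triple in sorted(parse_rdf_csv(sample, "http://example.org/people.csv")):
        print(triple)
```

The resulting set of triples could then be handed to any validator that accepts an RDF graph; nothing about the validation step depends on the file having been CSV.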

Maybe we need a simplified XML representation of RDF that looks more like regular XML. But writing an XML schema for an OWL ontology is too much work for too little payoff.

[1] This was a short enough format description that I decided to fit it into a tweet.

Categories: owl, Semantic Web, tetherless world
  1. August 22nd, 2011 at 07:57 | #1

    With OWL, I reckon it's less confusing to talk about consistency and leave validation to describe syntax stuff.

    I love the idea of a spec in a tweet, though I don't quite get it; I'd need to see an example. Also – how do you tell whether the object of a statement is a URI, a bnode ID or a literal? How do you express typed literals?

    I suspect your CSV is N-Triples-shaped. An alternative that fits the relational model better (but maybe not plain CSV) is: for each predicate (relation), list its subject-object pairs. Not sure how you'd name graphs…

  2. jimmccusker
    August 22nd, 2011 at 09:10 | #2

    The spec is simple. Here’s a very short example: http://dl.dropbox.com/u/9752413/examples/rdfcsv-example.csv

    I left out a lot of detail from the tweet, obviously. It was more to prove a point, but yes, I think the N-Triples conventions for URIs and literals hold. The interesting thing is that this could be a useful format for spreadsheets too, since CSV currently has no way to carry cell data types on export. Heck, there could even be a special formula datatype.
