XML Schema Cannot Validate Semantic Correctness
A discussion came up on the W3C Semantic Web Healthcare and Life Sciences (HCLS) SIG mailing list about OWL models, validation, and XSD. I’m reproducing my response here because it may be useful to a larger audience, and because it involves work by a TWCer, Jiao Tao, on Pellet ICV.
I feel I need to cut to the chase with this one: XML Schema cannot validate semantic correctness.
It can validate that XML conforms to a particular schema, but that is a syntactic check. An OWL validator is nothing like a schema validator. First, it produces the closure of all statements that can be inferred from the asserted information. This means that if some data is described with a secondary ontology, and that ontology integrates with the ontology you’re validating against, the data can still validate, because the inferred statements count too. An XML schema can only work with what’s in front of it.
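To make the closure point concrete, here is a toy sketch in Python. It is not how Pellet ICV works internally: it applies just two RDFS rules (subclass transitivity and type inheritance) and then runs a hypothetical closed-world check that every object of `ex:treatedBy` is typed `ex:Clinician`. The `ex:` and `other:` names are made up for illustration. Checked against the asserted triples alone, the data fails; checked against the closure, the bridging axiom from the secondary ontology makes it pass.

```python
# Toy illustration only: a tiny RDFS-style closure plus a closed-world check.
# This is NOT Pellet ICV; the prefixes, classes, and constraint are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Return the input plus everything derivable by two RDFS rules:
    subClassOf is transitive, and instances of a subclass are instances
    of its superclasses. Repeats until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in inferred:
                # (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # (x type A) and (A subClassOf B) => (x type B)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= inferred:
            return inferred
        inferred |= new


def violations(graph):
    """Hypothetical closed-world constraint: every object of ex:treatedBy
    must be typed ex:Clinician in the graph being checked."""
    return sorted(
        o for (s, p, o) in graph
        if p == "ex:treatedBy" and (o, RDF_TYPE, "ex:Clinician") not in graph
    )


data = {
    ("ex:alice", "ex:treatedBy", "ex:bob"),
    ("ex:bob", RDF_TYPE, "other:Surgeon"),  # described with a secondary ontology
}
secondary = {("other:Surgeon", SUBCLASS, "ex:Clinician")}  # bridges to ex:

asserted = data | secondary
print(violations(asserted))           # ['ex:bob']: only what is in front of it
print(violations(closure(asserted)))  # []: valid once inferences are included
```

The quadratic rule loop is deliberately naive; a real reasoner handles far more of OWL and does so much more efficiently.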
Second, information can be represented in many formats besides XML, and it should be possible to validate it in any of them with nothing more than a mechanical, universal translation. For instance, there are a few mappings of RDF into JSON, including JSON-LD, which looks the most promising at the moment. Since RDF/XML and JSON-LD both parse to the same abstract graph, there is a mechanical transformation between them. When dealing with semantic validity, you want to check the graph that is parsed from the document, not the document itself.
The content matters; the format does not. For instance, let me define a new RDF format called RDF/CSV:
The first column holds the subjects. The first row holds the predicates. Every other cell holds the object of the triple formed by its row’s subject and its column’s predicate. Relative URIs are resolved against the document, as in RDF/XML.
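Because the format is so small, a parser really is short. The sketch below is one possible reading using only Python’s standard library, not a reference implementation. It makes assumptions that the description leaves open: the top-left cell is ignored, every cell (including objects) is treated as a URI reference resolved against a caller-supplied base URI with `urljoin`, literals are not supported, and empty cells yield no triple. The function name `parse_rdf_csv` and the example data are hypothetical.

```python
import csv
import io
from urllib.parse import urljoin


def parse_rdf_csv(text, base):
    """Parse hypothetical RDF/CSV text into a set of (s, p, o) triples.

    Assumptions (not fixed by the format description):
    - the top-left cell is ignored (it is in both the subject column
      and the predicate row);
    - every cell is a URI reference, resolved against `base`;
    - literal objects are not supported;
    - empty cells produce no triple.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return set()
    # First row, minus the top-left cell: one predicate per column.
    predicates = [urljoin(base, p) for p in rows[0][1:]]
    triples = set()
    for row in rows[1:]:
        if not row or not row[0]:
            continue  # no subject on this row
        subject = urljoin(base, row[0])
        for predicate, cell in zip(predicates, row[1:]):
            if cell:
                triples.add((subject, predicate, urljoin(base, cell)))
    return triples


doc = "\n".join([
    ",http://xmlns.com/foaf/0.1/knows,#worksWith",
    "#alice,#bob,#carol",
    "#bob,#alice,",
])
for t in sorted(parse_rdf_csv(doc, "http://example.org/people.csv")):
    print(t)
```

A real version would also need a rule for telling literal objects apart from URIs, which the one-paragraph description does not specify.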
I can write a parser for that in an hour and publish it. It’s genuinely useful, and all you would have to do to read and write it is use my parser or write your own. I can then pair the parser with Pellet ICV and validate the information in the file without any additional work from anyone.
Maybe we need a simplified XML representation for RDF that looks more like regular XML. But writing an XML schema for an OWL ontology is too much work for too little payoff.
The RDF/CSV format description was short enough that I decided to fit it into a tweet.