
Introducing Sterno, another RDF syntax (Really?)

In this blog post, I want to introduce an RDF syntax called Sterno. (Oh no… not another one… right? Please, read on.) Sterno is an extension of the N-triples syntax and a subset of the Turtle syntax, aimed at improving compression over N-triples while preserving the simplicity of N-triples. But what could possibly warrant defining yet another RDF syntax?

After winning the 2009 Billion Triple Challenge, Greg and I realized that a fair amount of time in our system was spent transferring data from disk. At that time, our system read N-triples documents because their simple syntax was amenable to parallel I/O, but N-triples documents are often very verbose. Turtle, however, introduces many features which improve compression, and N-triples is a syntactic subset of Turtle. So the question arose: how much of Turtle (i.e., which features of Turtle) should we use to extend the N-triples syntax in order to improve parallel I/O? The details of our investigation into the matter can be found in our paper entitled “Reducing I/O Load in Parallel RDF Systems via Data Compression,” published at the 1st Workshop on High-Performance Computing for the Semantic Web (HPCSW2011). (We also compare the use of Sterno “compression” of RDF data with LZO compression for parallel I/O. The HPCSW proceedings can be found here for those who are interested.)

Admittedly, an RDF syntax designed for parallel I/O would seem to have a limited audience, but it turns out that Sterno may be of more general use. Sterno’s simplicity may be desirable for many purposes because it is easier to support than Turtle (that is, easier to produce and parse), particularly for use on the command line. Note that Sterno is not meant to replace or compete with any other RDF syntax; instead, it simply gives a name and definition to a useful middle ground between N-triples and Turtle.

Sterno is normatively described as an extension of the N-triples syntax. In other words, the Sterno syntax subsumes the N-triples syntax, and the Sterno syntax is defined as the N-triples syntax with the addition of the following Turtle features:

  • UTF-8 Encoding: A Sterno document is a Unicode character string encoded in UTF-8.
  • Prefix declarations and QNames: A Sterno document allows for prefix declarations and QNames, but all prefix declarations must occur at the beginning of the document before any actual triples.
  • Implicit datatypes for xsd:integer, xsd:double, xsd:decimal, and xsd:boolean. For example, "1"^^xsd:integer may simply appear in the document as 1.
  • The a keyword may be used to replace rdf:type whenever it occurs in the predicate position of a triple.
  • The empty collection () may be used to replace rdf:nil whenever it occurs in the subject or object position of a triple.
  • An anonymous blank node [] may be used, although its usefulness is severely limited in Sterno.
  • Blank node labels may be as complex as in Turtle. That is, we do not maintain the restriction in N-triples that blank node labels be only word characters. (E.g., _:blank-node is valid in Turtle and Sterno, but not in N-triples.)
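
To make the feature list above more concrete, here is a minimal Python sketch of rewriting one N-triples statement as a Sterno line. It is not the code used for the paper's evaluation; the function names (`shorten_iri`, `unescape_unicode`, `convert_literal`, `to_sterno`) are hypothetical. It assumes the statement has already been split into its three terms, and it covers only some features: QNames, the a keyword, (), bare xsd:integer values, and decoding of \uXXXX escapes. Doubles, decimals, booleans, and \UXXXXXXXX escapes are left out for brevity, and the local-name check is deliberately stricter than the real grammar.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Conservative local-name check (an assumption of this sketch;
# the real QName grammar accepts more characters).
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"[+-]?[0-9]+")
# Matches an escaped backslash or a \uXXXX escape, scanning left to right.
ESCAPE = re.compile(r"\\(\\|u([0-9A-Fa-f]{4}))")


def shorten_iri(term, prefixes):
    """Rewrite <iri> as prefix:local if a declared namespace matches."""
    iri = term[1:-1]
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            local = iri[len(namespace):]
            if LOCAL_NAME.fullmatch(local):
                return f"{prefix}:{local}"
    return term


def unescape_unicode(text):
    """Replace non-ASCII \\uXXXX escapes with the characters themselves."""
    def replace(match):
        hex_digits = match.group(2)
        if hex_digits is None:  # escaped backslash: keep unchanged
            return match.group(0)
        code_point = int(hex_digits, 16)
        if code_point < 0x80 or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)  # keep ASCII and surrogate escapes
        return chr(code_point)
    return ESCAPE.sub(replace, text)


def convert_literal(term, prefixes):
    """Apply the implicit-integer, QName, and UTF-8 features to a literal."""
    body, _, suffix = term.rpartition('"')  # the last quote closes the literal
    lexical = body[1:]
    if suffix == f"^^<{XSD}integer>" and INTEGER.fullmatch(lexical):
        return lexical
    if suffix.startswith("^^<"):
        suffix = "^^" + shorten_iri(suffix[2:], prefixes)
    return f'"{unescape_unicode(lexical)}"{suffix}'


def to_sterno(subject, predicate, obj, prefixes):
    """Return a Sterno line for one N-triples statement split into terms."""
    def convert(term, is_predicate):
        if term.startswith("<"):
            if is_predicate and term == f"<{RDF}type>":
                return "a"
            if not is_predicate and term == f"<{RDF}nil>":
                return "()"
            return shorten_iri(term, prefixes)
        if term.startswith('"'):
            return convert_literal(term, prefixes)
        return term  # blank node labels pass through unchanged
    return (f"{convert(subject, False)} {convert(predicate, True)} "
            f"{convert(obj, False)} .")


prefixes = {"mine": "file:///foaf.rdf#", "rdf": RDF,
            "foaf": "http://xmlns.com/foaf/0.1/"}
print(to_sterno("<file:///foaf.rdf#me>", f"<{RDF}type>",
                "<http://xmlns.com/foaf/0.1/Person>", prefixes))
# prints: mine:me a foaf:Person .
```

The prefix declarations themselves would still have to be written at the top of the output document, since Sterno requires them to precede all triples.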

An actual grammar for the Sterno syntax can be found in the extended version of our HPCSW paper. All of this may be a lot to keep in one's head, so the following is a contrived example in N-triples, Sterno, and Turtle. (For a more realistic example, see my FOAF profile in N-triples, Sterno, and Turtle.)

N-triples:

<file:///foaf.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<file:///foaf.rdf#me> <http://xmlns.com/foaf/0.1/nick> "Andr\u00E9" .
<file:///foaf.rdf#me> <http://xmlns.com/foaf/0.1/age> "40"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:list <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#List> .
_:list <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "line1\n\tline2 \"quoted string\" " .
_:list <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:contrived <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
# What a contrived triple.

Sterno:

@prefix mine: <file:///foaf.rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
mine:me a foaf:Person .
mine:me foaf:nick "André" .
mine:me foaf:age 40 .
_:list a rdf:List .
_:list rdf:first "line1\n\tline2 \"quoted string\" " .
_:list rdf:rest () .
[] a <http://www.w3.org/2002/07/owl#Thing> .
# What a contrived triple.

Turtle (with base URI <file:///foaf.rdf>):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person ; foaf:nick "André" ; foaf:age 40 .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
( """line1
\tline2 "quoted string" """ ) a rdf:List .
[]a<http://www.w3.org/2002/07/owl#Thing> . # What a contrived triple.

Put roughly, the Sterno syntax keeps the simplicity of N-triples: each line contains at most one triple, and whitespace must separate the RDF terms of a triple. Therefore, although it is not as concise as Turtle (e.g., property lists and object lists are not adopted), it is easier to parse and generate.
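
As a rough illustration of why the one-triple-per-line and whitespace rules simplify parsing, here is a minimal Python sketch that splits a Sterno triple line into its three terms with a single regular expression. The name `split_triple` is hypothetical, and the sketch rests on assumptions not stated above: the terminating dot is preceded by whitespace, comments occupy whole lines, and prefix declarations are handled elsewhere (they are simply skipped here). It does not validate the individual terms.

```python
import re

# One token per alternative; tokens are separated by whitespace.
TERM = re.compile(r'''
    <[^>]*>                                          # IRI reference
  | "(?:[^"\\]|\\.)*"(?:\^\^\S+|@[A-Za-z0-9-]+)?     # literal, optional datatype or language tag
  | \S+                                              # QName, blank node, a, (), [], number, or "."
''', re.VERBOSE)


def split_triple(line):
    """Return (subject, predicate, object) for a triple line, else None.

    Blank lines, comment lines, and @prefix declarations yield None.
    A line that is none of these and is not three terms followed by "."
    raises ValueError.
    """
    line = line.strip()
    if not line or line.startswith("#") or line.startswith("@prefix"):
        return None
    tokens = TERM.findall(line)
    if len(tokens) != 4 or tokens[3] != ".":
        raise ValueError(f"not a Sterno triple line: {line!r}")
    return tuple(tokens[:3])


print(split_triple('mine:me foaf:nick "André" .'))
# prints: ('mine:me', 'foaf:nick', '"André"')
```

Because each line stands alone, a reader can hand arbitrary line ranges of a document to separate workers once the prefix declarations at the top have been read. A Turtle-only construct such as `[]a<http://www.w3.org/2002/07/owl#Thing> .` is rejected by this sketch, since its terms are not separated by whitespace.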

Feedback welcome, even encouraged.

(Why the name “Sterno”? The name originated as an abbreviation for Sternotherus, a genus of aquatic turtle, the most common species of which typically grows to only 7.5 to 14 centimeters. The name was chosen to reflect that the Sterno syntax is a small, syntactic subset of the Turtle syntax. Additionally, it is an acronym meaning “Simple, TErse Rdf… NOthing else.”)

  1. July 4th, 2011 at 11:08 | #1

    As someone doing quite a bit of Pig processing on the DBpedia N-triples dumps, I appreciate the value of this initiative (especially the global namespace prefixes and the use of the UTF-8 charset).

    Do you plan to publish some open-source Java, Scala, or Python tools to convert to and from N-triples?

    • July 4th, 2011 at 22:47 | #2

      Hi Olivier. Thanks for expressing interest in our work.

      The only tools I have are the code that I wrote for the evaluation, which is C++ code using MPI. It can do conversions from N-triples to Sterno given specified prefix declarations. It can also LZO compress and decompress in parallel. I had hoped (and still hope) to open source the code, but I wanted to clean it up a bit first. Now, I am busy with an internship, so it may be a while before I get back to it.

      Sorry that I couldn’t be of more help at this time.

  2. July 7th, 2011 at 17:16 | #3

    Fantastic post, very informative. I’m wondering why the other experts in this field do not notice this. You must continue your writing. I’m sure you have a great reader base already!

  3. July 12th, 2011 at 15:50 | #4

    Put roughly, the Sterno syntax maintains the simplicity of N-triples that each line contain at most one triple, and there must be whitespace between the RDF terms of a triple. Therefore, although it is not as concise as Turtle (e.g., property lists and object lists are not adopted), it is easier to parse and generate. Yes roughly indeed but as per bet um done very well. thanks as I have gained understanding here.

  1. July 8th, 2011 at 10:46 | #1