Configuring Joseki + Pellet + TDB

Configuring Joseki + Pellet + TDB (Webpage)
homepage http://blog.fmeyer.org/entry/configuring_joseki__pellet__tdb
relation SPARQL; TDB; Joseki; Pellet

This page describes how to set up a SPARQL endpoint that can process standard SPARQL queries and return results that include inferences provided by a server-side Pellet reasoner.

The following posts provide background and alternative perspectives:

1. Download TDB and Joseki

1.1 Download TDB-0.8.2 [1] and Joseki-3.4.0 [2]

TDB is the triple store component of Jena. Its loader writes RDF data into a directory as a set of index files, and its query tools run SPARQL queries against the data in that directory.

Joseki is a web application that ...

2. Install/configure TDB

2.1 Extract TDB-0.8.2.zip to a destination directory. In this tutorial, we will use: /opt/tdb. Then, register that directory in the TDBROOT environment variable.

% export TDBROOT=/opt/tdb
% ls $TDBROOT
ChangeLog.txt
README.txt
Store
bin
bin2
copyright.txt
doc
lib
tdb-src-0.8.2.zip
tdb.ttl
testing
tmp

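2.2 Add the jars in /opt/tdb/lib to your classpath (a bare directory entry only picks up loose class files, not the jars inside the directory):

 export CLASSPATH="$TDBROOT/lib/*"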

2.3 The installation can be tested by running the shell script:

 % bin/tdbverify
 TDB test suite (development)
 ...
 OK (1060 tests)

Note: This must be run from $TDBROOT (/opt/tdb).

If you get a "Permission denied" or "NoClassDefFoundError", make the scripts executable:

 chmod +x bin/*

Unless you are using Cygwin, do not worry about the error:

uname: illegal option -- o
usage: uname [-amnprsv]

(If you want, you can just comment out the if [ "$(uname -o)" = "Cygwin" ] check in $TDBROOT/bin/make_classpath)

2.4 To use the tdb command line utility, you need to set up TDBROOT (as before). You also need to set up your PATH:

 export PATH=$TDBROOT/bin:$PATH
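The environment from steps 2.1 to 2.4 can be sanity-checked with a small helper. This is only a sketch: the function name check_tdb_root is hypothetical, and it tests nothing beyond the presence of the bin and lib directories shown in the listing in step 2.1.

```shell
# Hypothetical helper: confirm that a directory looks like an unpacked TDB
# distribution. It only checks for the bin/ and lib/ subdirectories.
check_tdb_root() {
  root="$1"
  if [ -z "$root" ]; then
    echo "TDBROOT is not set" >&2
    return 1
  fi
  for sub in bin lib; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing directory: $root/$sub" >&2
      return 1
    fi
  done
  echo "ok: $root"
}
```

After step 2.4, calling check_tdb_root "$TDBROOT" should print "ok: /opt/tdb" if the archive was extracted as described.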

2.5 Create a directory that tdbloader can use to "set up shop", i.e., store index files:

 mkdir /work/data/bloggers-db-dir

2.6 Use tdbloader to load data into the database directory:

 % tdbloader --loc=/work/data/bloggers-db-dir bloggers.rdf
 
 ** Secondary indexes
 Index SPO->POS: 845 triples indexed in 0.04s [21,666 slots/s] 
 
 Index SPO->OSP: 845 triples indexed in 0.03s [24,852 slots/s]
 
 -- Finish index phase : 2009/10/21 19:35:24
 ** Close graph
 
 Time for load: 0.63s [1,353 triples/s]

2.7 Issue a query to the database directory with tdbquery:

 tdbquery --loc=/work/data/bloggers-db-dir --query=$TDBROOT/testing/Basic/basic-00.rq
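To look at your own data rather than run the bundled test query, you can write a query file and pass it with the same --query option. The sketch below is an illustration under assumptions: peek_tdb is a hypothetical name, the LIMIT of 10 is arbitrary, and it assumes tdbquery accepts a query file without an .rq extension.

```shell
# Hypothetical helper: print up to 10 triples from a TDB directory.
peek_tdb() {
  loc="$1"
  query_file=$(mktemp "${TMPDIR:-/tmp}/peek.XXXXXX") || return 1
  # Plain SPARQL: any ten triples from the default graph.
  printf '%s\n' 'SELECT * { ?s ?p ?o } LIMIT 10' > "$query_file"
  if command -v tdbquery >/dev/null 2>&1; then
    tdbquery --loc="$loc" --query="$query_file"
    status=$?
  else
    echo "tdbquery not found; add \$TDBROOT/bin to PATH (step 2.4)" >&2
    status=127
  fi
  rm -f "$query_file"
  return "$status"
}
```

For the directory used in this tutorial, the call would be peek_tdb /work/data/bloggers-db-dir.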

Part of bloggers.rdf looks like:

[Image: part of bloggers.rdf]

3. Install Joseki

3.1 Extract joseki-3.4.0.zip to a destination directory. In this tutorial, we will use: /opt/joseki

% export JOSEKIROOT=/opt/joseki
% ls $JOSEKIROOT
ChangeLog.txt
CopyrightNotice.txt
Data
README.txt
bin
doc
etc
joseki-3.4.0-sources.jar
joseki-config-example.ttl
joseki-config-sdb.ttl
joseki-config-tdb.ttl
joseki-config.ttl
lib
pom.xml
sdb.ttl
webapps

3.2 Turn on Joseki by running the shell script from $JOSEKIROOT:

 cd $JOSEKIROOT
 bin/rdfserver

You might need to:

 chmod +x bin/*

3.3 http://localhost:2020 should provide the SPARQLer webpage. Load http://localhost:2020/query.html ("General purpose SPARQL processor") and paste a SPARQL query into the appropriate field. You can use the query that we used to demonstrate TDB's tdbquery above in Section 2.7:

% cat $TDBROOT/testing/Basic/basic-00.rq
PREFIX :  <http://example>

SELECT * 
{ ?x ?p ?z }

3.4 As can be seen in $JOSEKIROOT/bin/rdfserver's output, Joseki uses $JOSEKIROOT/joseki-config.ttl as a default configuration file:

15:59:34 INFO  Configuration        :: ==== Configuration ====
15:59:34 INFO  Configuration        :: Loading : <joseki-config.ttl>

In joseki-config.ttl, a joseki:Service is created that refers to the joseki:dataset <#books>:

# Service 2 - SPARQL processor only handling a given dataset
<#service2>
   rdf:type            joseki:Service;
   rdfs:label          "SPARQL on the books model";
   joseki:serviceRef   "books";
   ...
   joseki:dataset      <#books>;
   ...
   joseki:processor    joseki:ProcessorSPARQL_FixedDS;
.

<#books> is described in the Datasets section as a ja:RDFDataset at <file:Data/books.n3>. This is $JOSEKIROOT/Data/books.n3 on your local system, and the data set you are querying from http://localhost:2020/query.html.

## --------------------------------------------------------------
## Datasets

<#books>   
   rdf:type   ja:RDFDataset;
   rdfs:label "Books";
   ja:defaultGraph [ 
       rdfs:label "books.n3";
       a ja:MemoryModel;
       ja:content [
           ja:externalContent <file:Data/books.n3> 
       ];
   ];
.

books.n3 looks like:

[Image: Books.n3.jpg]

4. Configure Joseki to query data in a TDB directory

4.1 Instead of querying plain RDF files on disk (as in the books.n3 example in Section 3.4), Joseki can query data in a TDB directory (such as the bloggers-db-dir that was created in Section 2.5 and loaded in Section 2.6). To do so, you will need a new configuration file (File:Joseki-config-bloggers-tdb.ttl) and a modified make_classpath script (File:Make classpath-tdb.sh). The steps below explain how we made them; alternatively, you can download them and adjust the absolute paths. It might be instructive to diff the new files against their off-the-shelf versions.

4.2 Duplicate the joseki-config.ttl file that was used by default in Section 3.2:

 cp $JOSEKIROOT/joseki-config.ttl $JOSEKIROOT/joseki-config-bloggers-tdb.ttl

4.3 Add the tdb namespace prefix to $JOSEKIROOT/joseki-config-bloggers-tdb.ttl:

 @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .

4.4 Comment out #service1 (because we want to use the same 'sparql' joseki:serviceRef, and we prefer a FixedDS instead of a general endpoint).

#<#service1>
#    rdf:type            joseki:Service ;
#    rdfs:label          "service point" ;
#    joseki:serviceRef   "sparql" ;  # web.xml must route this name to Joseki
#    joseki:processor    joseki:ProcessorSPARQL ;
#    .

4.5 In the "Services" section, change

# Service 2 - SPARQL processor only handling a given dataset
<#service2>
   rdf:type            joseki:Service ;
   rdfs:label          "SPARQL on the books model" ;
   joseki:serviceRef   "books" ;   # web.xml must route this name to Joseki
   # dataset part
   joseki:dataset      <#books> ;
   # Service part.
   # This processor will not allow either the protocol,
   # nor the query, to specify the dataset.
   joseki:processor    joseki:ProcessorSPARQL_FixedDS ;
.

to

# Service 2 - SPARQL processor only handling a given dataset in a TDB directory
<#service2>
   rdf:type            joseki:Service ;
   rdfs:label          "SPARQL-TDB" ;
   joseki:serviceRef   "sparql" ;   # web.xml must route this name to Joseki
   # dataset part
   joseki:dataset      <#bloggers-dataset> ;
   # Service part.
   # This processor will not allow either the protocol,
   # nor the query, to specify the dataset.
   joseki:processor    joseki:ProcessorSPARQL_FixedDS ;
.

4.6 In the "Datasets" section, add these 14 lines:

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB"      . ## Initialize TDB.
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model      .

<#bloggers-dataset>
   a ja:RDFDataset;  
   ja:defaultGraph [
      a ja:InfModel;  
      ja:baseModel [ 
         a tdb:GraphTDB;
         tdb:location "/work/data/bloggers-db-dir";
      ];
   ];
.

The tdb:location needs to match the --loc parameter that tdbloader used above in Section 2.6.
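Because a mistyped tdb:location is easy to miss, it can help to check the paths in the configuration file against the file system before starting Joseki. The following is a minimal sketch, not part of Joseki or TDB: check_tdb_locations is a hypothetical helper, and it assumes each tdb:location sits on one line with a double-quoted path, as in the configuration above.

```shell
# Hypothetical helper: list every tdb:location in a Joseki config file and
# report whether the directory exists. Lines commented out with '#' are skipped.
check_tdb_locations() {
  config="$1"
  missing=0
  # Pull the quoted path out of lines such as:  tdb:location "/some/dir" ;
  dirs=$(sed -n 's/^[^#]*tdb:location[[:space:]]*"\([^"]*\)".*/\1/p' "$config") || return 2
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue
    if [ -d "$dir" ]; then
      echo "found:   $dir"
    else
      echo "missing: $dir"
      missing=1
    fi
  done <<EOF
$dirs
EOF
  return "$missing"
}
```

Running check_tdb_locations "$JOSEKIROOT/joseki-config-bloggers-tdb.ttl" returns a non-zero status if any listed directory does not exist.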

4.7 We just asked Joseki to load com.hp.hpl.jena.tdb.TDB, so we need to make sure Joseki has it in its classpath when it runs. Add the following lines to $JOSEKIROOT/bin/make_classpath:

# Add under LIBDIR="$DIRROOT/lib" (line 18):
TDBROOTLIBDIR="/opt/tdb/lib"
...
# Add under the "# Append any jars in the lib/ directory" section (line 39/40):
for jar in "$TDBROOTLIBDIR"/*.jar
  do
  # Check for no expansion
  [ -e "$jar" ] || break
  echo "Path: $jar"
  [ "$CP" != "" ] && CP="${CP}${SEP}"
  CP="${CP}$jar"
done
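To see what this loop does in isolation, the sketch below wraps the same logic in a function and builds a classpath string from one or more directories. The name append_jars is hypothetical, and SEP is fixed to ':' here as an assumption for Unix-like systems; the real make_classpath script sets its own separator.

```shell
# Build a classpath string from every *.jar in the given directories.
SEP=':'   # path separator; assumed ':' (Unix-like systems)
CP=''

# Hypothetical helper: append all jars in directory $1 to CP.
append_jars() {
  for jar in "$1"/*.jar; do
    # If the glob matched nothing, $jar is the literal pattern; stop here.
    [ -e "$jar" ] || break
    # Insert the separator only between entries, never at the start.
    [ "$CP" != "" ] && CP="${CP}${SEP}"
    CP="${CP}$jar"
  done
}
```

Calling append_jars /opt/joseki/lib followed by append_jars /opt/tdb/lib leaves both sets of jars in CP, which mirrors the effect of the edit to make_classpath.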

4.8 Start Joseki as in Section 3.2, but with an additional argument naming the non-default configuration file:

cd $JOSEKIROOT
bin/rdfserver joseki-config-bloggers-tdb.ttl

4.9 Load http://localhost:2020/sparql.html to query /work/data/bloggers-db-dir.

prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
prefix dc:       <http://purl.org/dc/elements/1.1/>

select ?self ?title
where
{
  ?self rdfs:seeAlso ?self .
  optional { ?self dc:title ?title }
}
order by $self
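The same query can also be sent from the command line instead of the web form. The sketch below rests on assumptions: sparql_get is a hypothetical helper, the endpoint URL http://localhost:2020/sparql is inferred from the joseki:serviceRef "sparql" setting above, and it assumes the server honours the Accept header for SPARQL JSON results.

```shell
# Hypothetical helper: send a SPARQL query to an endpoint over HTTP GET and
# ask for results in the SPARQL JSON format.
sparql_get() {
  endpoint="$1"
  query="$2"
  # --get turns the url-encoded data into a query string on the URL.
  curl --silent --show-error --get \
       --header 'Accept: application/sparql-results+json' \
       --data-urlencode "query=$query" \
       "$endpoint"
}
```

For example: sparql_get http://localhost:2020/sparql 'select * { ?s ?p ?o } limit 5'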

5. Configure Joseki to query inferences from data+ontology in a single TDB directory

5.1 If the TDB directory contains both raw data and ontology data, Joseki can apply a reasoner and return inferred statements as well as the original ones. To do so, you will need a new configuration file (File:Joseki-config-bloggers-tdb-pellet.ttl) and a further modified make_classpath script (File:Make classpath-tdb-pellet.sh). As before, the steps below explain how we made them; you can also download them and adjust the absolute paths, and it might be instructive to diff them against their off-the-shelf versions. Start by duplicating the configuration file from Section 4:

 cp $JOSEKIROOT/joseki-config-bloggers-tdb.ttl $JOSEKIROOT/joseki-config-bloggers-tdb-pellet.ttl

5.2 In $JOSEKIROOT/joseki-config-bloggers-tdb-pellet.ttl, change

# Service 2 - SPARQL processor for only handling a given dataset in a TDB directory
<#service2>
    rdf:type            joseki:Service ;
    rdfs:label          "SPARQL-TDB" ;
    joseki:serviceRef   "sparql" ;   # web.xml must route this name to Joseki
    # dataset part
    joseki:dataset      <#bloggers-dataset> ;

to

# Service 2 - SPARQL processor for only handling a given dataset in a TDB directory
<#service2>
    rdf:type            joseki:Service ;
    rdfs:label          "SPARQL-TDB" ;
    joseki:serviceRef   "sparql" ;   # web.xml must route this name to Joseki
    # dataset part
    joseki:dataset      <#bloggers-dataset_reasoning> ;

5.3 After

<#bloggers-dataset>
   a ja:RDFDataset;  
   ja:defaultGraph [
      a ja:InfModel;  
      ja:baseModel [ 
         a tdb:GraphTDB;
         tdb:location "/work/data/bloggers-db-dir";
      ];
   ];
.

add the 15 lines:

<#bloggers-dataset_reasoning>
   a ja:RDFDataset;  
   ja:defaultGraph [
      a ja:InfModel;  
      ja:reasoner [  
         ja:reasonerClass "org.mindswap.pellet.jena.PelletReasonerFactory";  
      ];  
      ja:baseModel <#data_and_ontology_graph> 
   ];
.

<#data_and_ontology_graph> 
   a tdb:GraphTDB ;
   tdb:location "/work/data/bloggers-db-dir" ;
.
  • Another example of a modified Joseki configuration file is here

5.4 We just asked Joseki to load org.mindswap.pellet.jena.PelletReasonerFactory, so we need to make sure Joseki has it in its classpath when it runs. Add the following lines to $JOSEKIROOT/bin/make_classpath:

# Add under LIBDIR="$DIRROOT/lib" (line 18):
PELLETLIBDIR="/opt/pellet/lib"
...
# Add under the "# Append any jars in the lib/ directory" section (line 39/40):
# Append any jars in the pellet lib directory
for jar in "$PELLETLIBDIR"/*.jar
  do
  # Check for no expansion
  [ -e "$jar" ] || break
  #echo "Path: $jar"
  [ "$CP" != "" ] && CP="${CP}${SEP}"
  CP="${CP}$jar"
done

The reason is that the rdfserver script regenerates the classpath every time it runs, by invoking the make_classpath script. Pellet's lib directory therefore has to be added there so that Joseki can find the necessary jars.

  • A less desirable alternative is to copy the jars from Pellet's lib directory (/opt/pellet/lib in this tutorial) into the $JOSEKIROOT/lib directory.

5.5 Start Joseki as in Section 3.2, but with an additional argument naming the non-default configuration file:

cd $JOSEKIROOT
bin/rdfserver joseki-config-bloggers-tdb-pellet.ttl

5.6 Load http://localhost:2020/sparql.html to query inferences from /work/data/bloggers-db-dir.

prefix sioc: <http://rdfs.org/sioc/ns#> 

select ?created ?creator
where {
  ?created sioc:has_creator ?creator .
}

5.7 No results?! That's because bloggers.rdf does not contain any ontology statements. Try adding an ontology file such as

% cat bloggers-ontology.ttl
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .

foaf:maker rdfs:subPropertyOf sioc:has_creator .

with

tdbloader --loc=/work/data/bloggers-db-dir bloggers-ontology.ttl
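Step 5.7 can be scripted as below. This is a sketch under assumptions: both function names are hypothetical, the file content is the one-axiom ontology shown above, and it assumes Joseki is stopped while tdbloader writes to the directory and restarted afterwards so that the query from step 5.6 is answered against the new data.

```shell
# Hypothetical helper: write the one-axiom ontology from step 5.7 to file $1.
write_bloggers_ontology() {
  cat > "$1" <<'EOF'
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .

foaf:maker rdfs:subPropertyOf sioc:has_creator .
EOF
}

# Hypothetical helper: write the ontology file, then load it into the TDB
# directory $1. The file name defaults to bloggers-ontology.ttl.
load_bloggers_ontology() {
  loc="$1"
  ttl="${2:-bloggers-ontology.ttl}"
  write_bloggers_ontology "$ttl" || return 1
  if ! command -v tdbloader >/dev/null 2>&1; then
    echo "tdbloader not found; add \$TDBROOT/bin to PATH (step 2.4)" >&2
    return 127
  fi
  tdbloader --loc="$loc" "$ttl"
}
```

For this tutorial the call would be load_bloggers_ontology /work/data/bloggers-db-dir. With the subproperty axiom loaded, every foaf:maker statement in the data should let the reasoner infer a matching sioc:has_creator statement.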

6. Configure Joseki to query inferences from data in a TDB directory, with ontologies in separate files

  • If you do not want to use TDB as the triple store and would rather load plain RDF/OWL files, the base graph can instead be declared as an in-memory model:
<#baseGraph> a ja:MemoryModel ;
     ja:content [
         ja:externalContent <file:data/tbox.owl> ;
         ja:externalContent <file:data/abox.owl>
     ] .