Joseki - Configuration

A Joseki server is configured with services. A service is implemented by a processor, and that processor can either execute queries on a fixed, predefined dataset, dynamically assemble the dataset from the query, or do either, depending on the query. If the processor has a fixed dataset, then a query involving FROM or FROM NAMED, or a protocol request that describes the dataset, will be rejected.

When publishing some existing data, it will be most common to use a processor that does not allow the dataset to be specified in the query or the query protocol request.

The same configuration is used for both HTTP and SOAP.

The configuration file is an RDF graph. The default name is "joseki-config.ttl"; it is often written in Turtle or N3, rather than RDF/XML, although the server can read the full range of RDF serializations. The distribution includes examples in "joseki-config-example.ttl".

Beware that the web.xml file must route incoming requests to the Joseki servlet. See the protocol-specific details.

(Example web.xml)
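A minimal sketch of such a web.xml, assuming the servlet class org.joseki.http.Servlet and a mapping that routes all requests in the application to Joseki (the class name and URL pattern are assumptions; check the web.xml shipped with your distribution):

```xml
<web-app>
  <!-- Declare the Joseki servlet (class name assumed) -->
  <servlet>
    <servlet-name>JosekiServlet</servlet-name>
    <servlet-class>org.joseki.http.Servlet</servlet-class>
  </servlet>
  <!-- Route all requests in this web application to Joseki -->
  <servlet-mapping>
    <servlet-name>JosekiServlet</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>
</web-app>
```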

Example

First, we declare some prefixes, then some basic information about this file. Because the configuration file is RDF, the order of the sections does not matter to the server, but a consistent order does help a human reading the file.

@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix module: <http://joseki.org/2003/06/module#> .
@prefix joseki: <http://joseki.org/2005/06/configuration#> .

@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

## Note: adding rdfs:label to nodes will cause Joseki
## to print that in any log messages.

## --------------------------------------------------------------
## About this configuration

<> rdfs:comment "Joseki example configuration" .

## --------------------------------------------------------------
## About this server

[] rdf:type joseki:Server ;
.

Basic Service

Having got preliminaries out of the way, the next section describes the services:

# Service 1
# General purpose SPARQL processor, no dataset, expects the
# request to specify the dataset (either by parameters in the
# protocol request or in the query itself).

[]
rdf:type joseki:Service ;
rdfs:label "service point" ;
joseki:serviceRef "sparql" ;
joseki:processor joseki:ProcessorSPARQL ;
.

A service must have an RDF type, a service reference (used to form the URL or the SOAP service name) and a processor (the thing that executes the request itself). This example also gives it a printable name, "service point", which will appear in log messages.

For HTTP, the service reference is combined with the location of the Joseki server, based on the web application name, to form the URL of the service.

If the web application is "joseki", on machine www.sparql.org, then the web application handles URLs starting http://www.sparql.org/joseki and a query request will look like:

http://www.sparql.org/joseki/sparql?query=....

But if the web application is the root application, then the URL will not involve a web application name:

http://www.sparql.org/sparql?query=....

See the details of your chosen web application server. The standalone Joseki server uses Jetty running on port 2020, and mounts the Joseki servlet in the root application, giving:

http://machine:2020/sparql?query=....
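The query string itself must be URL-encoded. For example, the query SELECT * { ?s ?p ?o } sent to the standalone server becomes ("machine" stands for the server's host name, as above):

```
http://machine:2020/sparql?query=SELECT%20*%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D
```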

Service with a dataset

A service can also be set up to respond to queries made on a fixed dataset, not one specified in the protocol request or in the query string. The fixed dataset is added to the service description:

# Service 2 - SPARQL processor only handling a given dataset
[]
rdf:type joseki:Service ;
rdfs:label "SPARQL on the books model" ;
joseki:serviceRef "books" ;
# dataset part
joseki:dataset _:books ;
# Service part.
# This processor allows neither the protocol request
# nor the query to specify the dataset.
joseki:processor joseki:ProcessorSPARQL_FixedDS ;
.

Here, we have used a blank node with a label, then placed the description of the dataset elsewhere. We could have placed the definition inline or used a URI.

Defining Datasets

A dataset is defined using a Jena assembler description augmented with vocabulary to handle RDF datasets, not just single Jena models. ARQ also directly understands the augmented vocabulary.

An RDF dataset is a collection of an unnamed, default graph and zero or more named graphs. Queries access the named graphs through the GRAPH keyword in SPARQL.
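For example, a query over a dataset with named graphs might select from one of them like this (the graph name is illustrative):

```
SELECT ?s ?p ?o
WHERE { GRAPH <http://example.org/name1> { ?s ?p ?o } }
```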

Each graph is a Jena model, and these are described with the Jena Assembler vocabulary. We give some example configuration here but the assembler descriptions can describe a wide variety of model setups, including connection to an external OWL DL reasoner, such as Pellet (an open source reasoner).
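As one illustration of a richer setup, here is a hedged sketch of an in-memory model wrapped by Jena's built-in RDFS rule reasoner (the file name is illustrative and the exact vocabulary should be checked against the Jena assembler documentation):

```turtle
_:infModel rdf:type ja:InfModel ;
    rdfs:label "RDFS inference over ontology.ttl" ;
    ## The plain data, loaded into memory
    ja:baseModel
        [ a ja:MemoryModel ;
          ja:content [ ja:externalContent <file:Data/ontology.ttl> ] ] ;
    ## The reasoner applied over the base model
    ja:reasoner
        [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExprReasoner> ] ;
    .
```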

This first dataset just has one graph - the default graph. The content is loaded from a file (there could be several files loaded) but the file name does not give the model a name. The dataset description has an explicit type.

# A dataset of one model as the default graph

_:books rdf:type ja:RDFDataset ;
rdfs:label "Books" ;
ja:defaultGraph
[ a ja:MemoryModel ;
rdfs:label "books.ttl" ;
ja:content [ ja:externalContent <file:Data/books.ttl> ]
] ;
.

A more complicated example places the default graph description elsewhere and has two named graphs. Note that the names of the graphs are not the same as the locations the data for the graphs comes from.

_:ds1 rdf:type ja:RDFDataset ;
ja:defaultGraph _:model0 ;
rdfs:label "Dataset _:ds1" ;
ja:namedGraph
[ ja:graphName <http://example.org/name1> ;
ja:graph _:model1 ] ;
ja:namedGraph
[ ja:graphName <http://example.org/name2> ;
ja:graph _:model2 ] ;
.

Model Descriptions

Next, we have model descriptions, using the Jena assembler vocabulary:

## --------------------------------------------------------------
## Individual graphs (Jena calls them Models)
## (syntax of data files is determined by file extension - defaults to RDF/XML)

_:model0 rdf:type ja:MemoryModel ;
rdfs:label "Model (plain, merge the 2 RDF files)" ;
ja:content [
ja:externalContent <file:D1.ttl> ;
ja:externalContent <file:D2.ttl> ;
] ;
.

_:model1 rdf:type ja:MemoryModel ;
rdfs:label "Model (D1.ttl for content)" ;
ja:content [
ja:externalContent <file:D1.ttl> ;
] ;
.

_:model2 rdf:type ja:MemoryModel ;
rdfs:label "Model (D2.ttl for content)" ;
ja:content [
ja:externalContent <file:D2.ttl> ;
] ;
.

The earlier dataset example placed the description of the data for the "Books" dataset inline.

Database Models

A graph that is held in a Jena-format database can be used as well:

## --------------------------------------------------------------
_:db rdf:type ja:RDBModel ;
ja:connection
[
ja:dbType "MySQL" ;
ja:dbURL "jdbc:mysql://localhost/data" ;
ja:dbUser "user" ;
ja:dbPassword "password" ;
ja:dbClass "com.mysql.jdbc.Driver" ;
] ;
ja:modelName "books"
.
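A database-backed model such as _:db can then serve as a graph in a dataset in exactly the same way as the in-memory models (a sketch reusing the description above):

```turtle
# Dataset whose default graph is the database-backed model
_:dbDataset rdf:type ja:RDFDataset ;
    rdfs:label "Dataset backed by a database" ;
    ja:defaultGraph _:db ;
    .
```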

Processors

Finally, we have the core definitions of processors. Processors can be described inline, using blank nodes, but it is convenient to give them URIs and place the definitions elsewhere. Each processor description can have parameters which are passed to each class instance when created.

A processor is implemented by a module: a dynamically loaded Java class.

joseki:ProcessorSPARQL
rdf:type joseki:Processor ;
rdfs:label "General SPARQL processor" ;
module:implementation joseki:ImplSPARQL ;
# Parameters - this processor processes FROM/FROM NAMED
joseki:allowExplicitDataset "true"^^xsd:boolean ;
joseki:allowWebLoading "true"^^xsd:boolean ;
## This processor has no locking policy (it loads data each time).
## The default is mutex (one request at a time).
joseki:lockingPolicy joseki:lockingPolicyNone ;
.

joseki:ProcessorSPARQL_FixedDS
rdf:type joseki:Processor ;
rdfs:label "SPARQL processor for fixed datasets" ;
module:implementation joseki:ImplSPARQL ;
# This processor does not accept queries with FROM/FROM NAMED
joseki:allowExplicitDataset "false"^^xsd:boolean ;
joseki:allowWebLoading "false"^^xsd:boolean ;
# Fixed background dataset: multiple read requests are OK.
joseki:lockingPolicy joseki:lockingPolicyMRSW ;
.

joseki:ImplSPARQL
rdf:type joseki:ServiceImpl ;
module:className <java:org.joseki.processors.SPARQL> .


Dataset Pooling

Joseki supports pooling of datasets so that there can be multiple queries outstanding and in progress on the same service endpoint. This is useful with SDB, where it results in multiple JDBC connections to the underlying SQL database. Because results stream in Joseki, the results of a query may not have been completely sent when the HTTP request returns. The client has access to the start of the results from the HTTP response stream and will continue to consume results as Joseki streams them. Joseki cleans up any transactions or locks allocated, then returns the dataset to the pool for reuse.

A pool is created using the joseki:poolSize property on the dataset:

<#dataset> rdf:type ja:RDFDataset ;
    joseki:poolSize     5 ;
    ...

For SDB, the same property is used on the sdb:DatasetStore (a subclass of ja:RDFDataset):

<#sdb> rdf:type sdb:DatasetStore ;
    joseki:poolSize     5 ;
    sdb:store <#store> .

<#store> rdf:type sdb:Store  ;
    rdfs:label "SDB" ;
    ...