Processors: Adding new query languages and other operations

Design

Processors deliver the functionality of a Joseki server. There are two kinds: query processors and processors for other operations. Everything a client can request of a model is performed by a processor.

Processors are dynamically loaded at start-up.  They are modules, the unit of loading code into a Joseki server. The configuration file contains the information to identify a query language by some short name, or by its long URI, and to associate it with a query processor. For other processors, the binding on a model gives the operation a short name and points to the code for the operation.
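The loading and lookup described above can be sketched roughly as follows. This is a minimal illustration, not the Joseki loader: LoadableModule, EchoModule, ModuleRegistry and the example URI are all hypothetical names invented for the sketch. It assumes a module class has a no-argument constructor and can report its own URI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a loadable module; not the Joseki interface.
interface LoadableModule {
    String getOperationURI();
}

// Hypothetical module used only to exercise the registry.
class EchoModule implements LoadableModule {
    public String getOperationURI() { return "http://example.org/lang/echo"; }
}

class ModuleRegistry {
    private final Map<String, LoadableModule> byNameOrUri = new HashMap<>();

    // Load the class by name, then register the instance under both
    // its configured short name and the URI it reports.
    void register(String shortName, String className) {
        try {
            LoadableModule module = (LoadableModule) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            byNameOrUri.put(shortName, module);
            byNameOrUri.put(module.getOperationURI(), module);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load module " + className, e);
        }
    }

    // Returns null when nothing is bound to the given short name or URI.
    LoadableModule lookup(String nameOrUri) {
        return byNameOrUri.get(nameOrUri);
    }
}
```

With this shape, a request that names the language as "echo" and one that names it by the full URI resolve to the same module instance.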

Query requests identify the model on which to operate, the query language and the query itself.  They return a single subgraph.  At this level, a query is subgraph extraction; a client library may wish to build variable bindings from such a subgraph, so the subgraph includes all the information needed for a particular language to do so. If no query language is specified (so there is no HTTP GET query string), the request is interpreted as a plain GET that fetches the whole model, just like a browser's use of HTTP GET.

All other operation requests identify the model on which to operate and provide some parameters (simple name/value pairs) and zero or one argument models.  They return a single model.

Example queries:

The subgraph returned by an RDQL query is guaranteed to be such that the same query issued on this subgraph would give the same bindings. The client library reconstructs the variable bindings locally.

Dispatch

Any incoming request is first turned into a Java form (class Request) that records parameters and the argument model, if any. Queries have the query language recorded. The implementer of a processor does not need to know about the encoding/decoding details of the transport used.

Query over HTTP GET

For an HTTP GET request, the query and the query language are given as strings. These become the "query" and "lang" parameters.
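As a rough illustration of how the "query" and "lang" parameters might be pulled out of a GET query string, here is a small sketch. The class GetRequestParams and its methods are hypothetical and are not part of Joseki; in a real server the servlet container would normally do this decoding. It assumes Java 10 or later for the URLDecoder overload that takes a Charset.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class GetRequestParams {
    // Split "a=1&b=2" into decoded name/value pairs.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // With no "lang" parameter the request is treated as a plain GET
    // of the whole model.
    static boolean isPlainGet(Map<String, String> params) {
        return !params.containsKey("lang");
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

For example, parsing "lang=RDQL&query=SELECT+%3Fx" yields a "lang" parameter of "RDQL" and a "query" parameter of "SELECT ?x", while an empty query string is treated as a plain GET.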

Query over HTTP POST

Some queries are too big to go via HTTP GET, or an application needs to bypass caches to ensure that a query is run against the current state. For this, the request puts the query into an RDF graph with a simple, known vocabulary. The query can be a single large RDF literal or structured RDF, depending on the style of the query language. The only requirement the adapter processor makes is that there is a single instance of the property that names the query language. If the property that carries a query string is present, the underlying query processor is called with the query as a string; otherwise it is called with the query model. The interface for a query processor captures these details.
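The choice between the string form and the model form can be sketched as below. To stay self-contained, the query graph is modelled as a plain map from property name to values rather than as an RDF model, and the property names "queryLanguage" and "queryString" are placeholders: the actual vocabulary is not given here, so treat every name in this sketch as an assumption.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the adapter's decision; not Joseki code.
class PostQueryDispatchSketch {
    // Placeholder property names, assumed for illustration only.
    static final String LANG_PROPERTY = "queryLanguage";
    static final String STRING_PROPERTY = "queryString";

    // The adapter requires exactly one statement naming the query language.
    static String language(Map<String, List<String>> queryGraph) {
        List<String> langs = queryGraph.get(LANG_PROPERTY);
        if (langs == null || langs.size() != 1) {
            throw new IllegalArgumentException(
                "Expected exactly one query language property");
        }
        return langs.get(0);
    }

    // True: call the processor with the query as a string.
    // False: hand the whole query model to the processor.
    static boolean useStringForm(Map<String, List<String>> queryGraph) {
        List<String> strings = queryGraph.get(STRING_PROPERTY);
        return strings != null && !strings.isEmpty();
    }
}
```

The two outcomes of useStringForm correspond to the two execQuery variants a query processor provides, one taking a query string and one taking a query model.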

Other operations over HTTP POST

All general operations, such as "add", arrive only via HTTP POST. This allows a more general argument-passing mechanism and ensures that operations are sent to the target model itself, rather than to a web cache. If the operation is cacheable or bookmarkable, treat it as a query so that the full power of HTTP GET caching is available.

Writing a QueryProcessor

Query processors must implement the interface:

public interface QueryProcessor extends Processor
{
    public Model execQuery(Model model, String queryString)
       throws RDFException, QueryExecutionException ;
    public Model execQuery(Model model, Model queryModel)
       throws RDFException, QueryExecutionException ;
}

Queries are always read-only operations. The class QueryProcessorCom, in the package org.joseki.server.processors, will be a suitable base class much of the time. As an example, here is a possible GET processor (at the time of writing, this is the code for the GET processor):

public class QueryProcessorGET extends QueryProcessorCom
{
    public QueryProcessorGET() { super() ; }

    // The module interface requires ...
    public String getOperationURI()
    { return JosekiVocab.queryOperationGET ; }

    public Model execQuery(AttachedModel aModel, String queryString)
        throws RDFException, QueryExecutionException
    {
        if ( aModel.getIsImmutable() )
            return aModel.getModel() ;
        // Mutable model - need to take a copy as it may change
        // or be changing when the reply is sent.
        Model result = new ModelMem() ;
        result.add(aModel.getModel()) ;
        return result ;
    }

    public Model execQuery(AttachedModel aModel, Model queryModel)
        throws RDFException, QueryExecutionException
    {
        throw new QueryExecutionException(
                       ExecutionError.rcOperationNotSupported,
                       "Can't GET a model this way") ;
    }

}

This query processor specifies its URI (all modules have a URI so that we can make assertions about them) and supplies implementations of the execQuery methods. For GET, it just returns the model, or a copy of the model if it might change. Note that it throws an error if invoked via HTTP POST.

Once there is a class that provides the query language, it is bound to AttachedModels. See the configuration section and the hasQueryOperation property.

Writing an Operation Processor

public interface Processor extends Loadable
{

    public Model exec(Request request) throws ExecutionException ;
    
    static final int ARGS_ZERO         = 0 ;
    static final int ARGS_ONE          = 1 ;
    static final int ARGS_ZERO_OR_ONE  = -1 ;
    
    public int argsNeeded() ;
}

A processor must provide its URI to identify it, and provide an implementation of exec. There are some convenience declarations of constants.

Once there is a class that implements the operation, it needs to be bound to the models it applies to. See the configuration section and the hasOperation property on AttachedModels.

There are two abstract classes, OneArgProcessor and ZeroArgProcessor, to help implementers with operations that take one or zero models as arguments (they can have any number of parameters).  These implementations provide locking.
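The general idea behind such helper classes can be sketched as a template method that checks the argument count and then runs the operation under a lock. This is not the source of OneArgProcessor or ZeroArgProcessor: LockingProcessorSketch, execLocked and needsWriteLock are hypothetical names, and the model type is left generic so that the sketch is self-contained. Only the three ARGS_ constants mirror the Processor interface above.

```java
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;

// Hypothetical base class; M stands in for the model type.
abstract class LockingProcessorSketch<M> {
    static final int ARGS_ZERO = 0;
    static final int ARGS_ONE = 1;
    static final int ARGS_ZERO_OR_ONE = -1;

    abstract int argsNeeded();

    // The operation itself; called only while the lock is held.
    abstract M execLocked(M target, List<M> args);

    // Assumption: updating operations would override this to return true.
    boolean needsWriteLock() { return false; }

    final M exec(ReadWriteLock modelLock, M target, List<M> args) {
        int needed = argsNeeded();
        int given = args.size();
        boolean ok = (needed == ARGS_ZERO_OR_ONE) ? given <= 1 : given == needed;
        if (!ok) {
            throw new IllegalArgumentException(
                "Wrong number of argument models: " + given);
        }
        // At least a read lock is always taken.
        Lock lock = needsWriteLock() ? modelLock.writeLock() : modelLock.readLock();
        lock.lock();
        try {
            return execLocked(target, args);
        } finally {
            lock.unlock();
        }
    }
}
```

A subclass then only states how many argument models it needs and supplies the body of the operation; the lock is released even if the operation throws.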

Concurrency

In the example of query GET, the operation checked to see if it needed to return a copy of the model, not the model itself. This is because the read-lock (all queries are done under a read-lock if they use the standard query processor framework) only covers the time the operation is being performed, not the time it is being encoded and written back over the transport.
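The reason for copying can be seen in a small, self-contained sketch. The class SnapshotUnderReadLock is hypothetical, and a list of strings stands in for the statements of a model. The copy is made while the read lock is held, so the caller can serialise it later without holding any lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a mutable model guarded by a
// multiple-reader, single-writer lock.
class SnapshotUnderReadLock {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> statements = new ArrayList<>();

    void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copy inside the read lock: the result is unaffected by
    // updates that happen after the lock is released.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(statements);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Returning the live list instead of a copy would expose the reply writer to concurrent updates, which is the case the GET processor avoids for mutable models.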

Operation processors must be safe during their execution and also safe in their results. The standard implementations for processors, OneArgProcessor and ZeroArgProcessor, insist on at least a read-lock (it's a multiple-reader, single-writer lock).