Jena 2 User Manual: Inference support

This section of the user manual will describe the support for inference over RDF graphs that is included within Jena. However, it has not yet been written. In its place are some brief notes to assist brave users who wish to experiment with preview releases of this functionality before the documentation is ready.

Overview of the inference support machinery

The Jena2 reasoner subsystem is designed to allow a range of inference engines to be plugged into Jena. Such engines are used to derive additional RDF assertions which are entailed from some base RDF together with any optional ontology information and the axioms and rules associated with the reasoner. The primary use of this mechanism is to support the use of languages such as RDFS and OWL which allow additional facts to be inferred from instance data and class descriptions. However, the machinery is designed to be quite general and, in particular, it includes a generic rule engine that can be used for many RDF processing or transformation tasks.

We will try to use the term inference to refer to the abstract process of deriving additional information and the term reasoner to refer to a specific code object that performs this task. Such usage is arbitrary and if we slip into using equivalent terms like reasoning and inference engine please forgive us.

The overall structure of the inference machinery is illustrated below.

[Figure: Overall structure of inference machinery]

As illustrated, the inference machinery is implemented at the level of the Graph SPI [TODO:cross link to that section]. One or more Graphs containing RDF data can be attached to a specific reasoner to generate a derived graph of type InfGraph [TODO: links to javadoc]. The InfGraph acts as if it is a graph containing all of the triples in the base data graph(s) together with any additional triples which the reasoner is able to infer. However, many of these triples will be virtual in the sense that they are only generated in response to access requests.
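
The notion of virtual triples can be illustrated with a toy model that uses no Jena classes at all. In the sketch below the class VirtualTripleDemo, its string-array encoding of triples and the single subPropertyOf rule are all assumptions made for illustration; a real InfGraph is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy inference view: inferred triples are computed per query and never stored. */
final class VirtualTripleDemo {
    static final String SUB_PROPERTY_OF = "rdfs:subPropertyOf";

    // Base data only; each entry is {subject, predicate, object}.
    private final List<String[]> base = new ArrayList<>();

    void add(String s, String p, String o) {
        base.add(new String[] {s, p, o});
    }

    /** Number of triples physically stored in the base data. */
    int storedSize() {
        return base.size();
    }

    /** True if (s, p, o) is stored, or follows from one direct subPropertyOf step. */
    boolean contains(String s, String p, String o) {
        for (String[] t : base) {
            if (t[0].equals(s) && t[1].equals(p) && t[2].equals(o)) {
                return true;
            }
        }
        // Virtual triples: look for (sub subPropertyOf p) together with (s sub o).
        for (String[] link : base) {
            if (link[1].equals(SUB_PROPERTY_OF) && link[2].equals(p)) {
                for (String[] t : base) {
                    if (t[0].equals(s) && t[1].equals(link[0]) && t[2].equals(o)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    /** Hypothetical example data: p is a subproperty of q, and a has value foo for p. */
    static VirtualTripleDemo example() {
        VirtualTripleDemo g = new VirtualTripleDemo();
        g.add("eg:p", SUB_PROPERTY_OF, "eg:q");
        g.add("eg:a", "eg:p", "foo");
        return g;
    }

    public static void main(String[] args) {
        VirtualTripleDemo g = example();
        System.out.println("stored triples: " + g.storedSize());
        System.out.println("a q foo visible: " + g.contains("eg:a", "eg:q", "foo"));
    }
}
```

Only two triples are ever stored, yet a query for the entailed triple succeeds because the answer is derived at access time.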

The derived InfGraph can then be viewed through any of the Jena supported APIs, specifically the general RDF Model API and the various profiles of the Ontology API. The choice of access API, underlying reasoner and physical implementation of the data graphs can all be varied independently.

Available reasoners

Included in the Jena distribution are a number of predefined reasoners, at various stages of maturity. In the Jena2 preview 4 distribution the following reasoners are worth noting:

Transitive reasoner: Provides support for storing and traversing class and property lattices. This implements just the transitive and reflexive properties of rdfs:subPropertyOf and rdfs:subClassOf.
RDFS rule reasoner: Implements a configurable subset of the RDFS entailments.
Generic rule reasoner: A rule based reasoner that supports user defined rules. Forward chaining, tabled backward chaining and hybrid execution strategies are supported.
OWL FB Reasoner: A preliminary implementation of the OWL/Lite subset of the OWL/Full language.

Generic reasoner API

Finding a reasoner

For each type of reasoner there is a factory class (which conforms to the interface ReasonerFactory) which has one or more instances through which instances of the Reasoner can be constructed. The factory instances can be located either by going directly to a known factory class and using its theInstance() method, or by retrieval from a global ReasonerRegistry which stores factory instances indexed by the URI assigned to the reasoner.

In addition, there are convenience methods on the ReasonerRegistry for locating a prebuilt instance of each of the main reasoners (getTransitiveReasoner, getRDFSReasoner, getOWLReasoner, getRDFSSimpleReasoner).

Finally, the access APIs include convenience support for creating Models or Ontology models with supporting reasoners. See ModelFactory.createRDFSModel and [TODO: cross ref to ontology documentation].

Configuring a reasoner

The behaviour of many of the reasoners can be configured. To allow arbitrary configuration information to be passed to reasoners we use RDF to encode the configuration information. The ReasonerFactory.create method can be passed a Jena Resource object; the properties of that object will be used to configure the created reasoner.

To simplify the code required for simple cases we also provide a direct Java method to set a single configuration parameter, Reasoner.setParameter. The parameter being set is identified by the URI of the corresponding configuration property.

For the built in reasoners the available configuration parameters are described below [TODO: make it so] and are predefined in the ReasonerVocabulary class.

Applying a reasoner to data

Once you have an instance of a reasoner it can then be attached to a graph of RDF data to create an inference graph. This can be done either by putting all the RDF data into one Graph/Model or by separating it into two components: schema and instance data. For some external reasoners a hard separation may be required. For all of the built in reasoners the separation is arbitrary. The prime value of this separation is to allow some deductions from one set of data (typically some schema definitions) to be efficiently applied to several subsidiary sets of data (typically sets of instance data).

At the SPI level the methods Reasoner.bindSchema and Reasoner.bind perform this function. These operations are side-effect free so that a single reasoner instance can be used to generate arbitrary bound inference graphs without problems.
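
The value of side-effect free binding can be shown with a toy model that is independent of Jena. In the sketch below ToyReasoner and its two methods only mimic the names bindSchema and bind; the signatures, the string-array encoding and the single subPropertyOf rule are assumptions chosen for illustration, not the Jena SPI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy reasoner whose bindSchema returns a new object and leaves the receiver unchanged. */
final class ToyReasoner {
    // Each schema entry is {subProperty, superProperty}.
    private final List<String[]> schema;

    ToyReasoner() {
        this(Collections.<String[]>emptyList());
    }

    private ToyReasoner(List<String[]> schema) {
        this.schema = schema;
    }

    /** Side-effect free: the schema is copied into a fresh reasoner. */
    ToyReasoner bindSchema(List<String[]> subPropertyPairs) {
        List<String[]> merged = new ArrayList<>(schema);
        merged.addAll(subPropertyPairs);
        return new ToyReasoner(Collections.unmodifiableList(merged));
    }

    /** Applies the bound schema to one data set; each data entry is {s, p, o}. */
    List<String> bind(List<String[]> data) {
        List<String> result = new ArrayList<>();
        for (String[] t : data) {
            result.add(t[0] + " " + t[1] + " " + t[2]);
            for (String[] pair : schema) {
                if (pair[0].equals(t[1])) {
                    // Entailed statement: same subject and object, super property.
                    result.add(t[0] + " " + pair[1] + " " + t[2]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyReasoner plain = new ToyReasoner();
        ToyReasoner bound = plain.bindSchema(
                Collections.singletonList(new String[] {"eg:p", "eg:q"}));
        List<String[]> data1 = Collections.singletonList(new String[] {"eg:a", "eg:p", "foo"});
        List<String[]> data2 = Collections.singletonList(new String[] {"eg:b", "eg:p", "bar"});
        System.out.println(bound.bind(data1)); // schema reused for the first data set
        System.out.println(bound.bind(data2)); // and again for the second
        System.out.println(plain.bind(data1)); // the original reasoner is unaffected
    }
}
```

Because binding never mutates the receiver, one bound reasoner can serve several data sets and the unbound original remains usable.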

At the API level the method ModelFactory.createInfModel does this.

Accessing inferences

Finally, having created an inference model, any API operations which access RDF statements will be able to access additional statements which are entailed from the bound data by means of the reasoner.

Reasoner description

The reasoners can be described using RDF metadata which can be searched to locate reasoners with appropriate properties. The calls Reasoner.getCapabilities and Reasoner.supportsProperty are used to access this descriptive metadata.

Some small examples

These initial examples are not designed to illustrate the power of the reasoners but to illustrate the code required to set one up.

Let us first create a Jena model containing the statements that some property "p" is a subproperty of another property "q" and that we have a resource "a" with value "foo" for "p". This could be done by writing an RDF/XML or N3 file and reading that in but we have chosen to use the RDF API:

        // Imports assumed for the Jena 2 package layout (com.hp.hpl.jena.*)
        import com.hp.hpl.jena.rdf.model.*;
        import com.hp.hpl.jena.vocabulary.RDFS;

        String NS = "urn:x-hp-jena:eg/";
        
        // Build a trivial example data set
        Model rdfsExample = ModelFactory.createDefaultModel();
        Property p = rdfsExample.createProperty(NS, "p");
        Property q = rdfsExample.createProperty(NS, "q");
        rdfsExample.add(p, RDFS.subPropertyOf, q);
        rdfsExample.createResource(NS+"a").addProperty(p, "foo");

Now we can create an inference model which performs RDFS inference over this data by using:

        InfModel inf = ModelFactory.createRDFSModel(rdfsExample);  // [1]

We can then check that the resulting model shows that "a" also has property "q" of value "foo" by virtue of the subPropertyOf entailment:

        Resource a = inf.getResource(NS+"a");
        System.out.println("Statement: " + a.getProperty(q));

Which prints the output:

        Statement: [urn:x-hp-jena:eg/a, urn:x-hp-jena:eg/q, Literal<foo>]

Alternatively we could have created an empty inference model and then added in the statements directly to that model.

If we wanted to use a different reasoner which is not available as a convenience method or wanted to configure one we would change line [1].

To create the same set up manually we could replace [1] by:

        Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null);
        InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

Then we could set properties of the reasoner before use. For example, if we were to call listStatements on the inf model we would see that it also "includes" all the RDFS axioms, of which there are quite a lot. It is sometimes useful to suppress these and only see the "interesting" entailments. This can be done by setting the processing level parameter: we create a description of a new reasoner configuration and pass that to the factory method:

        Resource config = ModelFactory.createDefaultModel()
                          .createResource()
                          .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, "simple");
        Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
        InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

This is a rather long-winded way of setting a single parameter, though it can be useful in cases where you want to store this sort of configuration information in a separate (RDF) configuration file. For hardwired uses the following alternative is often simpler:

        Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null);
        reasoner.setParameter(ReasonerVocabulary.PROPsetRDFSLevel.getURI(), 
                              ReasonerVocabulary.RDFS_SIMPLE);
        InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

Finally, suppose you have a more complex set of schema information defined in a Model called schema and you want to apply this schema to several sets of instance data without redoing too many of the same intermediate deductions. This can be done by using the SPI level methods:

        Reasoner boundReasoner = reasoner.bindSchema(schema.getGraph());
        InfModel inf = ModelFactory.createInfModel(boundReasoner, data);

This creates a new reasoner, independent from the original, which contains the schema data. Any queries to an InfModel created using the boundReasoner will see the schema statements, the data statements and any statements entailed from the combination of the two.

Transitive reasoner

The TransitiveReasoner provides support for storing and traversing class and property lattices. This implements just the transitive and reflexive properties of rdfs:subPropertyOf and rdfs:subClassOf. It is not all that exciting on its own but is one of the building blocks used for the more complex reasoners. The functionality it offers is roughly equivalent to the hardwired inferences available through the Jena1 DAML API.

It has no configuration options.
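
To make the kind of lattice traversal involved more concrete, here is a minimal sketch that does not use Jena. The class ClassLattice and its methods are hypothetical names; the sketch simply walks direct subClassOf links so that every class counts as its own superclass and indirect superclasses are found as well. It says nothing about how the TransitiveReasoner is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy class lattice: answers superclass queries by walking direct subClassOf links. */
final class ClassLattice {
    private final Map<String, Set<String>> directSupers = new HashMap<>();

    void addSubClassOf(String sub, String sup) {
        directSupers.computeIfAbsent(sub, k -> new TreeSet<>()).add(sup);
    }

    /** All superclasses of cls, including cls itself and all indirect superclasses. */
    Set<String> superClassesOf(String cls) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            // The seen set also guards against cycles in the declared hierarchy.
            if (seen.add(current)) {
                for (String sup : directSupers.getOrDefault(current,
                        Collections.<String>emptySet())) {
                    todo.push(sup);
                }
            }
        }
        return seen;
    }

    /** Hypothetical example hierarchy: Cat is a Mammal, Mammal is an Animal. */
    static ClassLattice example() {
        ClassLattice lattice = new ClassLattice();
        lattice.addSubClassOf("eg:Cat", "eg:Mammal");
        lattice.addSubClassOf("eg:Mammal", "eg:Animal");
        return lattice;
    }

    public static void main(String[] args) {
        System.out.println(example().superClassesOf("eg:Cat"));
    }
}
```

The same traversal would apply unchanged to a property lattice built from subPropertyOf links.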

RDFSRuleReasoner

The RDFSRuleReasoner implements the RDFS entailment rules by using a combination of the TransitiveReasoner and a general purpose rule-based engine.

The rules are defined in a text file located on the classpath and it is possible to construct new RDFS engines with different rule variants.

The most important configuration parameter is the RDFSLevel parameter illustrated in the example above. The levels are:

Full: This implements all of the RDFS axioms and closure rules with the exception of bNode entailments and datatypes (rdfD-1). See below for comments on these. This is an expensive mode because all statements in the data graph need to be checked for possible use of container membership properties. It also generates type assertions for all resources and properties mentioned in the data (rdf1, rdfs4a, rdfs4b).
Default: This omits the expensive checks for container membership properties and the "everything is a resource" and "everything used as a property is one" rules (rdf1, rdfs4a, rdfs4b). The latter information is available through the Jena API and creating virtual triples to this effect has little practical value.
Simple: This implements just the transitive closure of subPropertyOf and subClassOf relations, the domain and range entailments and the implications of subPropertyOf and subClassOf. It omits all of the axioms. This is probably the most useful mode but is not the default because it is a less complete implementation of the standard.
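
The entailments named under the Simple level can be sketched as a small forward closure over a set of triples. The code below is a toy model without Jena: the class SimpleRdfsClosure, the list encoding of triples and the choice of exactly these five rules are our reading of the description above, and it is not derived from the rule file the RDFS reasoner actually uses. For brevity it ignores the fact that a literal cannot appear as the subject of a type statement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy forward closure for the entailments described under the "Simple" level. */
final class SimpleRdfsClosure {
    static final String TYPE = "rdf:type";
    static final String SUB_CLASS = "rdfs:subClassOf";
    static final String SUB_PROP = "rdfs:subPropertyOf";
    static final String DOMAIN = "rdfs:domain";
    static final String RANGE = "rdfs:range";

    static List<String> t(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Repeats the pairwise rules until a pass adds no new triple. */
    static Set<List<String>> closure(Set<List<String>> input) {
        Set<List<String>> all = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            List<List<String>> derived = new ArrayList<>();
            for (List<String> a : all) {
                for (List<String> b : all) {
                    deriveFromPair(a, b, derived);
                }
            }
            changed = all.addAll(derived);
        }
        return all;
    }

    private static void deriveFromPair(List<String> a, List<String> b, List<List<String>> out) {
        String s = a.get(0), p = a.get(1), o = a.get(2);
        String s2 = b.get(0), p2 = b.get(1), o2 = b.get(2);
        // Transitivity of subClassOf and of subPropertyOf.
        if ((p.equals(SUB_CLASS) || p.equals(SUB_PROP)) && p.equals(p2) && o.equals(s2)) {
            out.add(t(s, p, o2));
        }
        // Implication of subPropertyOf: (p subPropertyOf q), (x p y) => (x q y).
        if (p.equals(SUB_PROP) && p2.equals(s)) {
            out.add(t(s2, o, o2));
        }
        // Implication of subClassOf: (c subClassOf d), (x type c) => (x type d).
        if (p.equals(SUB_CLASS) && p2.equals(TYPE) && o2.equals(s)) {
            out.add(t(s2, TYPE, o));
        }
        // Domain: (p domain c), (x p y) => (x type c).
        if (p.equals(DOMAIN) && p2.equals(s)) {
            out.add(t(s2, TYPE, o));
        }
        // Range: (p range c), (x p y) => (y type c).
        if (p.equals(RANGE) && p2.equals(s)) {
            out.add(t(o2, TYPE, o));
        }
    }

    /** Hypothetical example data combining a subproperty, a domain and a subclass. */
    static Set<List<String>> example() {
        Set<List<String>> data = new LinkedHashSet<>();
        data.add(t("eg:p", SUB_PROP, "eg:q"));
        data.add(t("eg:q", DOMAIN, "eg:Person"));
        data.add(t("eg:Person", SUB_CLASS, "eg:Agent"));
        data.add(t("eg:a", "eg:p", "eg:b"));
        return closure(data);
    }

    public static void main(String[] args) {
        for (List<String> triple : example()) {
            System.out.println(triple);
        }
    }
}
```

In the example the type of eg:a is only reachable by chaining three rules (subPropertyOf, then domain, then subClassOf), which is why the closure loops until nothing changes.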

[TODO: more detail required on this either here or in a later section and expand on all the cryptic references like rdfD-1]

Note: In the Jena Model API we do not treat bNodes as variables. This enables code to, for example, retrieve the properties of specific bNodes from a Model. In order to determine if Model B is entailed by Model A one would need to translate Model B into a query, replacing bNodes by query variables, and then apply the query to Model A. This will be expensive for large B's.
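
The translation of B into a query can be sketched as a backtracking match. The code below is a toy model without Jena; the class BNodeEntailment, the string-array triples and the convention that a term starting with "_:" is a bNode are all assumptions for illustration. Each bNode in B is treated as a variable, while every term in A, bNodes included, is treated as a plain constant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy check that graph A entails graph B when B's bNodes are read as query variables. */
final class BNodeEntailment {
    static boolean isBNode(String term) {
        return term.startsWith("_:");
    }

    /** True if some assignment of B's bNodes to terms of A maps every B triple into A. */
    static boolean entails(List<String[]> a, List<String[]> b) {
        return match(a, b, 0, new HashMap<String, String>());
    }

    private static boolean match(List<String[]> a, List<String[]> b, int index,
                                 Map<String, String> binding) {
        if (index == b.size()) {
            return true; // every pattern of B has been matched
        }
        String[] pattern = b.get(index);
        for (String[] candidate : a) {
            // Work on a copy so a failed branch leaves the caller's bindings intact.
            Map<String, String> next = new HashMap<>(binding);
            if (unify(pattern, candidate, next) && match(a, b, index + 1, next)) {
                return true;
            }
        }
        return false; // no candidate worked, so the caller backtracks
    }

    private static boolean unify(String[] pattern, String[] triple, Map<String, String> binding) {
        for (int i = 0; i < 3; i++) {
            if (isBNode(pattern[i])) {
                String bound = binding.get(pattern[i]);
                if (bound == null) {
                    binding.put(pattern[i], triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return false;
                }
            } else if (!pattern[i].equals(triple[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(
                new String[] {"eg:a", "eg:knows", "eg:c"},
                new String[] {"eg:a", "eg:knows", "eg:b"},
                new String[] {"eg:b", "eg:name", "Bob"});
        List<String[]> b = Arrays.asList(
                new String[] {"eg:a", "eg:knows", "_:x"},
                new String[] {"_:x", "eg:name", "Bob"});
        System.out.println("A entails B: " + entails(a, b));
    }
}
```

Every pattern in B may be tried against every triple in A, so the search can grow exponentially with the size of B, which is the cost referred to above.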

Given this property that bNodes have real existence as Java objects in a Jena Model, and given that the reasoners are intended to be accessed through the normal Model API, we decided that implementing the RDFS closure rules that generate bNodes for all literals in the graph would be confusing and pointless. So we didn't.

Similarly, the Jena API provides a direct means to discover the datatype of a typed literal, so reflecting this through the rdfD-1 closure rule is pointless.

GenericRuleReasoner

The RDFS (and indeed OWL) reasoner is built using a general purpose rule engine. This engine supports forward chaining rules, tabled (aka memoized) backward chaining rules or a hybrid mode in which forward rules can create new backward rules that are used to answer queries.

This engine is available for general use in applications that wish to perform rule based processing of RDF data.
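
As a rough illustration of what forward chaining over RDF data means, the following toy engine treats rules as data and applies them until no new triples appear. It does not use Jena and makes no claim about the design of the GenericRuleReasoner: the class ForwardRuleDemo, the Rule type, the convention that terms starting with "?" are variables and the example ancestor rule are all invented for this sketch. Tabled backward chaining and the hybrid mode are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy forward chaining engine: rules are data, terms starting with '?' are variables. */
final class ForwardRuleDemo {
    /** A rule with a list of body patterns and a single head pattern. */
    static final class Rule {
        final List<List<String>> body;
        final List<String> head;

        Rule(List<List<String>> body, List<String> head) {
            this.body = body;
            this.head = head;
        }
    }

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Applies all rules repeatedly until a pass derives no new triple. */
    static Set<List<String>> run(Set<List<String>> facts, List<Rule> rules) {
        Set<List<String>> all = new LinkedHashSet<>(facts);
        boolean changed = true;
        while (changed) {
            List<List<String>> derived = new ArrayList<>();
            for (Rule rule : rules) {
                fire(rule, 0, new HashMap<String, String>(), all, derived);
            }
            changed = all.addAll(derived);
        }
        return all;
    }

    /** Matches body patterns left to right, then instantiates the head. */
    private static void fire(Rule rule, int index, Map<String, String> env,
                             Set<List<String>> facts, List<List<String>> out) {
        if (index == rule.body.size()) {
            out.add(substitute(rule.head, env));
            return;
        }
        for (List<String> fact : facts) {
            Map<String, String> next = new HashMap<>(env);
            if (unify(rule.body.get(index), fact, next)) {
                fire(rule, index + 1, next, facts, out);
            }
        }
    }

    private static boolean unify(List<String> pattern, List<String> fact,
                                 Map<String, String> env) {
        for (int i = 0; i < 3; i++) {
            String term = pattern.get(i);
            if (term.startsWith("?")) {
                String bound = env.get(term);
                if (bound == null) {
                    env.put(term, fact.get(i));
                } else if (!bound.equals(fact.get(i))) {
                    return false;
                }
            } else if (!term.equals(fact.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Assumes every head variable also occurs in the body. */
    private static List<String> substitute(List<String> pattern, Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String term : pattern) {
            result.add(term.startsWith("?") ? env.get(term) : term);
        }
        return result;
    }

    /** Hypothetical example: a transitive "ancestor" rule over a chain of three facts. */
    static Set<List<String>> example() {
        Set<List<String>> facts = new LinkedHashSet<>();
        facts.add(triple("eg:a", "eg:ancestor", "eg:b"));
        facts.add(triple("eg:b", "eg:ancestor", "eg:c"));
        facts.add(triple("eg:c", "eg:ancestor", "eg:d"));
        Rule transitive = new Rule(
                Arrays.asList(triple("?x", "eg:ancestor", "?y"),
                              triple("?y", "eg:ancestor", "?z")),
                triple("?x", "eg:ancestor", "?z"));
        return run(facts, Collections.singletonList(transitive));
    }

    public static void main(String[] args) {
        for (List<String> t : example()) {
            System.out.println(t);
        }
    }
}
```

A forward engine of this kind pays the full cost of the closure up front; a backward chaining engine would instead start from the query and only explore the rules relevant to it.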

At this release the engine is exposed via the GenericRuleReasoner and is complete enough to support the RDFS reasoner but is not yet stable. In particular the backward chaining component is being revised. As the RuleReasoner stabilizes we will generate documentation and sample code for how to use it in applications. At the moment rules can only be created programmatically or via a text format, though in principle it would be possible to define an RDF encoding for them.

[TODO lots more details]

OWLFBReasoner

We also include an early prototype of a rule-based implementation of an OWL/Lite reasoner - OWLFBReasoner. The current implementation is sufficient to pass the relevant core normative WebONT test cases. On such small scale test cases, where the query to the inference graph is a test for a ground triple, performance is good. However, we have concerns over scalability of the current solution and intend to do further performance work and validation before recommending this reasoner for use.

[TODO lots more details]

Advanced usage

TODO. Describe:

query with premises using the 4 arg version of listStatements
validation
access to derivation information
control of when the expensive processing happens

Extension point - the reasoner registry

The Jena reasoner API is intended to support plug-in access to appropriate external reasoners and we plan to construct example adapters for one or two openly available reasoners. This work has not yet been done and no external reasoner support is included within this release.

[TODO lots more details]