Jena schemagen HOWTO

The schemagen tool provided with Jena converts an OWL, DAML or RDFS vocabulary into a Java class file that contains static constants for the terms in the vocabulary. This document outlines the use of schemagen, and the various options and templates that may be used to control the output.

Schemagen is typically invoked from the command line or from a build script (such as Ant). Synopsis of the command:

java jena.schemagen -i <input> [-a <namespaceURI>] [-o <output file>] [-c <config uri>] [-s <syntax>] ...
Schemagen is highly configurable, either with command line options or by RDF information read from a configuration file. Many other options are defined, and these are described in detail below. Note that the CLASSPATH environment variable must be set to include the Jena .jar libraries.

Summary of configuration options

For quick reference, here is a list of all of the schemagen options (both command line and configuration file). The use of these options is explained in detail below.

Table 1: schemagen options
Command line option | RDF config file property | Meaning
-a <uri> | sgen:namespace | The namespace URI for the vocabulary. Names with this URI as prefix are automatically included in the generated vocabulary.
-c <filename>, -c <url> | (none) | Specify an alternative config file.
--classdec <string> | sgen:classdec | Additional decoration for class header (such as implements)
--classnamesuffix <string> | sgen:classnamesuffix | Option for adding a suffix to the generated class name, e.g. "Vocab".
--classSection <string> | sgen:classSection | Section declaration comment for class section.
--classTemplate <string> | sgen:classTemplate | Template for writing out declarations of class resources.
--daml | sgen:daml | Specify that the language of the source ontology is DAML+OIL.
--declarations <string> | sgen:declarations | Additional declarations to add at the top of the class.
--footer <string> | sgen:footer | Template for standard text to add to the end of the file.
--header <string> | sgen:header | Template for the file header, including the class comment.
-i <filename>, -i <url> | sgen:input | Specify the input document to load
--include <uri> | sgen:include | Option for including non-local URI's in vocabulary
--individualsSection <string> | sgen:individualsSection | Section declaration comment for individuals section.
--individualTemplate <string> | sgen:individualTemplate | Template for writing out declarations of individuals.
--marker <string> | sgen:marker | Specify the marker string for substitutions, default is '%'
-n <string> | sgen:classname | The name of the generated class. The default is to synthesise a name based on input document name.
--noclasses | sgen:noclasses | Option to suppress classes in the generated vocabulary file
--nocomments | sgen:noComments | Turn off all comment output in the generated vocabulary
--noheader | sgen:noHeader | Prevent the output of a file header, with class comment etc.
--noindividuals | sgen:noindividuals | Option to suppress individuals in the generated vocabulary file.
--noproperties | sgen:noproperties | Option to suppress properties in the generated vocabulary file.
-o <filename>, -o <dir> | sgen:output | Specify the destination for the output. If the given value evaluates to a directory, the generated class will be placed in that directory with a file name formed from the generated (or given) class name with ".java" appended.
--ontology | sgen:ontology | The generated vocabulary will use the ontology API terms, in preference to RDF model API terms.
--owl | sgen:owl | Specify that the language of the source is OWL (the default). Note that RDFS is a subset of OWL, so this setting also suffices for RDFS.
--package <string> | sgen:package | Specify the Java package name.
--propSection <string> | sgen:propSection | Section declaration comment for properties section.
--propTemplate <string> | sgen:propTemplate | Template for writing out declarations of property resources.
-r <uri> | (none) | Specify the uri of the root node in the RDF configuration model.
-s <string> | sgen:syntax | The surface syntax of the input file (e.g. RDF/XML, N3). Defaults to RDF/XML.
--uppercase | sgen:uppercase | Option for mapping constant names to uppercase (like Java constants). Default is to leave the case of names unchanged.
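
The -a and --include entries in Table 1 describe selection by URI prefix. The following is a minimal sketch of that idea in plain Java, not schemagen's own code: the class name NamespaceFilterSketch and the method isIncluded are hypothetical, and we assume that each --include value is treated as an additional URI prefix.

```java
import java.util.List;

/** Hypothetical illustration of prefix-based term selection; not part of Jena. */
final class NamespaceFilterSketch {

    /**
     * Returns true if the term should appear in the generated vocabulary.
     * Assumption: a term is included when its URI starts with the vocabulary
     * namespace (-a) or with any of the extra prefixes (--include).
     */
    static boolean isIncluded(String termUri, String namespace, List<String> includes) {
        if (termUri.startsWith(namespace)) {
            return true;
        }
        for (String prefix : includes) {
            if (termUri.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, http://example.org/eg#Dog is selected for the namespace http://example.org/eg#, while a term from the RDFS namespace is selected only if that namespace is listed as an include.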

What does schemagen do?

RDFS, OWL and DAML+OIL provide a very convenient means to define a controlled vocabulary or ontology. For general ontology processing, Jena provides various APIs to allow the source files to be read in and manipulated. However, when developing an application, it is frequently convenient to refer to the controlled vocabulary terms directly from Java code. This typically leads to the declaration of constants, such as:

    public static final Resource A_CLASS = new ResourceImpl( "http://example.org/schemas#a-class" );
When these constants are defined manually, it is tedious and error-prone to keep them in sync with the source ontology file. Schemagen automates the production of Java constants that correspond to terms in an ontology document. By automating the step from source vocabulary to Java constants, a source of error and inconsistency is removed.

Example

Perhaps the easiest way to explain the detail of what schemagen does is to show an example. Consider the following mini-RDF vocabulary:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns="http://example.org/eg#"
         xml:base="http://example.org/eg">
  <rdfs:Class rdf:ID="Dog">
      <rdfs:comment>A class of canine companions</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:ID="petName">
      <rdfs:comment>The name that everyone calls a dog</rdfs:comment>
      <rdfs:domain rdf:resource="http://example.org/eg#Dog" />
  </rdf:Property>
  <rdf:Property rdf:ID="kennelName">
      <rdfs:comment>Posh dogs have a formal name on their KC certificate</rdfs:comment>
  </rdf:Property>
  <Dog rdf:ID="deputy">
      <rdfs:comment>Deputy is a particular Dog</rdfs:comment>
      <kennelName>Deputy Dawg of Chilcompton</kennelName>
  </Dog>
</rdf:RDF>

We process this document with a command something like:
java jena.schemagen -i deputy.rdf -a http://example.org/eg#
to produce the following generated class:

/* CVS $Id: schemagen.html,v 1.3 2003-08-28 11:29:05 andy_seaborne Exp $ */

import com.hp.hpl.jena.rdf.model.*;

/**
 * Vocabulary definitions from deputy.rdf
 * @author Auto-generated by schemagen on 01 May 2003 21:49
 */
public class Deputy {
    /** <p>The RDF model that holds the vocabulary terms</p> */
    private static Model m_model = ModelFactory.createDefaultModel();
    
    /** <p>The namespace of the vocabalary as a string {@value}</p> */
    public static final String NS = "http://example.org/eg#";
    
    /** <p>The namespace of the vocabalary as a resource {@value}</p> */
    public static final Resource NAMESPACE = m_model.createResource( "http://example.org/eg#" );
    
    /** <p>The name that everyone calls a dog</p> */
    public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
    
    /** <p>Posh dogs have a formal name on their KC certificate</p> */
    public static final Property kennelName = m_model.createProperty( "http://example.org/eg#kennelName" );
    
    /** <p>A class of canine companions</p> */
    public static final Resource Dog = m_model.createResource( "http://example.org/eg#Dog" );
    
    /** <p>Deputy is a particular Dog</p> */
    public static final Resource deputy = m_model.createResource( "http://example.org/eg#deputy" );
    
}

There are some things to note in this example. All of the named classes, properties and individuals from the source document are translated to Java constants (below we show how to be more selective than this). The properties of the named resources are not translated: schemagen is for giving access to the names in the vocabulary or schema, not to perform a general translation of RDF to Java. The RDFS comments from the source document are translated to Javadoc comments. Finally, we no longer directly call new ResourceImpl: this idiom is no longer recommended by the Jena team.
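
The class name Deputy in the listing above was synthesised from the input file name deputy.rdf, as described for the -n option in Table 1. The sketch below shows one way such a name, and the output file for -o, could be derived. It is an illustration under stated assumptions, not schemagen's implementation: ClassNameSketch, classNameFor and outputFileFor are hypothetical names, and replacing illegal identifier characters with '_' is our assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of Jena. */
final class ClassNameSketch {

    /** Derives a class name from an input file name or URL, then appends an optional suffix. */
    static String classNameFor(String input, String suffix) {
        // Keep only the last path segment: "etc/deputy.rdf" becomes "deputy.rdf"
        String name = input.substring(input.lastIndexOf('/') + 1);
        // Drop the extension, if any: "deputy.rdf" becomes "deputy"
        int dot = name.lastIndexOf('.');
        if (dot > 0) {
            name = name.substring(0, dot);
        }
        // Assumption: characters that are not legal in a Java identifier become '_'
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        if (sb.length() == 0 || !Character.isJavaIdentifierStart(sb.charAt(0))) {
            sb.insert(0, '_');
        }
        // Capitalise the first character, following Java class naming conventions
        sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
        return sb + (suffix == null ? "" : suffix);
    }

    /** If the output names a directory, the file is <dir>/<ClassName>.java; otherwise it is used as given. */
    static Path outputFileFor(String output, boolean isDirectory, String className) {
        Path out = Paths.get(output);
        return isDirectory ? out.resolve(className + ".java") : out;
    }
}
```

With these assumptions, classNameFor("deputy.rdf", null) gives Deputy, and classNameFor("beer.owl", "Vocab") gives BeerVocab, matching the --classnamesuffix comment in Appendix A.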

We noted earlier that schemagen is highly configurable. One additional argument generates a vocabulary file that uses Jena's ontology API, rather than the RDF model API. We change rdfs:Class to owl:Class, and invoke
java jena.schemagen -i deputy.rdf -a http://example.org/eg# --ontology
to get:

/* CVS $Id: schemagen.html,v 1.3 2003-08-28 11:29:05 andy_seaborne Exp $ */

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.ontology.*;
/**
 * Vocabulary definitions from deputy.rdf
 * @author Auto-generated by schemagen on 01 May 2003 22:03
 */
public class Deputy {
    /** <p>The ontology model that holds the vocabulary terms</p> */
    private static OntModel m_model = ModelFactory.createOntologyModel( ProfileRegistry.OWL_LANG );
    
    /** <p>The namespace of the vocabalary as a string {@value}</p> */
    public static final String NS = "http://example.org/eg#";
    
    /** <p>The namespace of the vocabalary as a resource {@value}</p> */
    public static final Resource NAMESPACE = m_model.createResource( "http://example.org/eg#" );
    
    /** <p>The name that everyone calls a dog</p> */
    public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
    
    /** <p>Posh dogs have a formal name on their KC certificate</p> */
    public static final Property kennelName = m_model.createProperty( "http://example.org/eg#kennelName" );
    
    /** <p>A class of canine companions</p> */
    public static final OntClass Dog = m_model.createClass( "http://example.org/eg#Dog" );
    
    /** <p>Deputy is a particular Dog</p> */
    public static final Individual deputy = m_model.createIndividual( Dog, "http://example.org/eg#deputy" );
    
}

General principles

In essence, schemagen will load a single vocabulary file (imports processing is switched off in DAML and OWL), and generate a Java class that contains static constants for the named classes, properties and instances of the vocabulary. Most of the generated components of the output Java file can be controlled by option flags, and formatted with a template. Default templates are provided for all elements, so the minimum amount of necessary information is actually very small.

Options can be specified on the command line (when invoking schemagen), or may be preset in an RDF file. Any mixture of command line and RDF option specification is permitted. Where a given option is specified both in an RDF file and on the command line, the command line setting takes precedence. Thus the options in the RDF file can be seen as defaults.
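
The precedence rule can be pictured as a simple merge in which the RDF file supplies defaults and the command line overrides them. This is a minimal sketch in plain Java, assuming options are held as string maps; OptionPrecedenceSketch and effectiveOptions are hypothetical names and do not reflect how schemagen stores its options internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of option precedence; not part of Jena. */
final class OptionPrecedenceSketch {

    /** Config-file values act as defaults; command-line values replace them where both are set. */
    static Map<String, String> effectiveOptions(Map<String, String> fromConfigFile,
                                                Map<String, String> fromCommandLine) {
        Map<String, String> effective = new LinkedHashMap<>(fromConfigFile);
        effective.putAll(fromCommandLine);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("package", "com.example.vocabulary");
        config.put("classnamesuffix", "Vocab");

        Map<String, String> commandLine = new LinkedHashMap<>();
        commandLine.put("classnamesuffix", "Terms");

        // The package comes from the config file; the suffix from the command line
        System.out.println(effectiveOptions(config, commandLine));
    }
}
```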

Specifying command line options

To specify a command line option, add its name (and optional value) to the command line when invoking the schemagen tool. For example:
java jena.schemagen -i myvocab.owl --ontology --uppercase

Specifying options in an RDF file

To specify an option in an RDF file, create a resource of type sgen:Config, with properties corresponding to the option names listed in Table 1. A complete example configuration file is shown in Appendix A.

By default, schemagen will look for a configuration file named schemagen.rdf in the current directory. To specify another configuration, use the -c option with a URL to reference the configuration. Multiple configurations (i.e. multiple sgen:Config nodes) can be placed in one RDF document. In this case, each configuration node must be named, and the URI of the node to use given with the -r command line option. If there is no -r option, schemagen will look for a node with rdf:type sgen:Config. If there are multiple such nodes in the model, it is indeterminate which one will be used.

Using templates

We have several times referred to a template being used to construct part of the generated file. What is a template? Simply put, it is a fragment of the output file. Some templates will be used at most once (for example the file header template), and some will be used many times (such as the template used to generate a class constant). To adapt a template to the element being generated, schemagen performs keyword substitution on it before writing it out. Substitution looks for certain keywords delimited by a pair of marker characters (% by default, see the --marker option), and replaces them with the current binding for that keyword. Some keyword bindings stay the same throughout the processing of the file, and some depend on the language element being processed. The substitutions are:

Table 2: Substitutable keywords in templates
Keyword | Meaning | Typical value
classname | The name of the Java class being generated | Automatically defined from the document name, or given with the -n option
date | The date and time the class was generated |
imports | The Java imports for this class |
nl | The newline character for the current platform |
package | The Java package name | As specified by an option. The option just gives the package name, schemagen turns the name into a legal Java statement.
sourceURI | The source of the document being processed | As given by the -i option or in the config file.
valclass | The Java class of the value being defined | E.g. Property for vocabulary properties, Resource for classes in RDFS, or OntClass for classes using the ontology API
valcreator | The method used to generate an instance of the Java representation | E.g. createResource or createClass
valname | The name of the Java constant being generated | This is generated from the name of the resource in the source file, adjusted to be a legal Java identifier. By default, this will preserve the case of the RDF constant, but setting --uppercase will map all constants to upper-case names (a common convention in Java code).
valtype | The rdf:type for an individual | The class name or URI used when creating an individual in the ontology API
valuri | The full URI of the value being defined | From the RDF, without adjustment.
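
To make the substitution step concrete, here is a minimal sketch in plain Java. It is an illustration only, not schemagen's own code: TemplateSketch, substitute and javaName are hypothetical names, the template string is modelled on the individualTemplate in Appendix A rather than taken from schemagen's defaults, and we assume that unbound keywords are copied through unchanged and that illegal identifier characters become '_'.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of template keyword substitution; not part of Jena. */
final class TemplateSketch {

    /** Replaces each marker-delimited keyword with its binding. */
    static String substitute(String template, Map<String, String> bindings, String marker) {
        String m = Pattern.quote(marker);
        Matcher matcher = Pattern.compile(m + "(\\w+)" + m).matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = bindings.get(matcher.group(1));
            // Assumption: a keyword with no binding is copied through unchanged
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** Turns an RDF local name into a legal Java identifier, optionally in upper case. */
    static String javaName(String localName, boolean uppercase) {
        StringBuilder sb = new StringBuilder();
        for (char c : localName.toCharArray()) {
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        if (sb.length() == 0 || !Character.isJavaIdentifierStart(sb.charAt(0))) {
            sb.insert(0, '_');
        }
        return uppercase ? sb.toString().toUpperCase(Locale.ROOT) : sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("valclass", "Property");
        bindings.put("valname", javaName("petName", false));
        bindings.put("valcreator", "createProperty");
        bindings.put("valuri", "http://example.org/eg#petName");

        // Hypothetical property template using the default '%' marker
        String template =
            "public static final %valclass% %valname% = m_model.%valcreator%( \"%valuri%\" );";
        System.out.println(substitute(template, bindings, "%"));
    }
}
```

Run as written, the sketch prints the same petName declaration that appears in the generated Deputy class earlier in this document. Ant-style tokens such as @package@ are not touched, because they are not delimited by the marker.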

Details of schemagen options

We now go through each of the configuration options in detail.

@@TODO@@

Appendix A: Complete example configuration file

The source of this example is provided in the Jena download as etc/schemagen.rdf.

<?xml version='1.0'?>

<!DOCTYPE rdf:RDF [
    <!ENTITY jena    'http://jena.hpl.hp.com/'>

    <!ENTITY rdf     'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rdfs    'http://www.w3.org/2000/01/rdf-schema#'>
    <!ENTITY owl     'http://www.w3.org/2002/07/owl#'>
    <!ENTITY xsd     'http://www.w3.org/2001/XMLSchema#'>
    <!ENTITY base    '&jena;2003/04/schemagen'>
    <!ENTITY sgen    '&base;#'>
]>

<rdf:RDF
  xmlns:rdf   ="&rdf;"
  xmlns:rdfs  ="&rdfs;"
  xmlns:owl   ="&owl;"
  xmlns:sgen  ="&sgen;"
  xmlns       ="&sgen;"
  xml:base    ="&base;"
>

<!--
	Example schemagen configuration for use with jena.schemagen
    Not all possible options are used in this example, see Javadoc and Howto for full details.

	Author: Ian Dickinson, mailto:ian.dickinson@hp.com
	CVS:    $Id: schemagen.html,v 1.3 2003-08-28 11:29:05 andy_seaborne Exp $
-->

<sgen:Config>
    <!-- specifies that the  source document uses OWL -->
    <sgen:owl rdf:datatype="&xsd;boolean">true</sgen:owl>

    <!-- specifies that we want the generated vocab to use OntClass, OntProperty, etc, not Resource and Property -->
    <sgen:ontology rdf:datatype="&xsd;boolean">true</sgen:ontology>

    <!-- specifies that we want names mapped to uppercase (as standard Java constants) -->
    <sgen:uppercase rdf:datatype="&xsd;boolean">true</sgen:uppercase>

    <!-- append Vocab to class name, so input beer.owl becomes BeerVocab.java -->
    <sgen:classnamesuffix rdf:datatype="&xsd;string">Vocab</sgen:classnamesuffix>

    <!-- the java package that the vocabulary is in -->
    <sgen:package rdf:datatype="&xsd;string">com.example.vocabulary</sgen:package>

    <!-- the directory or file to write the results out to -->
    <sgen:output rdf:datatype="&xsd;string">src/com/example/vocabulary</sgen:output>

    <!-- the template for the file header -->
<sgen:header rdf:datatype="&xsd;string">/*****************************************************************************
 * Source code information
 * -----------------------
 * Original author    Jane Smart, example.com
 * Author email       jane.smart@example.com
 * Package            @package@
 * Web site           @website@
 * Created            %date%
 * Filename           $RCSfile: schemagen.html,v $
 * Revision           $Revision: 1.3 $
 * Release status     @releaseStatus@ $State: Exp $
 *
 * Last modified on   $Date: 2003-08-28 11:29:05 $
 *               by   $Author: andy_seaborne $
 *
 * @copyright@
 *****************************************************************************/


// Package
///////////////////////////////////////
%package%


// Imports
///////////////////////////////////////
%imports%



/**
 * Vocabulary definitions from %sourceURI%
 * @author Auto-generated by schemagen on %date%
 */</sgen:header>

<!-- the template for the file footer (note @footer@ is an ant-ism, will not be processed by VocabGen) -->
<sgen:footer rdf:datatype="&xsd;string">
/*
@footer@
*/
</sgen:footer>

<!-- template for extra declarations at the top of the class file -->
<sgen:declarations rdf:datatype="&xsd;string">
    /** Factory for generating symbols */
    private static KsValueFactory s_vf = new DefaultValueFactory();
</sgen:declarations>

<!-- template for introducing the properties in the vocabulary -->
<sgen:propSection rdf:datatype="&xsd;string">
    // Vocabulary properties
    ///////////////////////////
</sgen:propSection>

<!-- template for introducing the classes in the vocabulary -->
<sgen:classSection rdf:datatype="&xsd;string">
    // Vocabulary classes
    ///////////////////////////
</sgen:classSection>

<!-- template for introducing the individuals in the vocabulary -->
<sgen:individualsSection rdf:datatype="&xsd;string">
    // Vocabulary individuals
    ///////////////////////////
</sgen:individualsSection>

<!-- template for doing fancy declarations of individuals -->
<sgen:individualTemplate rdf:datatype="&xsd;string">public static final KsSymbol %valname% = s_vf.newSymbol( "%valuri%" );

    /** Ontology individual corresponding to {@link #%valname%} */
    public static final %valclass% _%valname% = m_model.%valcreator%( %valtype%, "%valuri%" );
</sgen:individualTemplate>

</sgen:Config>

</rdf:RDF>


CVS $Id: schemagen.html,v 1.3 2003-08-28 11:29:05 andy_seaborne Exp $