Jena2 Database Interface - Preliminary Release Notes

27 June, 2003

The jena/db module provides an implementation of the Jena model interface that stores and retrieves RDF statements using a database. A preliminary release is currently available for experimentation and feedback. However, it is missing some features and should not be considered stable. Databases created with this preliminary release may not be accessible in a later preliminary release or in the final, full-featured Jena2 release.

Contents

Features
Remaining Work
Database Engine Support
Compatibility with Jena1 ModelRDB and Databases
Migrating Jena1 Applications and Databases to Jena2
Performance Notes

Features

The jena/db module provides persistence for asserted statements. It uses a different database layout from Jena1, one that enables faster insertion and retrieval at the cost of more storage. It is largely backwards-compatible for Jena1 applications, with the exception of some database configuration options. This preliminary release runs only on MySQL.

Remaining Work

The following work, with the possible exception of application-specific layouts, will be completed for the final Jena2 release. In the meantime, the code base for persistence is a work-in-progress and liable to change. However, experimentation and feedback are encouraged and welcome.

Note that significant code restructuring is anticipated to support these new capabilities. These changes should be transparent to applications, with the following caveat: the database layouts (the schema) should not be considered stable. Databases created with one preliminary release of jena.db.ModelRDB may not be accessible with a later release.

Database Engine Support

The following table lists the platforms, database engines and JDBC drivers currently supported for Jena2 persistence. Older and newer versions may work but have not been tested.

   Platform              Database Engine    JDBC Driver
   Windows 2000          MySQL 4.0.12       mysql-connector-java-3.0.7-stable.jar
   Linux (RedHat 7.2)    MySQL 4.0.12       mysql-connector-java-3.0.7-stable.jar

Compatibility with Jena1 ModelRDB and Databases

In general, Jena2 is backwards compatible with Jena1 applications that use the ModelRDB class. However, the Jena1 databases themselves are not compatible and cannot be read directly by Jena2. Instructions on migrating Jena1 databases to Jena2 are given below.

There are some changes to the API. Some ModelRDB constructors are deprecated and applications should consider migrating to new factory methods for creating and opening persistent models (see below). The ModelRDB package name has changed. Jena1 applications that directly reference the package name jena.rdb must be modified to reference the package name jena.db. Jena2 does not support the StoreRDB class nor any of the Jena1 customization parameters (setProperty, getProperty). Jena2 uses a different technique for database configuration.

Jena2 does not support the hash layouts and proc layouts of Jena1. Applications that request these layouts under Jena2 will be given a generic layout. Jena2 will initially support MySQL, Oracle, PostgreSQL and Berkeley DB. Applications that require Interbase will not work. The driver configuration files (e.g., Mysql.config) are no longer used. Instead, configuration options are set and retrieved using statements in an RDF memory model.

In Jena2, all databases are multi-model. However, by default, each model is stored in separate tables. To share tables among models, see "Migrating Jena1 Applications and Databases to Jena2" below.

Performance of Jena2 persistent models is no worse than Jena1 and often better. However, Jena2 persistent models may consume more database space. See the Performance Notes.

Migrating Jena1 Applications and Databases to Jena2

As mentioned above, most Jena1 persistent applications should run with little or no modification under Jena2. However, some ModelRDB class constructors are deprecated. In Jena2, persistent models should be created using a factory method.

The Jena2 persistence architecture and layout are different from Jena1. However, these differences are largely transparent to applications and only affect code that creates new persistent models. In particular, the way in which database configuration options are specified is changed. In Jena2, configuration options are specified as statements in an RDF memory model. The statements describe properties of the model, such as the database type and the database layout. Some properties are part of the database and some are part of the connection. See ModelDB.getModelProperties and DBConnection.getDatabaseProperties.

The vocabulary for these properties is given in jena.vocabulary.DB. Note that only a few configuration parameters are defined in the default configuration property model provided in this preview release. This was done for experimentation. A richer set of configuration properties will be provided in the final release.

A configuration property is used to enable models to share database tables. By default, each model is stored in its own tables. To enable models to share tables, add the property DB.graphDBSchema to the configuration model. The property value should be the name of some other existing model, or "DEFAULT" to share the tables with the default model. For an example of how this is done, see the test program testConstructNamedModelSchema in jena.db.test.testConnection.
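The table-sharing rule described above can be sketched in plain Java. This is not Jena code: the class TableSharingSketch, the table-set names and the use of a Map in place of the RDF configuration model are all assumptions made for illustration, as is the idea that the default model exists as soon as the database is set up. Only the property name graphDBSchema and the value "DEFAULT" come from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only: none of these names are part of the Jena API.
class TableSharingSketch {
    static final String GRAPH_DB_SCHEMA = "graphDBSchema"; // property name taken from the text
    static final String DEFAULT_MODEL = "DEFAULT";

    // Maps each model name to the (hypothetical) table set that holds its statements.
    private final Map<String, String> tableSetByModel = new HashMap<>();

    TableSharingSketch() {
        // Assumption: the default model exists from the start.
        tableSetByModel.put(DEFAULT_MODEL, "tables_default");
    }

    // 'config' stands in for the RDF configuration model: property name -> value.
    String createModel(String name, Map<String, String> config) {
        String shareWith = config.get(GRAPH_DB_SCHEMA);
        String tableSet;
        if (shareWith == null) {
            // No sharing requested: the model gets its own tables.
            tableSet = "tables_" + name;
        } else {
            // Sharing requested: reuse the tables of an existing model.
            tableSet = tableSetByModel.get(shareWith);
            if (tableSet == null) {
                throw new IllegalArgumentException("No existing model named " + shareWith);
            }
        }
        tableSetByModel.put(name, tableSet);
        return tableSet;
    }
}
```

In this sketch, a model created with an empty configuration gets its own table set, while one created with graphDBSchema set to "DEFAULT" or to another model's name reuses that model's table set.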

To migrate databases, a small Jena1 application program is used to write the Jena1 ModelRDB contents to a text file using an RDF writer. A small Jena2 application program is then used to read this file and store it in a ModelDB model. For small databases, the PrettyWriter (RDF/XML-ABBREV) should be adequate. For medium or large databases, use the N-Triple writer (N-TRIPLE) for better performance, and consider using a pipe to connect the two small applications rather than an intermediate file. As a convenience, the preview release will provide a script or utility program for database migration.
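The pipe-based approach can be illustrated with a self-contained Java sketch. It is only an analogy: an in-process pipe stands in for the operating-system pipe between the two programs, plain lists of string triples stand in for the Jena1 and Jena2 models, and the line format is a URI-only simplification of N-Triples. A real migration would use the Jena RDF writer and reader instead. The names PipeMigrationSketch, export, load and migrate are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: lists of {subject, predicate, object} replace real models.
class PipeMigrationSketch {

    // Export side: writes one "<s> <p> <o> ." line per statement (URIs only), then closes the pipe.
    static void export(List<String[]> source, PipedOutputStream out) {
        try (PrintWriter w = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            for (String[] t : source) {
                w.println("<" + t[0] + "> <" + t[1] + "> <" + t[2] + "> .");
            }
        }
    }

    // Import side: reads lines as they arrive and stores them; no intermediate file is used.
    static List<String[]> load(PipedInputStream in) throws IOException {
        List<String[]> target = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                // Drop the trailing " ." and split on spaces (assumes URIs contain none).
                String[] parts = line.substring(0, line.length() - 2).split(" ");
                for (int i = 0; i < parts.length; i++) {
                    parts[i] = parts[i].substring(1, parts[i].length() - 1); // strip < and >
                }
                target.add(parts);
            }
        }
        return target;
    }

    // Connects exporter and importer with a pipe; the exporter runs in its own thread.
    static List<String[]> migrate(List<String[]> source) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);
            Thread exporter = new Thread(() -> export(source, out));
            exporter.start();
            List<String[]> target = load(in);
            exporter.join();
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Migration interrupted", e);
        }
    }
}
```

Because the importer consumes statements while the exporter is still producing them, the full serialized form never has to be held on disk, which is the reason for preferring a pipe with larger databases.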

Performance Notes

Very little tuning or performance analysis has been done. Early measurements of the Jena2 preview 1 release using the jena-perf benchmark (available as a package from http://sourceforge.net/projects/jena/) show Jena2 to be up to three times faster than Jena1 (see perftest.html). However, Jena2 models may consume more disk space than Jena1 models. Future preliminary releases will provide more guidance on the space differences. In the meantime, do not be surprised to find significant size differences between Jena1 and Jena2 databases. Our goal is that Jena2 performance be comparable to Jena1. The jena-perf results given here are primarily intended for comparing different Jena configurations and releases. These results should not be considered indicative of real Jena applications.

The speed and space differences are due to the different table layouts used in Jena1 and Jena2. In Jena1, resources were stored in a separate table that was referenced from the statement table. Literals were handled in the same way. This greatly reduced space consumption, since resources and literals that appeared in multiple statements were stored only once. However, retrieving a statement required joining the statement table with the literals and resources tables.

In Jena2, resources are stored directly in the statement table. Also stored in the statement table are plain literals, i.e., literals that have no language tag, no datatype and are not large. Non-plain literals are stored in a literals table. Consequently, Jena2 requires fewer join operations than Jena1 at the expense of more storage, since a resource or literal may be stored multiple times. For the final release, Jena2 will support namespace compression for the statement table, which should reduce space consumption.
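The trade-off between the two layouts can be made concrete with a small in-memory sketch. It is not the actual schema of either release: NormalizedStore loosely imitates the Jena1 approach (values stored once, statements hold references), InlineStore loosely imitates the Jena2 approach (values stored in the statement row), and both class names are hypothetical. For simplicity the sketch treats every value alike, ignores the space taken by the references themselves, and counts each dereference as one lookup in place of a real join.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins for the two table layouts; not Jena code.
class LayoutSketch {

    // Jena1-like: each distinct value is stored once; statement rows hold integer ids.
    static class NormalizedStore {
        final Map<String, Integer> idsByValue = new LinkedHashMap<>();
        final List<String> valuesById = new ArrayList<>();
        final List<int[]> statements = new ArrayList<>();
        int lookups = 0; // side-table dereferences, standing in for joins

        private int intern(String value) {
            Integer id = idsByValue.get(value);
            if (id == null) {
                id = valuesById.size();
                idsByValue.put(value, id);
                valuesById.add(value);
            }
            return id;
        }

        void add(String s, String p, String o) {
            statements.add(new int[] { intern(s), intern(p), intern(o) });
        }

        String[] get(int row) {
            int[] ids = statements.get(row);
            lookups += 3; // one dereference each for subject, predicate, object
            return new String[] { valuesById.get(ids[0]), valuesById.get(ids[1]), valuesById.get(ids[2]) };
        }

        // Characters held in the value table (id columns are not counted).
        int storedChars() {
            int total = 0;
            for (String v : valuesById) total += v.length();
            return total;
        }
    }

    // Jena2-like: values are stored directly in the statement row, so reads need no lookup.
    static class InlineStore {
        final List<String[]> statements = new ArrayList<>();

        void add(String s, String p, String o) {
            statements.add(new String[] { s, p, o });
        }

        String[] get(int row) {
            return statements.get(row);
        }

        // Characters held in the statement rows; repeated values are counted every time.
        int storedChars() {
            int total = 0;
            for (String[] row : statements) {
                for (String v : row) total += v.length();
            }
            return total;
        }
    }
}
```

Loading the same statements into both stores and comparing storedChars() shows the inline layout growing with every repeated resource, while only the normalized layout accumulates lookups on retrieval.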