sparql2sql – a query engine for SPARQL over Jena triple stores

Last update: June 15th 2005

Overview

sparql2sql is a query engine for SPARQL over Jena triple stores. It rewrites SPARQL queries into SQL, which offloads most of the query execution work to the database and should improve performance.

This is an experimental implementation. It cannot deal with all SPARQL queries and is not fully tested. See the Limitations and known issues section for details.

Please direct feedback and bug reports to the Jena mailing list, jena-dev@groups.yahoo.com.

Author: Richard Cyganiak (richard@cyganiak.de)

Contents

  1. Download and CVS access
  2. Example: Querying a persistent Jena model
  3. Example: Working with RDF Datasets and named graphs
  4. Limitations and known issues
  5. Database schema
  6. SPARQL to SQL mapping details

Download and CVS access

Currently sparql2sql is only available as Java source code from CVS.

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jena login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jena co sparql2sql

When asked for a password, just press Enter.

All required jar files (the Jena 2.2 jars, the MySQL JDBC connector, and a CVS build of ARQ) are in the lib directory.

There's a runnable example, sparql2sql/Test.java, and a unit test suite in the tests-src directory. Both require a live MySQL 4.1 database. The connection is configured in etc/db_connection.properties.
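
For code outside the bundled test suite, the same settings can be loaded with java.util.Properties and passed to Jena's DBConnection (com.hp.hpl.jena.db). This is only a sketch; the property key names used below (url, user, password, type) are assumptions, so check the actual keys used by the file from CVS.

// load the JDBC settings from the properties file
// (the key names below are illustrative -- check the file in CVS
//  for the actual keys it uses)
Properties props = new Properties();
props.load(new FileInputStream("etc/db_connection.properties"));

// open a Jena database connection; the last argument is the
// database type, e.g. "MySQL"
IDBConnection conn = new DBConnection(
		props.getProperty("url"),
		props.getProperty("user"),
		props.getProperty("password"),
		props.getProperty("type"));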

Example: Querying a persistent Jena model

sparql2sql can be used to query database-persisted Jena models (ModelRDB). The example below creates a ModelRDB, reads an RDF file into it, re-opens the model as an RDBDataSource, and executes a SPARQL query against it.

// register the sparql2sql query engine
// (must be done once at startup time)
RDBQueryEngineFactory.registerSelf();

// Open a DB connection and DB model
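// (url, user, password and engine are the JDBC settings, e.g. from
//  etc/db_connection.properties; engine is the database type, e.g. "MySQL")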
IDBConnection conn = new DBConnection(url, user, password, engine);
ModelMaker maker = ModelFactory.createModelRDBMaker(conn);
Model persistentModel = maker.createModel("myModelName");

// ... do interesting stuff with the model ...
persistentModel.read("http://xmlns.com/foaf/0.1/index.rdf");

// Open the same model as an RDBDataSource (an ARQ Dataset)
RDBDataSource ds = RDBDataSource.open(conn, "myModelName");

// Execute a SPARQL query
String sparql =
	"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
	"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
	"SELECT ?class ?label " +
	"WHERE { ?class rdf:type rdfs:Class . " +
	"        ?class rdfs:label ?label }";
ResultSet results = QueryExecutionFactory.create(
		QueryFactory.create(sparql), ds).execSelect();

// Pretty-print results to System.out
new ResultSetFormatter(results).printAll(System.out);

Example: Working with RDF Datasets and named graphs

SPARQL's Dataset is a collection consisting of a default graph and any number of named graphs, each identified by a URI.

sparql2sql's implementation of this concept is the RDBDataSource.

The example below sets up an RDBDataSource, reads RDF files into the default graph and several named graphs, and executes a SPARQL query over the Dataset.

// set up datasource
RDBDataSource ds = RDBDataSource.open(
		new DBConnection(url, user, password, engine),
		"my_dataset");

// clear the dataset if it still contains data from a previous run
ds.clear();

// read some arbitrary RDF into the default graph and some named graphs
ds.getDefaultModel().read("http://www.w3.org/1999/02/22-rdf-syntax-ns");
// we have to generate the named graphs first -- clunky!
ds.addNamedModel("urn:my:graph1", ModelFactory.createDefaultModel());
ds.addNamedModel("urn:my:graph2", ModelFactory.createDefaultModel());
ds.addNamedModel("urn:my:graph3", ModelFactory.createDefaultModel());
// now read some stuff
ds.getNamedModel("urn:my:graph1").read("http://www.w3.org/2000/01/rdf-schema");
ds.getNamedModel("urn:my:graph2").read("http://purl.org/dc/elements/1.1/");
ds.getNamedModel("urn:my:graph3").read("http://xmlns.com/foaf/0.1/index.rdf");

// register the sparql2sql query engine -- must be done once at
// startup time
RDBQueryEngineFactory.registerSelf();

// Set log level to debug
// This causes the engine to log executed SELECT statements
Logger.getLogger(RDBDataSource.class).setLevel(Level.DEBUG);

// do a SPARQL query
String sparql =
	"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
	"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
	"SELECT ?source ?uri ?superclass " +
	"WHERE { GRAPH ?source { " +
	"{ ?uri rdf:type rdfs:Class } UNION { ?uri rdf:type rdf:Property } " +
	"OPTIONAL { ?uri rdfs:subClassOf ?superclass } } }";
Query q = QueryFactory.create(sparql);
ResultSet results = QueryExecutionFactory.create(q, ds).execSelect();

// print results using an ARQ utility class
ResultSetFormatter.out(System.out, results, q);

// close the dataset
ds.close();

Limitations and known issues

This is experimental software in a very early stage of development. No extensive testing has been performed.

Database schema

sparql2sql uses the Jena ModelRDB database schema.

This allows SPARQL queries over existing ModelRDB stores, but comes at a performance and complexity cost since the Jena DB schema was not designed with RDF Datasets in mind.

ModelRDB is able to store multiple models in a single statement table. This feature is used by sparql2sql to simulate RDF Datasets. The model ID is used to store graph name URIs. The URIs are encoded using ModelRDB's node encoding scheme to improve join performance.
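
Since named graphs are stored as regular ModelRDB models, they should also show up when listing the models on the connection with the standard Jena DB API. A rough sketch (reusing the conn variable from the first example; note that graph names may appear in an encoded form rather than as plain URIs):

// list all models stored on this connection; named graphs created
// through an RDBDataSource appear here too, possibly under encoded names
Iterator it = conn.getAllModelNames();
while (it.hasNext()) {
	System.out.println("model: " + it.next());
}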

SPARQL to SQL mapping details

Generated SQL statements can be logged by lowering the log level:

Logger.getLogger(RDBDataSource.class).setLevel(Level.DEBUG);
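
The DEBUG messages only appear if log4j has been configured with an appender. If there is no log4j.properties on the classpath, a minimal programmatic setup (plain log4j, nothing sparql2sql-specific) is enough:

// send log output to the console, then lower the threshold for the
// RDBDataSource logger so the generated SQL statements become visible
BasicConfigurator.configure();
Logger.getLogger(RDBDataSource.class).setLevel(Level.DEBUG);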