Last update: June 15th 2005
sparql2sql is a query engine for SPARQL over Jena triple stores. It rewrites SPARQL queries into SQL, offloading most of the query execution work onto the database, which is expected to improve performance.
This is an experimental implementation. It cannot handle all SPARQL queries and has not been fully tested. See the Limitations and Known issues sections for details.
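To make the rewriting idea concrete, here is a minimal sketch of how a basic graph pattern can be turned into a single SQL SELECT with one self-join per triple pattern: constants become selection conditions, and a variable that occurs more than once becomes a join condition. It assumes a hypothetical table stmts(subj, prop, obj) holding plain strings. The real ModelRDB schema (encoded nodes, its own table names) differs, and this is not sparql2sql's actual translation code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: translate a basic graph pattern into SQL over an
 * assumed table stmts(subj, prop, obj). Not the sparql2sql implementation.
 */
public class BgpToSql {
    private static final String[] COLS = {"subj", "prop", "obj"};

    /** Each pattern is {s, p, o}; terms starting with '?' are variables. */
    public static String translate(List<String[]> patterns) {
        Map<String, String> firstColumn = new LinkedHashMap<>(); // variable -> column that binds it
        List<String> tables = new ArrayList<>();
        List<String> conditions = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            String alias = "t" + i; // one alias of the statement table per triple pattern
            tables.add("stmts " + alias);
            String[] pattern = patterns.get(i);
            for (int j = 0; j < 3; j++) {
                String column = alias + "." + COLS[j];
                String term = pattern[j];
                if (term.startsWith("?")) {
                    String seen = firstColumn.putIfAbsent(term, column);
                    if (seen != null) {
                        // repeated variable: join the two columns
                        conditions.add(seen + " = " + column);
                    }
                } else {
                    // constant: inlined without escaping, acceptable only in a sketch
                    conditions.add(column + " = '" + term + "'");
                }
            }
        }
        List<String> select = new ArrayList<>();
        for (Map.Entry<String, String> e : firstColumn.entrySet()) {
            select.add(e.getValue() + " AS " + e.getKey().substring(1));
        }
        String sql = "SELECT " + String.join(", ", select) + " FROM " + String.join(", ", tables);
        if (!conditions.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conditions);
        }
        return sql;
    }

    public static void main(String[] args) {
        List<String[]> bgp = new ArrayList<>();
        bgp.add(new String[] {"?class", "rdf:type", "rdfs:Class"});
        bgp.add(new String[] {"?class", "rdfs:label", "?label"});
        System.out.println(translate(bgp));
    }
}
```

For the two triple patterns of the ModelRDB example query further down, translate returns SELECT t0.subj AS class, t1.obj AS label FROM stmts t0, stmts t1 WHERE t0.prop = 'rdf:type' AND t0.obj = 'rdfs:Class' AND t0.subj = t1.subj AND t1.prop = 'rdfs:label'. The whole pattern is answered by one statement, so the database's join optimizer does the work.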
Please direct feedback and bug reports to the Jena mailing list, jena-dev@groups.yahoo.com.
Author: Richard Cyganiak (richard@cyganiak.de)
Currently sparql2sql is only available as Java source code from CVS.
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jena login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jena co sparql2sql
When asked for a password, just press Enter.
All required jar files (the Jena 2.2 jars, the MySQL JDBC connector, and a CVS build of ARQ) are in the lib directory.
There is a runnable example, sparql2sql/Test.java, and a unit test suite in the tests-src directory. Both require a running MySQL 4.1 database. The connection is configured in etc/db_connection.properties.
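Connection settings in a .properties file are typically read with java.util.Properties. The sketch below shows that pattern; the key names db.url, db.user, db.password and db.engine, and the default engine value "MySQL", are hypothetical and may not match the keys actually used in etc/db_connection.properties. The four values correspond to the url, user, password and engine arguments passed to DBConnection in the examples below.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical sketch: read JDBC connection settings from a properties file.
 * The key names are assumptions, not the file's real keys.
 */
public class DbConnectionConfig {
    final String url;
    final String user;
    final String password;
    final String engine;

    private DbConnectionConfig(Properties props) {
        url = require(props, "db.url");
        user = require(props, "db.user");
        password = props.getProperty("db.password", "").trim(); // may be empty
        engine = props.getProperty("db.engine", "MySQL").trim(); // assumed default
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    /** Parses properties-file syntax from any Reader. */
    public static DbConnectionConfig load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DbConnectionConfig(props);
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new FileReader("etc/db_connection.properties")) {
            DbConnectionConfig config = load(reader);
            System.out.println("Connecting to " + config.url + " as " + config.user);
        }
    }
}
```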
sparql2sql can be used to query database-persisted Jena models (ModelRDB). The example creates a ModelRDB, reads an RDF file into the model, then reopens the model as an RDBDataSource and executes a SPARQL query on it.
// register the sparql2sql query engine
// (must be done once at startup time)
RDBQueryEngineFactory.registerSelf();
// Open a DB connection and DB model
IDBConnection conn = new DBConnection(url, user, password, engine);
ModelMaker maker = ModelFactory.createModelRDBMaker(conn);
Model persistentModel = maker.createModel("myModelName");
// ... do interesting stuff with the model ...
persistentModel.read("http://xmlns.com/foaf/0.1/index.rdf");
// Open the same model as an ARQ DataSet
DataSet ds = RDBDataSource.open(conn, "myModelName");
// Execute a SPARQL query
String sparql =
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
        "SELECT ?class ?label " +
        "WHERE { ?class rdf:type rdfs:Class . " +
        " ?class rdfs:label ?label }";
ResultSet results = QueryExecutionFactory.create(
        QueryFactory.create(sparql), ds).execSelect();
// Pretty-print results to System.out
new ResultSetFormatter(results).printAll(System.out);
A SPARQL RDF Dataset is a collection consisting of one default graph and any number of named graphs, each identified by a URI.
In sparql2sql, this concept is implemented by RDBDataSource.
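As a rough mental model of the Dataset concept (not of RDBDataSource itself), the hypothetical in-memory class below keeps one default graph plus a map of named graphs keyed by URI, with triples represented as plain string lists. Its graphMatch method shows one consequence of the definition: a GRAPH ?g pattern ranges over the named graphs only, never over the default graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical in-memory SPARQL dataset: one default graph plus any number
 * of named graphs keyed by URI. Triples are (subject, predicate, object) lists.
 */
public class SimpleDataset {
    private final Set<List<String>> defaultGraph = new HashSet<>();
    // TreeMap keeps graph URIs sorted, so graphs are visited in a predictable order
    private final Map<String, Set<List<String>>> namedGraphs = new TreeMap<>();

    public void addDefault(String s, String p, String o) {
        defaultGraph.add(Arrays.asList(s, p, o));
    }

    public void addNamed(String graphUri, String s, String p, String o) {
        // creates the named graph on first use
        namedGraphs.computeIfAbsent(graphUri, k -> new HashSet<>()).add(Arrays.asList(s, p, o));
    }

    /** Evaluates { ?s p o } against the default graph: returns matching subjects. */
    public List<String> defaultMatch(String p, String o) {
        List<String> subjects = new ArrayList<>();
        for (List<String> t : defaultGraph) {
            if (t.get(1).equals(p) && t.get(2).equals(o)) {
                subjects.add(t.get(0));
            }
        }
        return subjects;
    }

    /** Evaluates GRAPH ?g { ?s p o }: returns (graph URI, subject) pairs from named graphs only. */
    public List<List<String>> graphMatch(String p, String o) {
        List<List<String>> rows = new ArrayList<>();
        for (Map.Entry<String, Set<List<String>>> e : namedGraphs.entrySet()) {
            for (List<String> t : e.getValue()) {
                if (t.get(1).equals(p) && t.get(2).equals(o)) {
                    rows.add(Arrays.asList(e.getKey(), t.get(0)));
                }
            }
        }
        return rows;
    }
}
```

A triple that matches in the default graph never shows up in graphMatch results, because the default graph has no name that ?g could be bound to.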
The following example sets up an RDBDataSource, reads RDF files into the default graph and several named graphs, and executes a SPARQL query over the Dataset.
// set up datasource
RDBDataSource ds = RDBDataSource.open(
        new DBConnection(url, user, password, engine), "my_dataset");
// clean the model if it still contains stuff from previous run
ds.clear();
// randomly read some RDF into the default and some named graphs
ds.getDefaultModel().read("http://www.w3.org/1999/02/22-rdf-syntax-ns");
// we have to generate the named graphs first -- clunky!
ds.addNamedModel("urn:my:graph1", ModelFactory.createDefaultModel());
ds.addNamedModel("urn:my:graph2", ModelFactory.createDefaultModel());
ds.addNamedModel("urn:my:graph3", ModelFactory.createDefaultModel());
// now read some stuff
ds.getNamedModel("urn:my:graph1").read("http://www.w3.org/2000/01/rdf-schema");
ds.getNamedModel("urn:my:graph2").read("http://purl.org/dc/elements/1.1/");
ds.getNamedModel("urn:my:graph3").read("http://xmlns.com/foaf/0.1/index.rdf");
// register the SPARQL2SQL query engine -- must be done once at
// startup time
RDBQueryEngineFactory.registerSelf();
// Set log level to debug
// This causes the engine to log executed SELECT statements
Logger.getLogger(RDBDataSource.class).setLevel(Level.DEBUG);
// do a SPARQL query
String sparql =
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
        "SELECT ?source ?uri ?superclass " +
        "WHERE { GRAPH ?source { " +
        "{ ?uri rdf:type rdfs:Class } UNION { ?uri rdf:type rdf:Property } " +
        "OPTIONAL { ?uri rdfs:subClassOf ?superclass } } }";
Query q = QueryFactory.create(sparql);
ResultSet results = QueryExecutionFactory.create(q, ds).execSelect();
// print results using an ARQ utility class
ResultSetFormatter.out(System.out, results, q);
// close the dataset
ds.close();
This is experimental software at a very early stage of development. No extensive testing has been performed.
WHERE { ?x :a :b OPTIONAL { ?x :c1 ?y } OPTIONAL { ?x :c2 ?y } } (The results depend on which ?y is bound “first”.)
WHERE { GRAPH ?g {} }
WHERE { GRAPH ?g { OPTIONAL { ?s ?p ?o } } }
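The ordering problem in the first query above can be reproduced with a tiny evaluator. The sketch below uses hypothetical data in which ?x has both a :c1 and a :c2 value, and implements OPTIONAL as a left join over solution maps. It illustrates the ambiguity only; it is not how sparql2sql evaluates OPTIONAL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustration: WHERE { ?x :a :b OPTIONAL { ?x :c1 ?y } OPTIONAL { ?x :c2 ?y } }
 * gives different ?y bindings depending on which OPTIONAL is applied first.
 * Data and evaluator are hypothetical.
 */
public class OptionalOrder {
    // triples as {subject, predicate, object}
    static final String[][] DATA = {
        {"x1", ":a", ":b"},
        {"x1", ":c1", "1"},
        {"x1", ":c2", "2"},
    };

    /** Left join with { ?x pred ?y }: extend compatible solutions, keep the rest unchanged. */
    static List<Map<String, String>> optional(List<Map<String, String>> input, String pred) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> sol : input) {
            boolean matched = false;
            for (String[] t : DATA) {
                if (!t[0].equals(sol.get("x")) || !t[1].equals(pred)) {
                    continue;
                }
                String bound = sol.get("y");
                if (bound != null && !bound.equals(t[2])) {
                    continue; // ?y already bound to a different value: incompatible
                }
                Map<String, String> extended = new TreeMap<>(sol);
                extended.put("y", t[2]);
                out.add(extended);
                matched = true;
            }
            if (!matched) {
                out.add(sol); // OPTIONAL keeps the solution even without a match
            }
        }
        return out;
    }

    /** Evaluates ?x :a :b, then the two OPTIONAL blocks in the given order. */
    static List<Map<String, String>> run(String firstPred, String secondPred) {
        List<Map<String, String>> sols = new ArrayList<>();
        for (String[] t : DATA) {
            if (t[1].equals(":a") && t[2].equals(":b")) {
                Map<String, String> sol = new TreeMap<>();
                sol.put("x", t[0]);
                sols.add(sol);
            }
        }
        return optional(optional(sols, firstPred), secondPred);
    }

    public static void main(String[] args) {
        System.out.println(run(":c1", ":c2"));
        System.out.println(run(":c2", ":c1"));
    }
}
```

run(":c1", ":c2") returns [{x=x1, y=1}], while run(":c2", ":c1") returns [{x=x1, y=2}]: whichever OPTIONAL binds ?y first wins, and the second one can then only leave the solution unchanged.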
sparql2sql uses the Jena ModelRDB database schema.
This allows SPARQL queries over existing ModelRDB stores, but comes at a performance and complexity cost since the Jena DB schema was not designed with RDF Datasets in mind.
ModelRDB is able to store multiple models in a single statement table. This feature is used by sparql2sql to simulate RDF Datasets. The model ID is used to store graph name URIs. The URIs are encoded using ModelRDB's node encoding scheme to improve join performance.
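A minimal sketch of querying several graphs kept in one statement table: when every row carries its graph name, a GRAPH ?g pattern becomes an equality on that column across all table aliases, and ?g is projected from it. The table quads(graph, subj, prop, obj) and its string values are hypothetical, not the actual ModelRDB layout. Because the sketch assumes graph names and nodes share one encoding, a query that also uses ?g as a node can join the graph column directly against a node column; this is one plausible reading of the join-performance remark above, not a description of sparql2sql's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: translate GRAPH ?g { triple patterns } into SQL over an
 * assumed single table quads(graph, subj, prop, obj). Not the ModelRDB schema
 * and not sparql2sql's implementation.
 */
public class GraphPatternSql {
    private static final String[] COLS = {"subj", "prop", "obj"};

    /** graphVar is a variable such as "?g"; patterns are {s, p, o}, variables start with '?'. */
    public static String translate(String graphVar, List<String[]> patterns) {
        if (patterns.isEmpty()) {
            // an empty group such as GRAPH ?g {} has no statement rows to select from;
            // this sketch does not handle it
            throw new IllegalArgumentException("empty graph pattern not supported");
        }
        Map<String, String> firstColumn = new LinkedHashMap<>(); // variable -> column that binds it
        List<String> tables = new ArrayList<>();
        List<String> conditions = new ArrayList<>();
        firstColumn.put(graphVar, "t0.graph"); // ?g is bound by the graph column
        for (int i = 0; i < patterns.size(); i++) {
            String alias = "t" + i;
            tables.add("quads " + alias);
            if (i > 0) {
                // all triple patterns inside GRAPH ?g must match in the same graph
                conditions.add("t0.graph = " + alias + ".graph");
            }
            String[] pattern = patterns.get(i);
            for (int j = 0; j < 3; j++) {
                String column = alias + "." + COLS[j];
                String term = pattern[j];
                if (term.startsWith("?")) {
                    String seen = firstColumn.putIfAbsent(term, column);
                    if (seen != null) {
                        // repeated variable (possibly ?g used as a node): a column-to-column
                        // join, which assumes graph names and nodes share one encoding
                        conditions.add(seen + " = " + column);
                    }
                } else {
                    // constant: inlined without escaping, acceptable only in a sketch
                    conditions.add(column + " = '" + term + "'");
                }
            }
        }
        List<String> select = new ArrayList<>();
        for (Map.Entry<String, String> e : firstColumn.entrySet()) {
            select.add(e.getValue() + " AS " + e.getKey().substring(1));
        }
        String sql = "SELECT " + String.join(", ", select) + " FROM " + String.join(", ", tables);
        if (!conditions.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conditions);
        }
        return sql;
    }

    public static void main(String[] args) {
        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] {"?g", "dc:creator", "?who"});
        System.out.println(translate("?g", patterns));
    }
}
```

For GRAPH ?g { ?g dc:creator ?who }, translate returns SELECT t0.graph AS g, t0.obj AS who FROM quads t0 WHERE t0.graph = t0.subj AND t0.prop = 'dc:creator'.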
Generated SQL statements can be logged by lowering the log level of RDBDataSource to DEBUG:
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
Logger.getLogger(RDBDataSource.class).setLevel(Level.DEBUG);