SquirrelRDF

Contents

  1. General Information
  2. Introduction
  3. Installation
  4. General Use
  5. Relational Database Mapper
  6. RDB Map Configuration
  7. RDB Map Issues
  8. LDAP Mapper
  9. LDAP Map Configuration
  10. LDAP Map Issues

General Information

SquirrelRDF is available in the following locations:

Help is available from jena-dev, IRC (freenode, #jena), or direct email.

It is distributed under the Jena licence.

Introduction

There is a lot of structured information out there, but it just isn't in RDF. It isn't always possible, or desirable, to dump this data and convert it to RDF. It may not be possible to access the raw data, and, regardless, keeping this RDF version up to date would not be trivial.

SquirrelRDF is a tool which allows non-RDF data stores (or, perhaps, not explicitly RDF) to be queried using SPARQL. In its current form this includes relational databases (via JDBC) and LDAP servers (via JNDI). It provides an ARQ QueryEngine (for Java access), a command line tool, and a servlet for SPARQL HTTP access. As a result the information now looks like RDF, and is always current.

A note on model mapping

SquirrelRDF exposes the mapped store in a rather 'raw' form. It makes no attempt, for example, to reveal implicit relations between objects (suggested by foreign keys), or normalise denormalised data. This simplifies Squirrel's task, focusing it on mapping to RDF and ignoring the complex task of transforming between vocabularies or ontologies, which are better left to pure RDF tools. Here are some approaches:

Installation

You will need:

Put the jar files in lib/ if you want to build or test SquirrelRDF. Otherwise just ensure that they, together with lib/squirrelrdf.jar, are on your CLASSPATH.

General Use

In this section I will assume you are armed with a configuration file config.ttl. See below for details on configuration.

Command Line

The command line tool provides an easy way to check that all is working correctly:

lewis:~/ pldms$ java squirrelrdf.Query config.ttl \
          "SELECT * WHERE { ?s <http://example.com/people_name> ?name }"
WARN [main] (QueryEngine.java:106) - Default model is null in the dataset
-----------------------------------------------
| s                                | name     |
===============================================
| <http://example.com/people;id=1> | "Damian" |
| <http://example.com/people;id=2> | "Libby"  |
| <http://example.com/people;id=3> | "Dan"    |
| <http://example.com/people;id=4> | "Danny"  |
-----------------------------------------------

The second argument can also be a file containing the query. If no argument is given the query is taken from STDIN.

API

SquirrelRDF implements ARQ's QueryEngine. From this one can execute ASK, SELECT and CONSTRUCT queries:

Model config = FileManager.get().loadModel(configFile);
Query query = QueryFactory.create(theQuery);
QueryEngine qe = new SQLQueryEngine(query, config); 
                // or new LdapQueryEngine(query, config); for LDAP
qe.setDataset(DatasetFactory.create()); // empty data set
ResultSet results = qe.execSelect();

HTTP Protocol

SquirrelRDF includes a servlet (squirrelrdf.Servlet) and an example web app to get you started. Copy the libraries to webapp/WEB-INF/lib, and your configuration to webapp/WEB-INF/map.ttl. Deploy this web application, for example as 'squirrel', and you should be able to execute a query by visiting http://localhost:8080/squirrel/, or from the command line:

lewis:~/ pldms$ curl http://localhost:8080/squirrel/model \
      -d 'query=SELECT * WHERE { ?s <http://example.com/people_name> ?name }'

The servlet was written for a simple demonstration, and as a result is pretty limited. It can execute SELECT queries, and return results in XML. You can also give a stylesheet parameter, which will add a processing instruction to the resulting XML. ASK, CONSTRUCT, JSON results, et al. should be easy to add since ARQ supports them all.
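For Java callers, the curl request above can be approximated with the JDK's own HTTP client. This is only a sketch: the class name SquirrelHttpClient is hypothetical (it is not part of SquirrelRDF), it assumes Java 11 or later, and it assumes the servlet accepts a form-encoded query parameter exactly as in the curl example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client class; not part of SquirrelRDF.
class SquirrelHttpClient {

    // Build the form body: the SPARQL text, URL-encoded, as the 'query' parameter.
    static String formBody(String sparql) {
        return "query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    // POST the query to the servlet and return the raw XML result document.
    static String select(String endpoint, String sparql) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(sparql)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String query = "SELECT * WHERE { ?s <http://example.com/people_name> ?name }";
        System.out.println(formBody(query));
        // Only contact a server when an endpoint is given on the command line,
        // for example http://localhost:8080/squirrel/model
        if (args.length > 0) {
            System.out.println(select(args[0], query));
        }
    }
}
```

Run it with the endpoint as the only argument to send the query. Without an argument it just prints the encoded request body, which is handy for checking what will be sent.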

RDB Mapper

The relational database map follows roughly what is described in [1]. It performs no model mapping, unlike [2].

Configuration

The database mapping can be automatically configured using the squirrelrdf.ExtractConfig tool. Take your database details, and a namespace, and pass them to the tool. The result is a configuration in Turtle:

lewis:~/ pldms$ java squirrelrdf.ExtractConfig \
                     jdbc:mysql://localhost/conference \
                     com.mysql.jdbc.Driver \
                     user password http://example.com/db/ > dbmap.ttl

You can also use squirrelrdf.ExtractConfig --list-tables to show the available tables, and append them to the command above, if you don't want to map every table (useful in SQL Server, if memory serves). Invoke with no arguments for usage details.

Here's a simple example:

@prefix db:      <http://jena.hpl.hp.com/schemas/rdbmap#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:      <http://example.com/db/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix :        <#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .

ex:map
	a       db:Map ;
	db:mapsClass ex:people .

Here is a map, and it maps just one class, ex:people.

<jdbc:mysql://localhost/conference>
    a       db:Database ;
    db:user "username" ;
    db:pass "password" ;
    db:driver "com.mysql.jdbc.Driver" .

This is a database, with all the details necessary to talk to it.

ex:people
	a       rdfs:Class ;
	db:primaryKey ex:people_id ;
	db:database <jdbc:mysql://localhost/conference> ;
	db:table "people" .

This class, which is mapped, corresponds to the table "people", in the given database. It has a primary key:

ex:people_id
    a       rdf:Property ;
    rdfs:domain ex:people ;
    db:col  "id" ;
    db:colType "int" .

This is a property of ex:people. It maps to the column "id". The column type given is not used currently, but we can see it's an integer.

ex:people_name
    a       rdf:Property ;
    rdfs:domain ex:people ;
    db:col  "name" ;
    db:colType "varchar" .

And another property of ex:people, called ex:people_name. The class and property URIs aren't significant, incidentally; they are simply what ExtractConfig generates.

The Result

This mapping turns this table:

People
-----------------
| id | name     |
=================
| 1  | Damian   |
| 2  | Libby    |
-----------------

into this RDF:

ex:people;id=1 a ex:people ;
	ex:people_id 1 ;
	ex:people_name "Damian" .
ex:people;id=2 a ex:people ;
	ex:people_id 2 ;
	ex:people_name "Libby" .
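
The correspondence between a row and its RDF can be sketched in a few lines of Java. This is an illustration only, not SquirrelRDF's code: the class RowToTriples and its methods are hypothetical, the subject form (class URI, then ;key=value) is copied from the output shown earlier, and the property naming (class URI, underscore, column name) is the pattern ExtractConfig produced in the example map. Literal escaping is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the row-to-RDF correspondence; not SquirrelRDF code.
class RowToTriples {

    // Subject URI: the class URI, then ";", then key column "=" key value.
    static String subject(String classUri, String keyCol, Object keyValue) {
        return classUri + ";" + keyCol + "=" + keyValue;
    }

    // One type statement plus one statement per column, as Turtle-style lines.
    static List<String> triples(String classUri, String keyCol, Map<String, Object> row) {
        String s = "<" + subject(classUri, keyCol, row.get(keyCol)) + ">";
        List<String> out = new ArrayList<>();
        out.add(s + " a <" + classUri + "> .");
        for (Map.Entry<String, Object> e : row.entrySet()) {
            // Property URI follows the pattern from the example map: <class>_<column>.
            String p = "<" + classUri + "_" + e.getKey() + ">";
            Object v = e.getValue();
            // Numbers are written bare, everything else as a plain (unescaped) literal.
            String o = (v instanceof Number) ? v.toString() : "\"" + v + "\"";
            out.add(s + " " + p + " " + o + " .");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "Damian");
        triples("http://example.com/db/people", "id", row).forEach(System.out::println);
    }
}
```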

In summary:

Limitations and Issues

You can't query for properties

No { ?s ?p ?o }, I'm afraid, or even { :foo ?p :bar }. Sorry.

Type queries

You can't query for type ({ ?s a ?type }) at the moment. Because the relational type system is stronger than RDF's, giving a type is often redundant, and doesn't change the SQL query. However { ?s a ex:type } is the notable exception to this.

You can query more than one database

You may have noticed that databases are associated with classes, not maps. So a map can involve more than one database, which you may find useful.

Tables without primary keys can act oddly

If a table has no primary key, SquirrelRDF can't identify rows, and returns blank nodes as subjects. This can result in oddities with optionals and unions.

LDAP Mapper

The LDAP Mapper is less complex, and less mature, than the RDB mapper. On the other hand LDAP is quite close to RDF, and so has fewer issues.

Configuration

No automatic configuration here, alas, but it isn't too hard. The map just maps properties to attributes, although there is some additional work depending on the range of the attribute.

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix lmap: <http://jena.hpl.hp.com/schemas/ldapmap#> .
@prefix ex: <http://example.com/schemas/hpcorp#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<> a lmap:Map ;
	lmap:server <ldap://ldap.example.com/o=example.com> ;

An LDAP map, mapping this server (starting search at the base o=example.com).

	lmap:mapsProp [ lmap:property foaf:name ; lmap:attribute "cn" ; ] ;

Map name to the cn attribute.

	lmap:mapsProp [ lmap:property foaf:homepage ; lmap:attribute "webpage" ; a lmap:URIProperty ; ] ;
	lmap:mapsProp [ lmap:property foaf:mbox ; lmap:attribute "uid" ; a lmap:EmailProperty ; ] ;

The values of webpage and uid are both URIs. In the case of the latter, however, mailto: will be prepended to the value.

	lmap:mapsProp [ lmap:property foaf:based_near ; lmap:attribute "workLocation" ; a lmap:ObjectProperty ; ] ;
	lmap:mapsProp [ lmap:property geo:lat ; lmap:attribute "latitude" ; ] ;
	lmap:mapsProp [ lmap:property geo:long ; lmap:attribute "longitude" ; ] ;
	.

workLocation points to another LDAP node, which holds the work location.
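
To make the difference between the plain, lmap:URIProperty and lmap:EmailProperty mappings concrete, here is a small Java sketch of how an attribute value would appear as an RDF term in each case. It is an illustration of the behaviour described above, not SquirrelRDF's code; the class LdapValueMapping, the Kind enum and the sample values are all hypothetical.

```java
// Hypothetical sketch of the three value treatments described above; not SquirrelRDF code.
class LdapValueMapping {

    enum Kind { LITERAL, URI_PROPERTY, EMAIL_PROPERTY }

    // Render an LDAP attribute value as an RDF term in Turtle syntax.
    static String toRdfTerm(Kind kind, String value) {
        switch (kind) {
            case URI_PROPERTY:
                // The value is already a URI.
                return "<" + value + ">";
            case EMAIL_PROPERTY:
                // mailto: is prepended to the value.
                return "<mailto:" + value + ">";
            default:
                // Plain mapping: a literal, with backslashes and quotes escaped.
                return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
    }

    public static void main(String[] args) {
        System.out.println(toRdfTerm(Kind.LITERAL, "Damian Steer"));
        System.out.println(toRdfTerm(Kind.URI_PROPERTY, "http://example.com/~someone"));
        System.out.println(toRdfTerm(Kind.EMAIL_PROPERTY, "someone@example.com"));
    }
}
```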

The result is that the following query now works (skipping prefixes):

SELECT ?lat ?long
WHERE
{
	?person foaf:name "Damian Steer" ;
		foaf:based_near [ geo:lat ?lat ; geo:long ?long ] .
}
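
One way to picture what happens to the first pattern in that query: the property is looked up in the map, and the literal becomes an equality test on the mapped attribute. The Java sketch below builds such an LDAP search filter. It is an assumption about how this kind of translation could work, not a description of SquirrelRDF's internals. The class LdapFilterSketch and its methods are hypothetical; the escaping follows the LDAP string filter rules in RFC 4515.

```java
import java.util.Map;

// Hypothetical sketch: turning "?person foaf:name <literal>" into an LDAP filter.
// Not SquirrelRDF code.
class LdapFilterSketch {

    // Escape the characters that are special in LDAP search filters (RFC 4515).
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Look up the attribute mapped to the property and build an equality filter.
    static String equalityFilter(Map<String, String> propToAttr, String propertyUri, String literal) {
        String attr = propToAttr.get(propertyUri);
        if (attr == null) {
            throw new IllegalArgumentException("Unmapped property: " + propertyUri);
        }
        return "(" + attr + "=" + escape(literal) + ")";
    }

    public static void main(String[] args) {
        // Mirrors the lmap:mapsProp entry that maps foaf:name to cn.
        Map<String, String> map = Map.of("http://xmlns.com/foaf/0.1/name", "cn");
        System.out.println(equalityFilter(map, "http://xmlns.com/foaf/0.1/name", "Damian Steer"));
    }
}
```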

Limitations and Issues

No property queries (again)

As with the RDB mapper, no { ?s ?p ?o } and friends. Sorry.

No multiple value support (yet)

Some attributes have multiple values. In one case, for example, the class attribute did, because its value was the closure over subclasses. This needs fixing.

No type support (yet)

Because multiple values don't work I couldn't do type support. If people want it, it will happen. You can see the beginnings in the schema.

References

[1] Relational Databases on the Semantic Web

[2] D2RQ
