summary

This document describes Eyeball 2.1. Changes since 2.0, 1.3, 1.2, 1.1, and 1.0 are summarised in the release notes. Note that the command-line syntax changed from 1.0 to 1.1 and again from 1.3 to 2.0, and that the experimental GUI is not described here.

Eyeball - checking RDF for common problems

Eyeball is a library and command-line tool for checking RDF models for certain common problems. These problems often result in technically correct but implausible RDF. Eyeball checks against user-provided schema files and makes various closed-world assumptions.

Eyeball currently can check for:

We plan to have Eyeball check for:

Eyeball's checks are performed by Inspector plug-ins and can be customised by the user. Its reports are rendered to output by Renderer plug-ins, which can also be customised by the user.

installation

Fetch the Eyeball distribution zipfile and unpack it somewhere convenient. Read the documentation in the doc/index.html file (that's what this is). Eyeball 2.1 comes with its own copy of Jena 2.4 with CVS updates.

In the Eyeball distribution directory, run the Eyeball tests:

    ant test

If these tests fail, something is wrong. Sometimes it's no more than a classpath problem, which you can fix. If not, use the jena-dev mailing list to request assistance. Note that any support is provided on a voluntary basis, as and when the effort is available.

If the tests have passed, you can copy lib/*.jar to whatever place you find convenient. You can then use it from the command line or from within Jena code. You will also need to copy the directories mirror and etc to somewhere convenient where the Jena FileManager class can see them.

command line operation

You must ensure that the Eyeball jars and the Jena libraries -- in that order -- are on your classpath. (Note that a CVS version of Eyeball may come with its own jena.jar and may not work with your usual installation.)

Run the command:

java [java options eg classpath and proxy] jena.eyeball 
    [-assume Reference*]
    -check specialURL+
    [-config fileOrURL*]
    [-root rootURL]
    [-render Name]
    [-include shortName*]
    [-exclude shortName*]
The -whatever sections can come in any order and may be repeated, in which case the new arguments are appended to the existing ones.
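
The "any order, may be repeated, arguments are appended" rule can be illustrated with a small sketch. This is not Eyeball's own argument handling: the class OptionGrouper and its group method are hypothetical names, and the code only shows how repeated sections could accumulate their arguments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Eyeball: groups command-line words by option.
class OptionGrouper {
    // A word starting with '-' opens (or re-opens) a section; the words that
    // follow are appended to that section, so a repeated option extends its list.
    static Map<String, List<String>> group(String... args) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String current = null;
        for (String word : args) {
            if (word.startsWith("-")) {
                current = word;
                sections.computeIfAbsent(current, k -> new ArrayList<>());
            } else if (current != null) {
                sections.get(current).add(word);
            } else {
                throw new IllegalArgumentException("argument before any option: " + word);
            }
        }
        return sections;
    }

    public static void main(String[] argv) {
        // -check appears twice; both file names end up in the same list.
        System.out.println(group("-check", "a.rdf", "-assume", "dc", "-check", "b.rdf"));
    }
}
```

In this sketch the two -check sections end up as one list holding a.rdf and b.rdf, which is the behaviour the paragraph above describes.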

The -config fileOrURL options specify the Eyeball assembler file to load. A single configuration model is constructed as the union of the contents of those files. If this option is omitted, the config file etc/eyeball-config.n3 is loaded. See loadConfigFiles for details.

The -assume References identify any assumed schemas used to identify the predicates and classes of the data model. A reference may be a file name or a URL; it is loaded by a default FileManager (and hence respects any FileManager renamings).

Eyeball automatically assumes the RDF and RDFS schemas, and the built-in XSD datatype classes. The short name owl can be used to refer to the OWL schema, dc to the Dublin Core schema, dcterms to the Dublin Core terms schema, and dc-all to both.

The specialURLs name the files or URL references containing the data to be eyeballed. If several names are given, each is checked individually.

If the URL is of the form ont:NAME:base, then the checked model is the model base treated as an OntModel with the specification OntModelSpec.NAME. If the URL (or the base) is of the form jdbc:DB:head:model, then the checked model is the one called model in the database with connection jdbc:DB:head. (The database user and password must be specified independently using the jena.db.user and jena.db.password system properties.)

If any of the data or schemas are identified by an http: URL, and you are behind a firewall, you will need to specify the proxy to Java using system properties; one way to do this is by using the Java command line options:

    -DproxySet=true
    -DproxyHost=theProxyHostName
    -DproxyPort=theProxyPortNumber

The -include shortNames are strings, each of which is the eye:shortName value of some inspector cluster in the Eyeball config file; see the config file description for details. If the option is omitted, it is as if

    -include defaultInspectors 
had been written. The -exclude option allows inspectors to be excluded from the checks by their shortNames. (For example, the type inspector currently slows things down quite a lot and might well be excluded from an initial eyeballing.)

The eyeball reports are written to the standard output; by default, the reports appear as text (RDF rendered by omitting the subjects - which are all blank nodes - and lightly prettifying the predicate and object). To change the rendering style, supply the -render option with the name of the renderer as its value. Eyeball comes with N3, XML, and text renderers; the Eyeball config file associates renderer names with their classes.

examples of command-line use

(Assuming an implicit CLASSPATH)
java jena.eyeball -check myDataFile.rdf

java jena.eyeball -assume dc -check http://example.com/nosuch.n3

java jena.eyeball -assume mySchema.rdf -check myData.rdf -render xml

java jena.eyeball -check myData.rdf -include defaultInspectors

use as a library

Eyeball can be used from within Java code; the command line merely provides a convenient external interface.

creating an Eyeball

An Eyeball object has three subcomponents: the assumptions against which the model is to be checked, the inspectors which do the checking, and the renderer used to display the reports.

The assumptions are bundled into a single OntModel. Multiple assumptions can be supplied either by adding them as sub-models or by loading their content directly into the OntModel.

The inspectors are supplied as a single Inspector object. The method Inspector.Operations.create(List) creates a single Inspector from a list of Inspectors; this inspector delegates all its inspection methods to all of its sub-inspectors.
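
The delegation described above is the usual composite pattern. The sketch below shows the idea with a deliberately simplified, hypothetical MiniInspector interface; the real Eyeball Inspector interface has its own methods and signatures, and this is not the code behind Inspector.Operations.create.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an inspector: one method, plain strings instead of RDF.
interface MiniInspector {
    void inspectStatement(List<String> report, String statement);
}

// A single inspector that forwards every call to each of its sub-inspectors in turn.
class CompositeInspector implements MiniInspector {
    private final List<MiniInspector> parts;

    CompositeInspector(List<MiniInspector> parts) {
        this.parts = new ArrayList<>(parts); // defensive copy
    }

    @Override
    public void inspectStatement(List<String> report, String statement) {
        for (MiniInspector part : parts) {
            part.inspectStatement(report, statement);
        }
    }

    public static void main(String[] argv) {
        List<MiniInspector> parts = new ArrayList<>();
        parts.add((report, s) -> report.add("first saw " + s));
        parts.add((report, s) -> report.add("second saw " + s));
        List<String> report = new ArrayList<>();
        new CompositeInspector(parts).inspectStatement(report, "some statement");
        System.out.println(report);
    }
}
```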

The renderer can be anything that implements the (simple) renderer interface.

To create an Eyeball:

Eyeball eyeball = new Eyeball( inspector, assumptions, renderer );

to eyeball a model

Models to be inspected are provided as OntModels. The problems are delivered to a Report object, where they are represented as an RDF model.

eyeball.inspect( report, ontModelToBeInspected );
The result is that same report object. The Report::model() method delivers an RDF model which describes the problems found by the inspection. The inspections supplied in the distribution use the EYE vocabulary, and are used in the standard reports:
unknown predicate
    eye:unknownPredicate (URI) -- the URI of the unknown predicate
bad URI
    eye:badURI (String) -- the spelling of the bad URI
illegal language code
    eye:badLanguage (String) -- the bad language code
    eye:onLiteral (String) -- a plain literal with the same lexical form
bad datatype URI
    eye:forReason (URI) -- the reason URI
    eye:onLiteral (String) -- a plain literal with the same lexical form
bad namespace URI
    eye:onPrefix (String) -- the prefix name with the bad namespace
    eye:forReason (URI) -- the reason URI
    eye:badNamespaceURI (String) -- the spelling of the bad URI
Jena prefix found
    eye:jenaPrefixFound (String) -- the name of the Jena prefix
    eye:forNamespace -- the namespace the prefix is bound to
implausible vocabulary item
    eye:onResource (URI) -- the URI of the implausible resource
    eye:notFromSchema (URI) -- the URI of the schema
an undeclared class
    eye:unknownClass -- the resource that was presumed to be a Class
an untyped Resource
    eye:hasNoType (Resource) -- the resource that has no rdf:type property
a repeated prefix namespace
    eye:multiplePrefixesForNamespace -- the namespace resource that has multiple prefixes
    eye:onPrefix -- the prefixes that were bound to the namespace
inconsistent types for resource
    eye:noConsistentTypeFor (URI) -- the URI of the inconsistent resource
    eye:hasAttachedType (URI) -- one of the given types that have no intersection
"wrong" number of property values for some subject
    eye:cardinalityFailure -- the subject for which the failure was detected
    eye:onProperty -- the property P that has the wrong number of values
    eye:onType -- the cardinality-constrained type
    eye:cardinality -- a blank node [eye:min min; eye:max max]
    eye:numValues -- the number of values of P found
    eye:values -- a blank node of rdf:type eye:Set with an rdfs:member value for each of the values of P
ill-formed list
    eye:illFormedList -- the URI of the root of the list
    eye:because -- [eye:element index; has no/has multiple rdf:first/rest properties]
a likely miswritten typed list idiom has been detected
    eye:suspectListIdiom -- the list type that is suspect
suspicious restriction, ie doesn't have exactly one owl:onProperty statement and exactly one constraint
    eye:suspiciousRestriction -- a blank node with the restriction properties
    eye:forReason (URI) [optional, multiple] -- one of:
        eye:missingOnProperty -- there is no owl:onProperty property in this suspicious restriction.
        eye:multipleOnProperty -- there is more than one owl:onProperty in this suspicious restriction.
        eye:missingConstraint -- there is no owl:hasValue, owl:allValuesFrom, owl:someValuesFrom, or owl:[minC|maxC|c]ardinality property in this suspicious restriction.
        eye:multipleConstraint -- there are multiple constraints (as above) in this suspicious restriction.
    eye:subClassOf [optional, multiple] -- an immediate named superclass of this suspicious restriction, to help identify it
    eye:equivalentClass [optional, multiple] -- an immediate named owl:equivalentClass of this suspicious restriction, to help identify it
a SPARQL query that was required to succeed did not, or a SPARQL query that was required to fail did not
    eye:sparqlRequireFailed (query) -- the query that failed, or a designated alternative message
    eye:sparqlProhibitFailed (query) -- the query that should not have succeeded, or a designated alternative message

Every report item in the model is a blank node with rdf:type eye:Item.

The labels for the Eyeball predicates and reason messages are defined in the Eyeball schema file etc/eyeball-schema.n3 (and are used by the text renderer):
eye:uriContainsSpaces -- the URI contains unencoded spaces, probably as a result of sloppy use of file: URLs.
eye:uriFileInappropriate -- a URI used as a namespace is a file: URI, which is inappropriate as a global identifier.
eye:uriHasNoScheme -- a URI has no scheme field, probably a misused relative URI.
eye:schemeShouldBeLowercase -- the scheme part of a URI is not lower-case; while technically correct, this is not usual practice.
eye:uriFailsPattern -- a URI fails the pattern appropriate to its scheme (as defined in the configuration for this eyeball).
eye:unrecognisedScheme -- the URI scheme is unknown, perhaps a misplaced QName.
eye:uriNoHttpAuthority -- an http: URI has no authority (domain name/port) component.
eye:uriSyntaxFailure -- the URI can't be parsed using the general URI syntax, even with any spaces removed.
eye:namespaceEndsWithNameCharacter -- a namespace URI ends in a character that can appear in a name, leading to possible ambiguities.
eye:uriHasNoLocalname -- a URI has no local name according to the XML name-splitting rules. (For example, the URI http://x.com/foo#12345 has no local name because a local name cannot start with a digit.)
The prefix eye stands for the URL http://jena.hpl.hp.com/Eyeball#.
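
To give a feel for three of the simpler reasons listed above, here is a rough sketch that tests a URI string for unencoded spaces, a missing scheme, and a scheme that is not lower-case. It is an illustration only: the class UriReasonSketch is hypothetical, the scheme test is a simplification of the general URI syntax, and Eyeball's URIInspector relies on the Jena IRI code rather than on anything like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of three URI reasons; not Eyeball's implementation.
class UriReasonSketch {
    static List<String> reasons(String uri) {
        List<String> found = new ArrayList<>();
        if (uri.indexOf(' ') >= 0) {
            found.add("eye:uriContainsSpaces");
        }
        // Simplification: a scheme is whatever precedes the first ':' provided
        // that ':' comes before any '/', '?' or '#'.
        int colon = uri.indexOf(':');
        int delimiter = firstIndexOfAny(uri, "/?#");
        boolean hasScheme = colon > 0 && (delimiter < 0 || colon < delimiter);
        if (!hasScheme) {
            found.add("eye:uriHasNoScheme");
        } else {
            String scheme = uri.substring(0, colon);
            if (!scheme.equals(scheme.toLowerCase(Locale.ROOT))) {
                found.add("eye:schemeShouldBeLowercase");
            }
        }
        return found;
    }

    private static int firstIndexOfAny(String s, String chars) {
        for (int i = 0; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] argv) {
        System.out.println(reasons("HTTP://example.com/a"));
        System.out.println(reasons("file:/my docs/x.rdf"));
        System.out.println(reasons("foo/bar"));
    }
}
```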

Eyeball configuration

configuration files

The Eyeball command-line utility is configured by files (or URLs) specified on the command line: their RDF contents are unioned together into a single config model. If no config file is specified, then etc/eyeball-config.n3 is loaded.

The configuration file is a Jena assembler description with added Eyeball vocabulary.

Eyeball is also configured by the location-mapping file etc/location-mapping.n3. The Eyeball jar contains copies of both the default config and the location mapper; these are used by default. You can provide your own etc/eyeball-config.n3 file earlier on your classpath or in your current directory; this config replaces the default. You may provide additional location-mapping files earlier on your classpath or in your current directory.

configuring schema names

To avoid having to quote schema names in full on the Eyeball command line, (collections of) schemas can be given short names.
[] eye:shortName shortNameLiteral
    ; eye:schema fullSchemaURL
    ...
    .
A shortName can name several schemas. The Eyeball distribution has the short names rdf, rdfs, owl, and dc for the corresponding schemas (and mirror versions of those schemas so that they don't need to be downloaded each time Eyeball is run).

configuring inspectors

The inspectors that Eyeball runs over the model are specified by eye:inspector properties of inspector resources. These resources are identified by eye:shortNames (supplied on the command line). Each such property value must be a plain string literal whose value is the full name of the Inspector class to load and run; see the Javadoc of Inspector for details.

An inspector resource may refer to other inspector resources to include their inspectors, using either of the two properties eye:include or eye:includeByName. The value of an include property should be another inspector resource; the value of an includeByName property should be the shortName of an inspector resource.

[Two inspector resources may refer to each other, in which case they are equivalent.]
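
One way to picture how includes are followed, including the case where two resources refer to each other, is a traversal that remembers which resources it has already visited. The sketch below works on plain maps keyed by shortName rather than on the RDF config model; the class InspectorIncludeSketch, its method names, and the example class names are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of following includes between inspector resources.
class InspectorIncludeSketch {
    // classesOf:  shortName -> inspector class names given directly on that resource
    // includesOf: shortName -> shortNames of the resources it includes
    static Set<String> resolve(String shortName,
                               Map<String, List<String>> classesOf,
                               Map<String, List<String>> includesOf) {
        Set<String> classes = new LinkedHashSet<>();
        collect(shortName, classesOf, includesOf, new HashSet<>(), classes);
        return classes;
    }

    private static void collect(String name,
                                Map<String, List<String>> classesOf,
                                Map<String, List<String>> includesOf,
                                Set<String> visited,
                                Set<String> classes) {
        if (!visited.add(name)) return; // already seen: stops include cycles
        classes.addAll(classesOf.getOrDefault(name, Collections.emptyList()));
        for (String included : includesOf.getOrDefault(name, Collections.emptyList())) {
            collect(included, classesOf, includesOf, visited, classes);
        }
    }

    public static void main(String[] argv) {
        Map<String, List<String>> classesOf = new HashMap<>();
        classesOf.put("a", Arrays.asList("com.example.AInspector"));
        classesOf.put("b", Arrays.asList("com.example.BInspector"));
        Map<String, List<String>> includesOf = new HashMap<>();
        includesOf.put("a", Arrays.asList("b"));
        includesOf.put("b", Arrays.asList("a")); // mutual reference
        System.out.println(resolve("a", classesOf, includesOf));
        System.out.println(resolve("b", classesOf, includesOf));
    }
}
```

With a mutual reference, resolving either shortName yields the same set of inspector classes, which is the sense in which the two resources are equivalent.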

The inspectors provided in the Eyeball distribution are:
Each entry gives the class leafname, then the eye:shortName in parentheses, then a description.
LiteralInspector (literal)
    Checks literals for syntactically correct language codes, syntactically correct datatype URIs, and conformance of the lexical form of typed literals to their datatype.
PropertyInspector (property)
    Checks that every property used is "declared" in some provided schema.
PrefixInspector (prefix)
    Checks that prefix namespaces are well-formed and that well-known prefixes have their well-known URIs. Also reports Jena automatically generated (j.Number) prefixes.
URIInspector (URI)
    Checks that every URI in the model is well-formed. Uses the new Jena IRI code.
VocabularyInspector (vocabulary)
    Checks that every URI in the model whose namespace is mentioned in some schema is one of the URIs declared for that namespace. If the inspector has any eye:openNamespace properties, their values are resources whose URIs are "open" namespaces that the inspector will not report.
AllTypedInspector (all-typed)
    Checks that all URI and bnode resources in the model have an rdf:type property in the model or the schema(s). If there is a statement in the configuration with property eye:checkLiteralTypes and value eye:true, also checks that every literal has a type or a language. Not in the default set of inspectors.
ConsistentTypeInspector (consistent-type)
    Checks that every subject in the model can be given a type which is the intersection of the subclasses of all its "attached" types. See below.
ClassInspector (presumed-class)
    Checks that every resource in the model that appears as the object of an rdf:type, rdfs:domain, or rdfs:range statement, or as the subject or object of an rdfs:subClassOf statement, has been declared as a Class in the schemas or the model under test.
CardinalityInspector (cardinality)
    Looks for classes C that are subclasses of cardinality restrictions on some property P with cardinality range min to max. For any X of rdf:type C, it checks that the number of values of P is in the range min..max and generates a report if it isn't. (Doesn't account for owl:sameAs in the 1.2 release.)
ListInspector (list)
      • looks for lists that are ill-formed by having multiple or missing rdf:first or rdf:rest properties.
      • looks for possible mis-uses of the "type list" idiom, and reports the types so defined: see below.
OwlSyntaxInspector (owl)
    Looks for "suspicious restrictions" which have some of the OWL restriction properties but not exactly one owl:onProperty and exactly one constraint (owl:allValuesFrom, etc).
SparqlDrivenInspector (sparql)
    Checks that given SPARQL queries succeed (if required) or fail (if prohibited) when applied to the model.
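
The range test performed by the CardinalityInspector, as described in the list above, amounts to counting the values of P for a subject and comparing the count with min..max. A minimal sketch over a toy list of triples follows; the class CardinalityCheckSketch and the array-of-strings triple representation are assumptions made for illustration, and the real inspector works on Jena models and OWL restrictions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the min..max test; triples are {subject, property, value}.
class CardinalityCheckSketch {
    // Counts the values of the given property on the given subject.
    static int countValues(List<String[]> triples, String subject, String property) {
        int n = 0;
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(property)) n++;
        }
        return n;
    }

    // True when the number of values falls outside the inclusive range min..max.
    static boolean violates(List<String[]> triples, String subject, String property,
                            int min, int max) {
        int n = countValues(triples, subject, property);
        return n < min || n > max;
    }

    public static void main(String[] argv) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"x", "p", "1"});
        triples.add(new String[] {"x", "p", "2"});
        triples.add(new String[] {"y", "p", "1"});
        System.out.println("x violates 1..1: " + violates(triples, "x", "p", 1, 1));
        System.out.println("y violates 1..1: " + violates(triples, "y", "p", 1, 1));
    }
}
```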

a note on the ListInspector

The typed list idiom is boilerplate OWL for defining a type which is List-of-T for some type T, and looks like:
my:EList a owl:Class
    ; rdfs:subClassOf rdf:List
    ; rdfs:subClassOf [owl:onProperty rdf:first; owl:allValuesFrom my:Element]
    ; rdfs:subClassOf [owl:onProperty rdf:rest; owl:allValuesFrom my:EList]
    .
The type my:Element is the element type of the list, and the type my:EList is the resulting typed list. The list inspector checks that every subclass of rdf:List that is also a subclass of any bnode that has any property with rdf:first or rdf:rest as its object is a subclass defined by the full idiom above; if not, it reports it as a suspectListIdiom.

a note on the ConsistentTypeInspector

The ConsistentTypeInspector warns about subjects for which it cannot find a consistent subtype. By this we mean that when we consider all the types that the subject has (explicitly stated or inferred), there is no type which is a subtype of all of them, assuming that the type hierarchy in the model is complete.

For example, if the model contains three types Top, Left, and Right, with Left and Right both being subtypes of Top and with no other subclass statements, then some S with rdf:types Left and Right would generate this warning.
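
The Top/Left/Right example can be worked through with a small sketch that treats the subclass statements as a closed map and asks whether any known type has all of the attached types among its supertypes. This is only an illustration of the definition: ConsistentTypeSketch and its methods are hypothetical, and the real inspector uses rule-based inference over the model.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "is there a type that is a subtype of all attached types?"
class ConsistentTypeSketch {
    // The type itself plus everything reachable through the direct-superclass map.
    static Set<String> superTypes(String type, Map<String, List<String>> directSupers) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(type);
        while (!todo.isEmpty()) {
            String t = todo.pop();
            if (seen.add(t)) {
                for (String s : directSupers.getOrDefault(t, Collections.emptyList())) {
                    todo.push(s);
                }
            }
        }
        return seen;
    }

    // Closed-world test: some known type must be a subtype of every attached type.
    static boolean hasConsistentType(Set<String> attached, Set<String> allTypes,
                                     Map<String, List<String>> directSupers) {
        for (String candidate : allTypes) {
            if (superTypes(candidate, directSupers).containsAll(attached)) return true;
        }
        return false;
    }

    public static void main(String[] argv) {
        Map<String, List<String>> supers = new HashMap<>();
        supers.put("Left", Arrays.asList("Top"));
        supers.put("Right", Arrays.asList("Top"));
        Set<String> types = new HashSet<>(Arrays.asList("Top", "Left", "Right"));
        Set<String> attached = new HashSet<>(Arrays.asList("Left", "Right"));
        // No type sits below both Left and Right, so this subject would be reported.
        System.out.println(hasConsistentType(attached, types, supers));
    }
}
```

If the hierarchy also declared a type that is a subclass of both Left and Right, the same test would succeed and no warning would be needed.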

The ConsistentTypeInspector must do at least some type inference. This release of Eyeball compromises by doing RDFS inference augmented by (very) limited union and intersection reasoning, as described in the Jena rules in etc/owl-like.rules. Even so, doing type inference over a large model is costly; you may wish to suppress it with -exclude until any other warnings are dealt with.

While, technically, a resource with no attached types at all is automatically inconsistent, Eyeball quietly ignores such resources, since they turn up quite often in simple RDF models.

Implementation note: The ConsistentTypeInspector's inferencing is done entirely by forward rules, triggered on the first subject to inspect. Once the rules have run to completion, further subjects are cheap. Using backward rules, the initial closure of the model was somewhat cheaper, but each new subject in a biggish model took a long time - a second or so - to process.

configuring URI checks

Eyeball applies some general, configurable, URI checks as well as the built-in ones. The config file contains statements using the property eye:schemePattern; their objects must be strings which describe a legal (Java regex) pattern for a URI. The scheme parts of those patterns form the set of known URI schemes: a URI that has that scheme, but does not match any of the patterns for that scheme, generates an eye:uriFailsPattern report.
Eyeball forms a single |-separated Java regular expression from all the patterns sharing the same scheme part.
The currently shipped config file restricts the type-id part of a URN to containing letters, digits, and hyphens, and to start with a letter.
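
The grouping of patterns by scheme and the resulting uriFailsPattern test can be sketched with java.util.regex. The class SchemePatternSketch is hypothetical, it assumes every pattern begins with a literal scheme followed by a colon, and the URN pattern used here is an invented example in the spirit of the shipped restriction, not a copy of what the config file contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of per-scheme URI patterns; not Eyeball's implementation.
class SchemePatternSketch {
    private final Map<String, Pattern> byScheme = new HashMap<>();

    SchemePatternSketch(List<String> patterns) {
        // Group the pattern strings by the text before their first ':'.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String p : patterns) {
            int colon = p.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("no scheme in pattern: " + p);
            grouped.computeIfAbsent(p.substring(0, colon), k -> new ArrayList<>()).add(p);
        }
        // One |-separated regular expression per scheme.
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            byScheme.put(e.getKey(), Pattern.compile(String.join("|", e.getValue())));
        }
    }

    // True when the URI's scheme is known but no pattern for that scheme matches.
    boolean failsPattern(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) return false; // no scheme: a matter for a different check
        Pattern p = byScheme.get(uri.substring(0, colon));
        return p != null && !p.matcher(uri).matches();
    }

    public static void main(String[] argv) {
        SchemePatternSketch checker = new SchemePatternSketch(
            Arrays.asList("urn:[a-zA-Z][-a-zA-Z0-9]*:.+"));
        System.out.println(checker.failsPattern("urn:isbn:0451450523"));
        System.out.println(checker.failsPattern("urn:9bad:x"));
    }
}
```

A URI whose scheme has no patterns at all is left alone by this sketch, since an unknown scheme is a separate report (eye:unrecognisedScheme).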

configuring renderers

The renderer class that Eyeball uses to render the report into text is given in the config file by triples of the form:
[]
    eye:renderer FullClassName
    ; eye:shortName ShortClassHandle
The FullClassName is a string literal giving the full class name of the rendering class. That class must implement the Renderer interface and have a constructor that takes a single Model (the configuring model) as an argument.
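
The constructor convention can be illustrated with plain Java reflection. In the sketch below, ConfigModelStub and RendererStub are stand-ins for Jena's Model and Eyeball's Renderer interface, and RendererLoaderSketch is a hypothetical loader; how Eyeball itself instantiates renderers is not shown here.

```java
import java.lang.reflect.Constructor;

// Stand-in for the configuring Model.
class ConfigModelStub {
}

// Stand-in for the Renderer interface; the real one has its own methods.
interface RendererStub {
    String render(String reportItem);
}

// An example renderer following the convention: one constructor taking the config model.
class UpperCaseRendererStub implements RendererStub {
    public UpperCaseRendererStub(ConfigModelStub config) {
        // A real renderer might read labels or options from the config here.
    }

    @Override
    public String render(String reportItem) {
        return reportItem.toUpperCase();
    }
}

class RendererLoaderSketch {
    // Loads the named class, checks that it is a renderer, and calls its
    // single-argument constructor with the configuring model.
    static RendererStub load(String fullClassName, ConfigModelStub config) {
        try {
            Class<?> c = Class.forName(fullClassName);
            if (!RendererStub.class.isAssignableFrom(c)) {
                throw new IllegalArgumentException(fullClassName + " is not a renderer");
            }
            Constructor<?> k = c.getConstructor(ConfigModelStub.class);
            return (RendererStub) k.newInstance(config);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create renderer " + fullClassName, e);
        }
    }

    public static void main(String[] argv) {
        String name = UpperCaseRendererStub.class.getName();
        System.out.println(load(name, new ConfigModelStub()).render("bad uri"));
    }
}
```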

The ShortClassHandle is a string literal giving the short name used to refer to the class. The default short name used is default. There should be no more than one eye:shortName statement with the same ShortClassHandle in the configuration file, but the same class can have many different short names.

The TextRenderer supports an additional property eye:labels to allow the appropriate labels for an ontology to be supplied to the renderer. Each object of an eye:labels statement names a model; all the rdfs:label statements in that model are used to supply strings which are used to render resources.

The model names are strings which are interpreted by Jena's FileManager, so they may be redirected using Jena's file mappings.