summary

This document describes Eyeball 1.3. Changes since 1.2, 1.1, and 1.0 are summarised in the release notes. Note that the command-line syntax changed from 1.0 to 1.1, and that the experimental GUI is not described here.

Eyeball - checking RDF for common problems

Eyeball is a library and command-line tool for checking RDF models for certain common problems. These problems often result in technically correct but implausible RDF. Eyeball checks against user-provided schema files and makes various closed-world assumptions.

Eyeball currently can check for:

We plan to have Eyeball check for:

Eyeball's checks are performed by Inspector plug-ins and can be customised by the user. Its reports are rendered to output by Renderer plug-ins, which can also be customised by the user.

installation

Fetch the Eyeball distribution zipfile and unpack it somewhere convenient. Read the documentation in the doc/index.html file (that's what this is). Eyeball 1.3 comes with its own copy of Jena (because it's based on the CVS version of Jena, 2.3CVS); don't get the two confused.

In the Eyeball distribution directory, run the Eyeball tests:

ant test
If these tests fail, something is wrong. Sometimes it's no more than a classpath problem, which you can fix. If not, use the jena-dev mailing list to request assistance. Note that any support is provided on a voluntary basis, as and when the effort is available.

If the tests have passed, you can copy lib/*.jar to whatever place you find convenient. You can then use it from the command line or from within Jena code. You will also need to copy the directories mirror and etc to somewhere convenient where the Jena FileManager class can see them.

command line operation

You must ensure that the Eyeball jars and the Jena libraries -- in that order -- are on your classpath. (Note that a CVS version of Eyeball may come with its own jena.jar and may not work with your usual installation.)

Run the command:

java [java options eg classpath and proxy] jena.eyeball 
    -assume Reference+  
    -check dataFileOrURL* OR -modelSpec specFileOrURL
    [-config fileOrURL*]
    [-render Name]
    [-inspectors shortName*]
    [-exclude shortName*]
The -whatever sections can come in any order and may be repeated, in which case the new arguments are appended to the existing ones.
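
The "any order, repeats append" rule can be illustrated with a small sketch. The class ArgGrouper and its group method are hypothetical names invented for this illustration; this is not Eyeball's actual argument parser, only one plausible way such grouping could behave.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of option grouping; not Eyeball's real parser. */
final class ArgGrouper {
    /**
     * Collects each non-option argument under the most recent -option.
     * A repeated option appends to the arguments already collected for it.
     */
    static Map<String, List<String>> group(String... args) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        String current = null;
        for (String arg : args) {
            if (arg.startsWith("-")) {
                current = arg;
                grouped.computeIfAbsent(current, k -> new ArrayList<>());
            } else if (current != null) {
                grouped.get(current).add(arg);
            } else {
                throw new IllegalArgumentException("argument before any option: " + arg);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // prints {-check=[a.rdf, b.rdf], -assume=[dc]}
        System.out.println(group("-check", "a.rdf", "-assume", "dc", "-check", "b.rdf"));
    }
}
```

Here the second -check adds b.rdf to the list that already holds a.rdf, rather than replacing it.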

The -config fileOrURL options specify the configuration files to load. A single configuration model is constructed as the union of the contents of those files, plus any eye:loadConfig files. If this option is omitted, the config file etc/eyeball-config.n3 is loaded. See loadConfigFiles for details.

The -assume Reference identifies any assumed schemas used to identify the predicates and classes of the data model. The reference may be a file name, a URL, or the short name of a collection of schemas supplied in the configuration file. Several -assume options may be given. Non-option arguments following -assume will be taken as additional schemas.

Eyeball assumes the RDF and RDFS schemas, and the built-in XSD datatype classes, by default. The short name owl can be used to refer to the OWL schema, dc to the Dublin Core schema, dcterms to the Dublin Core terms schema, and dc-all to both.

The dataFileOrURLs name the files or URL references containing the data to be eyeballed. If several names are given, a combined model containing all their content is checked. Alternatively, a single -modelSpec specFileOrURL can be provided, in which case the loaded model is described by the specified Jena Model Spec, for which see the Jena documentation. This feature is under trial.

If any of the data or schemas are identified by an http: URL, and you are behind a firewall, you will need to specify the proxy to Java using system properties; one way to do this is by using the Java command line options:

    -DproxySet=true
    -DproxyHost=theProxyHostName
    -DproxyPort=theProxyPortNumber

The -inspectors shortNames are strings, each of which is the eye:shortName value of some inspector cluster in the Eyeball config file; see the config file description for details. If the option is omitted, it is as if

    -inspectors defaultInspectors 
had been written. The -exclude option allows the shortnames of inspectors to be excluded from the checks. (For example, the type inspector currently slows things down quite a lot and might well be excluded from an initial eyeballing.)

The eyeball reports are written to the standard output; by default, the reports appear as text (RDF rendered by omitting the subjects - which are all blank nodes - and lightly prettifying the predicate and object). To change the rendering style, supply the -render option with the name of the renderer as its value. Eyeball comes with N3, XML, and text renderers; the Eyeball config file associates renderer names with their classes.

To test models from databases, or with attached reasoners, see the Jena Model Spec documentation; be warned that this code is changing at this time.

examples of command-line use

(Assuming an implicit CLASSPATH)
java jena.eyeball -assume -check myDataFile.rdf

java jena.eyeball -assume dc -check http://example.com/nosuch.n3

java jena.eyeball -assume mySchema.rdf -check myData.rdf -render xml

java jena.eyeball -check myData.rdf -inspectors defaultInspectors

use as a library

creating an Eyeball

To create an Eyeball on a particular schema, do:

Eyeball eyeball = new Eyeball( modelWithSchemaInIt );

If, instead of a single schema, you have several schemas bundled together in a List L, you can supply that list as a SchemaList:

Eyeball eyeball = new Eyeball( new SchemaList( L ) );

It is also possible to build a SchemaList one model at a time, since SchemaList supports an add(Model) method.

All of these forms use the default list of inspectors. To supply a non-default list, use:

Eyeball eyeball = new Eyeball( inspectors, aSchemaList );

inspectors must be a list of full classnames of Inspector classes. To make constructing this list easier, you can use the method

List Eyeball::getInspectors( Model config, List inspect, List except )

inspect is a list of shortnames of inspectors to include, and except is a list of shortnames of inspectors to exclude, exactly as for the command-line options (for which this method is the implementation). config is the config model to use.
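
As a rough illustration of the include/exclude selection, the sketch below uses a plain map in place of the config model. The class InspectorSelection, its select method, and the example.* class names are all hypothetical. It also assumes that exclusion removes every class named by an excluded shortname, which the text above does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for inspect/except selection; not Eyeball code. */
final class InspectorSelection {
    /**
     * classesByShortName plays the role of the config model: it maps each
     * shortname to the full class names it stands for.
     */
    static List<String> select(Map<String, List<String>> classesByShortName,
                               List<String> inspect, List<String> except) {
        Set<String> chosen = new LinkedHashSet<>();
        for (String name : inspect) chosen.addAll(lookup(classesByShortName, name));
        // Assumption: an excluded shortname removes all the classes it names.
        for (String name : except) chosen.removeAll(lookup(classesByShortName, name));
        return new ArrayList<>(chosen);
    }

    private static List<String> lookup(Map<String, List<String>> config, String shortName) {
        List<String> classes = config.get(shortName);
        if (classes == null) throw new IllegalArgumentException("unknown shortname: " + shortName);
        return classes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = new HashMap<>();
        // Made-up class names, for illustration only.
        config.put("literal", Arrays.asList("example.LiteralInspector"));
        config.put("URI", Arrays.asList("example.URIInspector"));
        config.put("both", Arrays.asList("example.LiteralInspector", "example.URIInspector"));
        // prints [example.LiteralInspector]
        System.out.println(select(config, Arrays.asList("both"), Arrays.asList("URI")));
    }
}
```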

to eyeball a model

eyeball.inspect( modelToBeInspected )
The result is an instance of EyeballReport. The model() method delivers an RDF model which describes the problems found by the inspection. The inspections supplied in the distribution use the EYE vocabulary, and are used in the standard reports:
unknown predicate
    eye:unknownPredicate (URI): the URI of the unknown predicate
bad URI
    eye:badURI (String): the spelling of the bad URI
illegal language code
    eye:badLanguage (String): the bad language code
    eye:onLiteral (String): a plain literal with the same lexical form
bad datatype URI
    eye:forReason (URI): the reason URI
    eye:onLiteral (String): a plain literal with the same lexical form
bad namespace URI
    eye:onPrefix (String): the prefix name with the bad namespace
    eye:forReason (URI): the reason URI
    eye:badNamespaceURI (String): the spelling of the bad URI
Jena prefix found
    eye:jenaPrefixFound (String): the name of the Jena prefix
    eye:forNamespace: the namespace the prefix is bound to
implausible vocabulary item
    eye:onResource (URI): the URI of the implausible resource
    eye:notFromSchema (URI): the URI of the schema
an undeclared class
    eye:unknownClass: the resource that was presumed to be a Class
an untyped Resource
    eye:hasNoType (Resource): the resource that has no rdf:type property
inconsistent types for resource
    eye:noConsistentTypeFor (URI): the URI of the inconsistent resource
    eye:hasAttachedType (URI): one of the given types that have no intersection
"wrong" number of property values for some subject
    eye:cardinalityFailure: the subject for which the failure was detected
    eye:onProperty: the property P that has the wrong number of values
    eye:onType: the cardinality-constrained type
    eye:cardinality: a blank node [eye:min min; eye:max max]
    eye:numValues: the number of values of P found
    eye:values: a blank node of rdf:type eye:Set with an rdfs:member value for each of the values of P

Every report item in the model is a blank node with rdf:type eye:Item.

The labels for the Eyeball predicates and reason messages are defined in the Eyeball schema file etc/eyeball-schema.n3 (and are used by the text renderer):
eye:uriContainsSpaces
    the URI contains unencoded spaces, probably as a result of sloppy use of file: URLs.
eye:uriFileInappropriate
    a URI used as a namespace is a file: URI, which is inappropriate as a global identifier.
eye:uriHasNoScheme
    a URI has no scheme field, probably a misused relative URI.
eye:schemeShouldBeLowercase
    the scheme part of a URI is not lower-case; while technically correct, this is not usual practice.
eye:uriFailsPattern
    a URI fails the pattern appropriate to its scheme (as defined in the configuration for this eyeball).
eye:unrecognisedScheme
    the URI scheme is unknown, perhaps a misplaced QName.
eye:uriNoHttpAuthority
    an http: URI has no authority (domain name/port) component.
eye:uriSyntaxFailure
    the URI can't be parsed using the general URI syntax, even with any spaces removed.
eye:namespaceEndsWithNameCharacter
    a namespace URI ends in a character that can appear in a name, leading to possible ambiguities.
The prefix eye stands for the URL http://jena.hpl.hp.com/Eyeball#.

Eyeball configuration

configuration files

The Eyeball command-line utility is configured by files (or URLs) specified on the command line: their RDF contents are unioned together into a single config model. If no config file is specified, then etc/eyeball-config.n3 is loaded. Similarly, in use as a library, the Eyeball constructor can optionally take a single configuration Model as argument, and if no such model is supplied, defaults to etc/eyeball-config.n3.

Configuration files loaded from the command line may contain statements with the predicate eye:loadConfig; their objects should be strings (not URIs) naming other configuration files to be loaded. This allows a user to extend the default configuration without having to modify the default file.

Eyeball is also configured by the location-mapping file etc/location-mapping.n3. The Eyeball jar contains copies of both the default config and the location mapper; these are used by default. You can provide your own etc/eyeball-config.n3 file earlier on your classpath or in your current directory; this config replaces the default. You may provide additional location-mapping files earlier on your classpath or in your current directory.

configuring schema names

To avoid having to quote schema names in full on the Eyeball command line, schemas (or collections of schemas) can be given short names:
[] eye:shortName shortNameLiteral
    ; eye:schema fullSchemaURL
    ...
    .
A shortname can name several schemas. The Eyeball distribution has the short names rdf, rdfs, owl, and dc for the corresponding schemas (and mirror versions of those schemas so that they don't need to be downloaded each time Eyeball is run).

configuring inspectors

The inspectors that Eyeball runs over the model are specified by eye:inspector properties of inspector resources. These resources are identified by eye:shortNames (supplied on the command line). Each such property value must be a plain string literal whose value is the full name of the Inspector class to load and run; see the Javadoc of Inspector for details.

An inspector resource may refer to other inspector resources to include their inspectors, using either of the two properties eye:include or eye:includeByName. The value of an include property should be another inspector resource; the value of an includeByName property should be the shortName of an inspector resource.

[Two inspector resources may refer to each other, in which case they are equivalent.]
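
The effect of mutual inclusion can be sketched with a cycle-safe traversal. The class InspectorIncludes, its resolve method, and the example.* class names are hypothetical; plain maps stand in for the eye:inspector and eye:include statements of the config model, and Eyeball's own resolution may be implemented differently.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of include resolution; not Eyeball code. */
final class InspectorIncludes {
    /**
     * directClasses: shortname -> class names given by eye:inspector.
     * includes: shortname -> shortnames of the resources it includes.
     */
    static Set<String> resolve(String start,
                               Map<String, List<String>> directClasses,
                               Map<String, List<String>> includes) {
        Set<String> classes = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            if (!visited.add(name)) continue; // already handled: stops include cycles
            classes.addAll(directClasses.getOrDefault(name, Collections.emptyList()));
            for (String included : includes.getOrDefault(name, Collections.emptyList())) {
                pending.push(included);
            }
        }
        return classes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> direct = new HashMap<>();
        direct.put("a", Arrays.asList("example.A")); // made-up class names
        direct.put("b", Arrays.asList("example.B"));
        Map<String, List<String>> inc = new HashMap<>();
        inc.put("a", Arrays.asList("b"));
        inc.put("b", Arrays.asList("a"));
        // prints true: both resources resolve to the same set of classes
        System.out.println(resolve("a", direct, inc).equals(resolve("b", direct, inc)));
    }
}
```

Because each resource is visited at most once, two resources that include each other end up with the same set of inspector classes, which is the sense in which they are equivalent.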

The inspectors provided in the Eyeball distribution are:
class leafname (eye:shortName)
    description
LiteralInspector (literal)
    Checks literals for syntactically correct language codes.
PredicateInspector (predicate)
    Checks that every predicate used is "declared" in some provided schema.
PrefixInspector (prefix)
    Checks that prefix namespaces are well-formed and that well-known prefixes have their well-known URIs.
JenaPrefixInspector (jena-prefix)
    Checks for namespace prefixes generated by Jena, ie, those of the form j.N+.
URIInspector (URI)
    Checks that every URI in the model is well-formed.
VocabularyInspector (vocabulary)
    Checks that every URI in the model whose namespace is mentioned in some schema is one of the URIs declared in the schema. [This one needs a bit of work ...]
AllTypedInspector (all-typed)
    Checks that all URI and bnode resources in the model have an rdf:type property in the model or the schema(s). If there is a statement in the configuration with property eye:checkLiteralTypes and value eye:true, also checks that every literal has a type or a language. Not in the default set of inspectors.
ConsistentTypeInspector (consistent-type)
    Checks that every subject in the model can be given a type which is the intersection of the subclasses of all its "attached" types. See below.
PresumedClassInspector (presumed-class)
    Checks that every resource in the model that appears as the object of an rdf:type, rdfs:domain, or rdfs:range statement, or as the subject or object of an rdfs:subClassOf statement, has been declared as a Class in the schemas or the model under test. Note: OWLification coming soon.
CardinalityInspector (cardinality)
    Looks for classes C that are subclasses of cardinality restrictions on some property P with cardinality range min to max. For any X of rdf:type C, it checks that the number of values of P is in the range min..max and generates a report if it isn't. (Doesn't account for owl:sameAs in the 1.2 release.)

a note on the ConsistentTypeInspector

The ConsistentTypeInspector warns about subjects for which it cannot find a consistent subtype. By this we mean that when we consider all the types that a subject has (explicitly stated or inferred), there is no type that is a subtype of all of them, assuming that the type hierarchy in the model is complete.

For example, if the model contains three types Top, Left, and Right, with Left and Right both being subtypes of Top and with no other subclass statements, then some S with rdf:types Left and Right would generate this warning.
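
The Top, Left, Right example can be worked through in a few lines. This is a minimal sketch of the "common subtype" test only, under the assumption that the declared hierarchy is complete. The class ConsistentTypeSketch and its methods are hypothetical, and the sketch does none of the rule-based inference that the real inspector performs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the consistent-subtype test; not Eyeball code. */
final class ConsistentTypeSketch {
    /** True if sub equals sup or reaches sup by following declared supertypes. */
    static boolean isSubtype(String sub, String sup, Map<String, Set<String>> superOf) {
        if (sub.equals(sup)) return true;
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(sub);
        while (!pending.isEmpty()) {
            String type = pending.pop();
            if (!seen.add(type)) continue;
            for (String parent : superOf.getOrDefault(type, Collections.emptySet())) {
                if (parent.equals(sup)) return true;
                pending.push(parent);
            }
        }
        return false;
    }

    /** True if some known type is a subtype of every attached type. */
    static boolean hasConsistentType(Set<String> attached, Set<String> allTypes,
                                     Map<String, Set<String>> superOf) {
        if (attached.isEmpty()) return true; // untyped subjects are not reported
        for (String candidate : allTypes) {
            boolean subtypeOfAll = true;
            for (String type : attached) {
                if (!isSubtype(candidate, type, superOf)) { subtypeOfAll = false; break; }
            }
            if (subtypeOfAll) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> superOf = new HashMap<>();
        superOf.put("Left", Collections.singleton("Top"));
        superOf.put("Right", Collections.singleton("Top"));
        Set<String> all = new HashSet<>(Arrays.asList("Top", "Left", "Right"));
        Set<String> leftRight = new HashSet<>(Arrays.asList("Left", "Right"));
        Set<String> leftTop = new HashSet<>(Arrays.asList("Left", "Top"));
        System.out.println(hasConsistentType(leftRight, all, superOf)); // false: would warn
        System.out.println(hasConsistentType(leftTop, all, superOf));   // true: Left fits both
    }
}
```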

The ConsistentTypeInspector must do at least some type inference. This release of Eyeball compromises by doing RDFS inference augmented by (very) limited union and intersection reasoning, as described in the Jena rules in etc/owl-like.rules. Even so, doing type inference over a large model is costly; you may wish to suppress it with -exclude until any other warnings are dealt with.

While, technically, a resource with no attached types at all is automatically inconsistent, Eyeball quietly ignores such resources, since they turn up quite often in simple RDF models.

Implementation note: The ConsistentTypeInspector's inferencing is done entirely by forward rules, triggered on the first subject to inspect. Once the rules have run to completion, further subjects are cheap. Using backward rules, the initial closure of the model was somewhat cheaper, but each new subject in a biggish model took a long time - a second or so - to process.

configuring URI checks

Eyeball applies some general, configurable, URI checks as well as the built-in ones. The config file contains statements using the property eye:schemePattern; their objects must be strings which describe a legal (Java regex) pattern for a URI. The scheme parts of those patterns form the set of known URI schemes: a URI that has one of those schemes, but does not match any of the patterns for that scheme, generates an eye:uriFailsPattern report.
Eyeball forms a single |-separated Java regular expression from all the patterns sharing the same scheme part.
The currently shipped config file restricts the type-id part of a URN to containing letters, digits, and hyphens, and to start with a letter.
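
The per-scheme union of patterns might look like the following sketch. The class SchemePatterns is hypothetical, and the two patterns in main are written for this illustration rather than copied from the shipped config file. The sketch also assumes that every pattern starts with a literal scheme name followed by a colon.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Hypothetical sketch of scheme-pattern checking; not Eyeball code. */
final class SchemePatterns {
    private final Map<String, Pattern> byScheme = new HashMap<>();

    /** Joins all patterns sharing a scheme into one |-separated regex. */
    SchemePatterns(List<String> patterns) {
        Map<String, StringJoiner> joined = new LinkedHashMap<>();
        for (String pattern : patterns) {
            int colon = pattern.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("no scheme in pattern: " + pattern);
            // Assumption: the text before the first ':' is a literal scheme name.
            String scheme = pattern.substring(0, colon);
            joined.computeIfAbsent(scheme, k -> new StringJoiner("|")).add(pattern);
        }
        joined.forEach((scheme, regex) -> byScheme.put(scheme, Pattern.compile(regex.toString())));
    }

    /** True when the URI's scheme is known but none of its patterns matches. */
    boolean failsPattern(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) return false; // no scheme: a different check's concern
        Pattern pattern = byScheme.get(uri.substring(0, colon));
        return pattern != null && !pattern.matcher(uri).matches();
    }

    public static void main(String[] args) {
        SchemePatterns checks = new SchemePatterns(Arrays.asList(
            "urn:[a-zA-Z][a-zA-Z0-9-]*:.+", // illustrative URN pattern
            "http://[^/]+(/.*)?"));         // illustrative http pattern
        System.out.println(checks.failsPattern("urn:isbn:0451450523")); // false
        System.out.println(checks.failsPattern("urn:9bad:x"));          // true
        System.out.println(checks.failsPattern("mailto:someone"));      // false: unknown scheme
    }
}
```

A URI with an unknown scheme is deliberately not flagged here, since the text above reserves eye:uriFailsPattern for URIs whose scheme is known.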

configuring renderers

The renderer class that Eyeball uses to render the report into text is given in the config file by triples of the form:
[]
    eye:renderer FullClassName
    ; eye:shortName ShortClassHandle
The FullClassName is a string literal giving the full class name of the rendering class. That class must implement the Renderer interface and have a constructor that takes a single Model (the configuring model) as an argument.

The ShortClassHandle is a string literal giving the short name used to refer to the class. The default short name used is default. There should be no more than one eye:shortName statement with the same ShortClassHandle in the configuration file, but the same class can have many different short names.