Title: A brief guide to Jena Eyeball

This document is a work-in-progress; refer to the
[manual](eyeball-manual.html) for details when this page doesn't help.

So you've got Eyeball installed and you've run it on one of your
files, and Eyeball doesn't like it. You're not sure why, or what to
do about it. Here's what's going on.

Eyeball inspects your model against a set of *schemas*. The default
set of schemas includes RDF, RDFS, the XSD datatypes, and any
models your model imports: you can add additional schemas from the
command line or configuration file. Eyeball uses those schemas to
work out what URIs count as "declared" in advance. It also checks
URIs and literals for syntactic correctness and namespace prefixes
for being "sensible". Let's look at some of the messages you can
get.

## Unknown predicate reports

You'll probably find several messages like this:

    predicate not declared in any schema: somePredicateURI

Eyeball treats the imported models, and (independently) the
specified schemas, as single OntModels, and extracts those
OntModels' properties. It includes the RDF and RDFS schemas.
Anything used as a predicate that isn't one of those properties is
reported.
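
The check described above amounts to a set-membership test. Here is a minimal sketch of the idea in Python; it is not Eyeball's own code (Eyeball is a Java tool built on Jena). Triples are plain tuples, and the function name `undeclared_predicates` and the `example.org` URIs are made up for illustration.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples,
# not Jena Statements, and the names here are hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def undeclared_predicates(data_triples, declared_properties):
    """Return, in first-seen order, the predicates not in declared_properties."""
    seen = []
    for _s, p, _o in data_triples:
        if p not in declared_properties and p not in seen:
            seen.append(p)
    return seen

# Properties extracted from the schemas (here just two, for brevity).
declared = {RDF + "type", RDFS + "label"}
data = [
    ("http://example.org/a", RDF + "type", "http://example.org/Thing"),
    ("http://example.org/a", "http://example.org/colour", "red"),
]
for p in undeclared_predicates(data, declared):
    print("predicate not declared in any schema:", p)
```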

If you're using OWL, you can silence the "undeclared property"
messages about OWL properties by adding to your Eyeball command
line the option:

    -assume owl

Eyeball will read the OWL schema (it has a copy stashed away in the
*mirror* directory) and add the declared properties to its known
list. This works for any filename or URL you like, so long as
there's RDF there and it has a suitable file suffix - *.n3* for N3
or *.rdf* or *.owl* for RDF/XML - and for the built-in names *dc*
(basic Dublin Core), *dcterms* (Dublin Core terms) and *dc-all*
(both). So you can construct your own schemas, which declare your
own domain-specific properties, and invoke Eyeball with

    -assume owl mySchemaFile.n3 otherSchemaFile.rdf

You can give short names (like **dc** and **rdfs**) to your own
schemas, or collections of schemas, using an Eyeball *config file*,
but you'll have to see the [manual](eyeball-manual.html) to find out how.

## Unknown class reports

You may see messages like this:

    class not declared in any schema: someClassURI

Having read the previous section, you can probably work out what's
going on: Eyeball reads the schemas (and imports) and extracts the
declared OntClasses. Then anything used as a class that isn't one
of those declared classes is reported.

And that's exactly it. "Used as a class" means appearing as **C**
or **D** in any statement of the form:

    _ rdf:type C
    _ rdfs:domain C
    _ rdfs:range C
    C rdfs:subClassOf D
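
The four statement patterns above can be sketched as a small Python function. This is an illustration of the rule, not Eyeball's implementation: triples are plain tuples with prefixed names as strings, and `classes_used` is a hypothetical helper name.

```python
# Illustrative sketch only: prefixed names are plain strings and
# classes_used is a hypothetical helper, not part of Eyeball.
def classes_used(triples):
    """Collect every term that appears in a 'class' position."""
    used = set()
    for s, p, o in triples:
        if p in ("rdf:type", "rdfs:domain", "rdfs:range"):
            used.add(o)             # C in: _ rdf:type C, etc.
        elif p == "rdfs:subClassOf":
            used.update((s, o))     # both C and D in: C rdfs:subClassOf D
    return used

triples = [
    ("ex:x", "rdf:type", "ex:C"),
    ("ex:p", "rdfs:domain", "ex:D"),
    ("ex:E", "rdfs:subClassOf", "ex:F"),
    ("ex:x", "ex:p", "ex:y"),       # not a class position
]
declared_classes = {"ex:C", "ex:F"}
for c in sorted(classes_used(triples) - declared_classes):
    print("class not declared in any schema:", c)
```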

## Suppressing inspectors

It may be that you're not interested in the "unknown predicate" or
"unknown class" reports until you've sorted out the URIs. Or maybe
you don't care about them. In that case, you can switch them off.

Eyeball's different checks are carried out by *inspector* classes.
These classes are given short names by entries in Eyeball config
files (which are RDF files written using N3; you can see the
default config file by looking in Eyeball's `etc` directory for
`eyeball2-config.n3`). By adding eg:

    -exclude property class

to the Eyeball command line, you can *exclude* the inspectors with
those short names from the check. *property* is the short name for
the "unknown predicate" inspector, and *class* is the short name for
the "unknown class" inspector.

## Namespace and URI reports

Eyeball checks all the URIs in the model, including (if available)
those used for namespaces. (And literals, but see below.) Here's an
example:

    bad namespace URI: "file:some-filename"
        on prefix: "pqr"
        for reason: file URI inappropriate for namespace

A "bad namespace URI" means that Eyeball doesn't like the URI for a
namespace in the model. The "on prefix" part of the report says
what the namespace prefix is, and the "for reason" part gives the
reason. In this case, we (the designers of Eyeball) feel that it is
unwise to use file URIs - which tend to depend on internal details
of your directory structure - for global concepts.

A more usual reason is that the URI is syntactically illegal. Here
are some possibilities:

problem | explanation
------- | -----------
URI contains spaces | Literal spaces are not legal in URIs. This usually arises from file URIs when the file has a space in its name. Spaces in URIs have to be encoded.
URI has no scheme | The URI has no scheme at all. This usually happens when some relative URI hasn't been resolved properly, e.g. there's no `xml:base` in an RDF/XML document.
URI has an unrecognised scheme | The scheme part of the URI - the bit before the first colon - isn't recognised. Eyeball knows, by default, four schemes: **http**, **ftp**, **file**, and **urn**. This usually arises when a QName has "escaped" from somewhere, or from a typo. You can tell Eyeball about other schemes, if you need them.
scheme should be lower-case | The scheme part of the URI contains uppercase letters. While this is not actually *wrong*, it is unconventional and pointless.
URI doesn't fit pattern | Eyeball has some (weak) checks on the syntax of URIs in different schemes, expressed as patterns in its config files. If a URI doesn't match the pattern, Eyeball reports this problem. At the moment, you'll only get this report for a **urn** URI like *urn:x-hp:23487682347* where the URN id (the bit between the first and second colons, here *x-hp*) is illegal.
URI syntax error | A catch-all error: Java couldn't make any sense of this URI at all.
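
To make the first four rows of the table concrete, here is a rough Python sketch of that kind of check. It is not Eyeball's code and may not match its behaviour in every corner; the function name `uri_problems` is hypothetical, and the scheme syntax used is the general one from RFC 3986. The pattern-based and catch-all checks are not covered.

```python
import re

# The four schemes the text says Eyeball knows by default.
KNOWN_SCHEMES = {"http", "ftp", "file", "urn"}
# General URI scheme syntax: a letter, then letters, digits, "+", "-" or ".".
SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*):")

def uri_problems(uri, known_schemes=KNOWN_SCHEMES):
    """Return a list of problem descriptions (hypothetical helper)."""
    problems = []
    if " " in uri:
        problems.append("URI contains spaces")
    m = SCHEME.match(uri)           # match() anchors at the start of the string
    if m is None:
        problems.append("URI has no scheme")
        return problems
    scheme = m.group(1)
    if scheme != scheme.lower():
        problems.append("scheme should be lower-case")
    if scheme.lower() not in known_schemes:
        problems.append("URI has an unrecognised scheme")
    return problems

for uri in ["http://example.org/ok", "file:my file.rdf", "dc:title"]:
    print(uri, "->", uri_problems(uri))
```

The last example shows the "escaped QName" case: `dc:title` looks like a URI whose scheme is `dc`.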

## Problems with literals

Eyeball checks literals (using the *literal inspector*, whose short
name is **literal** if you want to switch it off), but the checking
is quite weak because it doesn't understand types at the moment.
You can get two different classes of error.

    bad language: someLanguageCode
    on literal: theLiteralInQuestion

Literals with language codes (things like **en-UK** or **de**) are
checked to make sure that the language code conforms to the general
syntax for language codes: alphanumeric words separated by hyphens,
with the first containing no digits.
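
The syntax rule just stated fits in one regular expression. The following Python sketch implements only that rule as written here; it is not taken from Eyeball, and the function name is an invention for this example.

```python
import re

# First word: letters only. Later words: letters or digits. Hyphen-separated.
LANGUAGE_CODE = re.compile(r"[A-Za-z]+(-[A-Za-z0-9]+)*")

def is_plausible_language_code(code):
    """True if code has the general shape described above (hypothetical helper)."""
    return LANGUAGE_CODE.fullmatch(code) is not None

for code in ["en-UK", "de", "1en", "en--UK", "en_UK"]:
    verdict = "ok" if is_plausible_language_code(code) else "bad language"
    print(code, "->", verdict)
```

Note that this is a purely syntactic test: it accepts any well-shaped code, whether or not the language it names exists.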

(Later versions of Eyeball will likely allow you to specify *which*
language codes you want to permit in your models. But we haven't
got there yet.)

    bad datatype URI: someURI
    on literal: theLiteralInQuestion
    for reason: theReason

Similarly, literals with datatypes are checked to make sure that
the type URI is legal. That's it for the moment: Eyeball doesn't
try to find out if the URI really is a type URI, or if the spelling
of the literal is OK for that type. But it spots the bad URIs. (The
messages are the same as those that appear in the URI checking -
above - for the very good reason that it's the same code doing the
checking.)

## Problematic prefixes

Both RDF/XML and N3 allow (and RDF/XML requires) namespaces to be
abbreviated by prefixes. Eyeball checks prefixes for two possible
problems. The first:

    non-standard namespace for prefix

This arises when a "standard" prefix has been bound to a namespace
URI which isn't its usual one. The "standard" prefixes are taken
from Jena's `PrefixMapping.Extended` and are currently:

    rdf, rdfs, daml, owl, xsd, rss, vcard

And the second:

    Jena generated prefix found

This arises when the model contains prefixes of the form `j.N`,
where N is a number. These are generated by Jena when writing
RDF/XML for URIs that must have a prefix (because they are used as
types or predicates) but haven't been given one.
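
Both prefix checks are simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions rather than Eyeball's code: the prefix map is a plain dict, `prefix_problems` is a hypothetical name, and `STANDARD` lists only four of the usual bindings (the full set is in Jena's `PrefixMapping.Extended`).

```python
import re

# A subset of the usual bindings, for illustration only.
STANDARD = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
# Prefixes of the form j.N, where N is a number.
JENA_PREFIX = re.compile(r"j\.\d+")

def prefix_problems(prefix_map):
    """Return (prefix, message) pairs, ordered by prefix (hypothetical helper)."""
    problems = []
    for prefix, namespace in sorted(prefix_map.items()):
        if prefix in STANDARD and namespace != STANDARD[prefix]:
            problems.append((prefix, "non-standard namespace for prefix"))
        if JENA_PREFIX.fullmatch(prefix):
            problems.append((prefix, "Jena generated prefix found"))
    return problems

prefixes = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://example.org/owl#",      # not the usual OWL namespace
    "j.0": "http://example.org/a#",        # looks Jena-generated
    "my": "http://example.org/my#",
}
for prefix, message in prefix_problems(prefixes):
    print(message + ":", prefix)
```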

If you're not bothered about inventing short prefixes for your
namespaces, you can **-exclude** `jena-prefix` to suppress this
inspection.

## But how do I ...

The reports described so far are part of Eyeball's default set of
inspections. There are some other checks that it can do that are
switched off by default, because they are expensive, initially
overwhelming, or downright obscure. If you need to add these checks
to your eyeballing, this is how to do it.

### ... make sure everything is typed?

Some applications (or a general notion of cleanliness) require that
every individual in an RDF model has an explicit `rdf:type`. The
Eyeball check for this isn't enabled by default, because lots of
casual RDF use doesn't need it, and more sophisticated use has
models with enough inference power to infer types.

You can add the **all-typed** inspector to the inspectors that
Eyeball will run by adding to the command line:

    -inspectors defaultInspectors all-typed

The **all-typed** inspector will generate a message

    resource has no rdf:type

for each resource in the model which is not the subject of an
`rdf:type` statement.
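
As a rough picture of what this inspection looks for, here is a small Python sketch. It is not Eyeball's code, and it makes two labelled assumptions: literals are wrapped in a made-up `Literal` type so they can be told apart from resources, and the objects of `rdf:type` statements (the classes themselves) are not treated as individuals needing a type.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal; resources are plain strings.
Literal = namedtuple("Literal", "value")

def untyped_resources(triples):
    """Return resources that are never the subject of rdf:type (sketch)."""
    typed = {s for s, p, _o in triples if p == "rdf:type"}
    resources = set()
    for s, p, o in triples:
        resources.add(s)
        # Assumption: skip class names and literals when collecting objects.
        if p != "rdf:type" and not isinstance(o, Literal):
            resources.add(o)
    return sorted(resources - typed)

triples = [
    ("ex:a", "rdf:type", "ex:Book"),
    ("ex:a", "ex:author", "ex:b"),
    ("ex:b", "ex:name", Literal("Bee")),
]
for r in untyped_resources(triples):
    print("resource has no rdf:type:", r)
```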

### ... check for type consistency?

One easy mistake to make in RDF is to make an assertion - we'll
call it **S P O** - about some subject **S** which is "of the wrong
type", that is, not of whatever type **P**'s domain is. This isn't,
in principle, an error, since RDF resources can have multiple
types, and this just makes **S** have a type which is a subtype of
both **P**'s domain and whatever type it was supposed to have.

To spot this, and related problems, Eyeball has the
**consistent-type** inspector. You can add it to the inspections in
the same way as the **all-typed** inspector:

    -inspectors defaultInspectors consistent-type

It checks that every resource which has been given at least one
type has a type which is a subtype of all its types, under an
additional assumption:

    Types in the type graph (the network of rdfs:subClassOf statements)
    are disjoint (share no instances) unless the type graph says
    they're not.

For example, suppose that both **A** and **B** are subclasses of
**Top**, and that there are no other subclass relationships. Then
**consistent-type** assumes that there are (supposed to be) no
resources which have both **A** and **B** as types. If it finds a
resource **X** which *does* have both types, it generates a message
like this:

    no consistent type for: X
    has associated type: A
    has associated type: B
    has associated type: Top

It's up to you to disentangle the types and work out what went
wrong.
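
The rule "has a type which is a subtype of all its types" can be sketched in Python as follows. This is a simplified illustration, not Eyeball's implementation: the subclass graph is a plain dict from class to direct superclasses, the function names are hypothetical, and none of the domain and range inference that Eyeball does is attempted.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from cls via rdfs:subClassOf, including cls itself."""
    seen, todo = {cls}, [cls]
    while todo:
        for parent in sub_class_of.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen

def has_consistent_type(types, sub_class_of):
    """True if some type in `types` is a subtype of every type in `types`."""
    return any(set(types) <= superclasses(t, sub_class_of) for t in types)

# The example from the text: A and B are both subclasses of Top.
sub_class_of = {"A": ["Top"], "B": ["Top"]}

print(has_consistent_type({"A", "Top"}, sub_class_of))       # A is below both
print(has_consistent_type({"A", "B", "Top"}, sub_class_of))  # nothing is below A and B
```

The check is only meant for resources with at least one type, as in the description above.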

*Note*: this test requires that Eyeball do a significant amount of
inference, to complete the type hierarchy and check the domains and
ranges of properties. It's quite slow, which is one reason it isn't
switched on by default.

### ... check the right number of values for a property?

You want to make sure that your data has the right properties for
things of a certain type: say, that a book has at least one author
(or editor), an album has at least one track, nobody in your
organisation has more than ten managers, a Jena contrib has at
least a `dc:creator`, a `dc:name`, and a `dc:description`. You
write some OWL *cardinality constraints*:

    my:Type rdfs:subClassOf
        [owl:onProperty my:track; owl:minCardinality 1] .

Then you discover that, for wildly technical reasons, the OWL
validation code in Jena doesn't think it's an error for some album
to have no tracks (maybe there's a namespace error).

You can enable Eyeball's *cardinality inspector* by adding

    -inspectors cardinality

to the command line. You'll now get a report item for every
resource that has `rdf:type` your restricted type (`my:Type` above)
but doesn't have the right (at least one) value for the property.
It will look something like:

    cardinality failure for: my:Instance
        on type: my:Type
        on property: my:track
        cardinality range: [min: 1]
        number of values: 0
        values: {}

If there are some values for the property - say you've supplied an
`owl:maxCardinality` restriction and then gone over the top - they
get listed inside the `values` curly braces.
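
The counting behind such a report can be sketched in Python. This is only an illustration of the idea, not Eyeball's code: triples are plain tuples, the restriction is passed in as explicit arguments instead of being read from OWL, and `cardinality_failures` is a hypothetical name.

```python
def cardinality_failures(triples, restricted_type, prop, min_card=0, max_card=None):
    """Return (instance, values) pairs that violate the given bounds (sketch)."""
    failures = []
    instances = sorted(
        s for s, p, o in triples if p == "rdf:type" and o == restricted_type
    )
    for inst in instances:
        values = [o for s, p, o in triples if s == inst and p == prop]
        too_few = len(values) < min_card
        too_many = max_card is not None and len(values) > max_card
        if too_few or too_many:
            failures.append((inst, values))
    return failures

triples = [
    ("my:Album1", "rdf:type", "my:Type"),
    ("my:Album1", "my:track", "my:t1"),
    ("my:Album2", "rdf:type", "my:Type"),   # no tracks at all
]
for inst, values in cardinality_failures(triples, "my:Type", "my:track", min_card=1):
    print("cardinality failure for:", inst)
    print("    number of values:", len(values))
```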