Eyeball currently can check for:
In the Eyeball distribution directory, run the Eyeball tests:
ant test
If these tests fail, something is wrong. Sometimes it's no more than a classpath problem, which you can fix. If not, use the jena-dev mailing list to request assistance. Note that any support is provided on a voluntary basis, as and when the effort is available.
If the tests have passed, you can copy lib/*.jar to whatever place you find convenient. You can then use it from the command line or from within Jena code. You will also need to copy the directories mirror and etc to somewhere convenient where the Jena FileManager class can see them.
jena.jar
and may not work
with your usual installation.)
Run the command:
java [java options eg classpath and proxy] jena.eyeball
    -assume Reference+
    -check dataFileOrURL* OR -modelSpec specFileOrURL
    [-config fileOrURL*]
    [-render Name]
    [-inspectors shortName*]
    [-exclude shortName*]
The -whatever sections can come in any order and may be repeated, in which case the new arguments are appended to the existing ones.
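The rule that repeated option sections append their arguments can be sketched in a few lines of plain Java. This is only an illustration of the documented behaviour, not Eyeball's own argument parser; the class name OptionGrouper and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: groups command-line arguments by the option that precedes them. */
final class OptionGrouper {
    static Map<String, List<String>> group(String[] args) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String current = null;
        for (String arg : args) {
            if (arg.startsWith("-")) {
                current = arg;
                // A repeated option reuses its existing list, so later arguments are appended.
                sections.computeIfAbsent(current, k -> new ArrayList<>());
            } else if (current != null) {
                sections.get(current).add(arg);
            }
            // Arguments before the first option are ignored in this sketch.
        }
        return sections;
    }

    public static void main(String[] args) {
        String[] example = {"-assume", "dc", "-check", "a.rdf", "-assume", "mySchema.rdf", "-check", "b.rdf"};
        // prints {-assume=[dc, mySchema.rdf], -check=[a.rdf, b.rdf]}
        System.out.println(group(example));
    }
}
```

Under this reading, writing -assume twice is equivalent to writing it once with both references.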
The -config fileOrURL options specify the configuration files to load. A single configuration model is constructed as the union of the contents of those files, plus any eye:loadConfig files. If this option is omitted, the config file etc/eyeball-config.n3 is loaded. See loadConfigFiles for details.
The -assume Reference identifies any assumed schemas used to identify the predicates and classes of the data model. The reference may be a file name, a URL, or the short name of a collection of schemas supplied in the configuration file. Several -assume options may be given. Non-option arguments following -assume will be taken as additional schemas.
Eyeball assumes the RDF and RDFS schemas, and the built-in XSD datatype classes, by default. The short name owl can be used to refer to the OWL schema, dc to the Dublin Core schema, dcterms to the Dublin Core terms schema, and dc-all to both.
The dataFileOrURLs name the files or URL references containing the data to be eyeballed. If several names are given, a combined model containing all their content is checked. Alternatively, a single -modelSpec specFileOrURL can be provided, in which case the loaded model is described by the specified Jena Model Spec, for which see the Jena documentation: this feature is under trial.
If any of the data or schemas are identified by an http: URL, and you are behind a firewall, you will need to specify the proxy to Java using system properties; one way to do this is by using the Java command line options:
-DproxySet=true -DproxyHost=theProxyHostName -DproxyPort=theProxyPortNumber
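When Eyeball is driven from Java code rather than from the command line, the same system properties can be set programmatically before any network access takes place. The sketch below simply mirrors the three property names shown above; the class name ProxySettings, the host, and the port number are placeholders, and whether a given JVM honours these particular property names is an assumption to verify against your Java version.

```java
/** Illustrative only: sets the proxy system properties named in the command-line example. */
final class ProxySettings {
    static void apply(String host, int port) {
        System.setProperty("proxySet", "true");
        System.setProperty("proxyHost", host);
        System.setProperty("proxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the proxy for your own network.
        apply("theProxyHostName", 8080);
        System.out.println(System.getProperty("proxyHost") + ":" + System.getProperty("proxyPort"));
    }
}
```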
The -inspectors shortNames are strings which are the eye:shortName values of inspector clusters in the Eyeball config file; see the config file description for details. If omitted, it is as if -inspectors defaultInspectors had been written. The -exclude option allows the shortnames of inspectors to be excluded from the checks. (For example, the type inspector currently slows things down quite a lot and might well be excluded from an initial eyeballing.)
The eyeball reports are written to the standard output; by default, the reports appear as text (RDF rendered by omitting the subjects - which are all blank nodes - and lightly prettifying the predicate and object). To change the rendering style, supply the -render option with the name of the renderer as its value. Eyeball comes with N3, XML, and text renderers; the Eyeball config file associates renderer names with their classes.
To test models from databases, or with attached reasoners, see the Jena Model Spec documentation; be warned that this code is changing at this time.
java jena.eyeball -assume -check myDataFile.rdf
java jena.eyeball -assume dc -check http://example.com/nosuch.n3
java jena.eyeball -assume mySchema.rdf -check myData.rdf -render xml
java jena.eyeball -check myData.rdf -inspectors defaultInspectors
To create an Eyeball on a particular schema, do:
Eyeball eyeball = new Eyeball( modelWithSchemaInIt );
If, instead of a single schema, you have several schemas bundled together in a List L, you can supply that list as a SchemaList:
Eyeball eyeball = new Eyeball( new SchemaList( L ) );
It is also possible to build a SchemaList one model at a time, since SchemaList supports an add(Model) method.
All of these forms use the default list of inspectors. To supply a non-default list, use:
Eyeball eyeball = new Eyeball( inspectors, aSchemaList );
inspectors must be a list of full classnames of Inspector classes. To make constructing this list easier, you can use the method
List Eyeball::getInspectors( Model config, List inspect, List except )
inspect is a list of shortnames of inspectors to include, and except is a list of shortnames of inspectors to exclude, exactly as for the command-line options (for which this method is the implementation). config is the config model to use.
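The include-then-exclude behaviour of the inspect and except lists can be pictured with the following plain-Java sketch. It is not the Eyeball implementation: the config model is replaced by a hypothetical map from shortname to class names, the class names in the example are invented, and the fallback to defaultInspectors follows the command-line description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: selects inspector class names by included and excluded shortnames. */
final class InspectorSelection {
    static List<String> select(Map<String, List<String>> config, List<String> inspect, List<String> except) {
        // With no explicit list, behave as if "defaultInspectors" had been requested.
        List<String> wanted = inspect.isEmpty() ? Collections.singletonList("defaultInspectors") : inspect;
        Set<String> chosen = new LinkedHashSet<>();
        for (String shortName : wanted) {
            chosen.addAll(config.getOrDefault(shortName, Collections.emptyList()));
        }
        for (String shortName : except) {
            chosen.removeAll(config.getOrDefault(shortName, Collections.emptyList()));
        }
        return new ArrayList<>(chosen);
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = new HashMap<>();
        // Hypothetical class names, for illustration only.
        config.put("defaultInspectors", Arrays.asList("example.UriCheck", "example.TypeCheck"));
        config.put("consistent-type", Arrays.asList("example.TypeCheck"));
        // prints [example.UriCheck]
        System.out.println(select(config, Collections.emptyList(), Arrays.asList("consistent-type")));
    }
}
```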
eyeball.inspect( modelToBeInspected )
The result is an instance of EyeballReport. The model() method delivers an RDF model which describes the problems found by the inspection. The inspections supplied in the distribution use the EYE vocabulary, and are used in the standard reports:
unknown predicate | eye:unknownPredicate URI | the URI of the unknown predicate |
bad URI | eye:badURI String | the spelling of the bad URI |
illegal language code | eye:badLanguage String | the bad language code |
| eye:onLiteral String | a plain literal with the same lexical form |
bad datatype URI | eye:forReason URI | the reason URI |
| eye:onLiteral String | a plain literal with the same lexical form |
bad namespace URI | eye:onPrefix String | the prefix name with the bad namespace |
| eye:forReason URI | the reason URI |
| eye:badNamespaceURI String | the spelling of the bad URI |
Jena prefix found | eye:jenaPrefixFound String | the name of the Jena prefix |
| eye:forNamespace | the namespace the prefix is bound to |
implausible vocabulary item | eye:onResource URI | the URI of the implausible resource |
| eye:notFromSchema URI | the URI of the schema |
an undeclared class | eye:unknownClass | the resource that was presumed to be a Class |
an untyped Resource | eye:hasNoType Resource | the resource that has no rdf:type property |
inconsistent types for resource | eye:noConsistentTypeFor URI | the URI of the inconsistent resource |
| eye:hasAttachedType URI | one of the given types that have no intersection |
"wrong" number of property values for some subject | eye:cardinalityFailure | the subject for which the failure was detected |
| eye:onProperty | the property P that has the wrong number of values |
| eye:onType | the cardinality-constrained type |
| eye:cardinality | a blank node [eye:min min; eye:max max] |
| eye:numValues | the number of values of P found |
| eye:values | a blank node of rdf:type eye:Set with an rdfs:member value for each of the values of P. |
Every report item in the model is a blank node with rdf:type eye:Item.
The labels for the Eyeball predicates and reason messages are defined in the Eyeball schema file etc/eyeball-schema.n3 (and are used by the text renderer):
eye:uriContainsSpaces | the URI contains unencoded spaces, probably as a result of sloppy use of file: URLs. |
eye:uriFileInappropriate | a URI used as a namespace is a file: URI, which is inappropriate as a global identifier. |
eye:uriHasNoScheme | a URI has no scheme field, probably a misused relative URI. |
eye:schemeShouldBeLowercase | the scheme part of a URI is not lower-case; while technically correct, this is not usual practice. |
eye:uriFailsPattern | a URI fails the pattern appropriate to its scheme (as defined in the configuration for this eyeball). |
eye:unrecognisedScheme | the URI scheme is unknown, perhaps a misplaced QName. |
eye:uriNoHttpAuthority | an http: URI has no authority (domain name/port) component. |
eye:uriSyntaxFailure | the URI can't be parsed using the general URI syntax, even with any spaces removed. |
eye:namespaceEndsWithNameCharacter | a namespace URI ends in a character that can appear in a name, leading to possible ambiguities. |
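To make a few of these reasons concrete, the sketch below reproduces some of them with the standard java.net.URI class. It covers only a subset of the table, is not the code used by Eyeball's URI inspector, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: derives a few of the eye: reason names for a URI spelling. */
final class UriReasons {
    static List<String> reasonsFor(String spelling) {
        List<String> reasons = new ArrayList<>();
        String candidate = spelling;
        if (candidate.contains(" ")) {
            reasons.add("eye:uriContainsSpaces");
            // Continue with the spaces removed, as the uriSyntaxFailure description suggests.
            candidate = candidate.replace(" ", "");
        }
        URI uri;
        try {
            uri = new URI(candidate);
        } catch (URISyntaxException e) {
            reasons.add("eye:uriSyntaxFailure");
            return reasons;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            reasons.add("eye:uriHasNoScheme");
            return reasons;
        }
        if (!scheme.equals(scheme.toLowerCase(Locale.ROOT))) {
            reasons.add("eye:schemeShouldBeLowercase");
        }
        if (scheme.equalsIgnoreCase("http") && uri.getAuthority() == null) {
            reasons.add("eye:uriNoHttpAuthority");
        }
        return reasons;
    }

    public static void main(String[] args) {
        System.out.println(reasonsFor("http://example.com/a b"));
        System.out.println(reasonsFor("relative/path"));
        System.out.println(reasonsFor("http:/nowhere"));
    }
}
```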
Configuration files loaded from the command line may contain statements with the predicate eye:loadConfig; their objects should be strings (not URIs) naming other configuration files to be loaded. This allows a user to extend the default configuration without having to modify the default file.
Eyeball is also configured by the location-mapping file etc/location-mapping.n3. The Eyeball jar contains copies of both the default config and the location mapper; these are used by default. You can provide your own etc/eyeball-config.n3 file earlier on your classpath or in your current directory; this config replaces the default. You may provide additional location-mapping files earlier on your classpath or in your current directory.
[] eye:shortName shortNameLiteral ; eye:schema fullSchemaURL ... .
A shortname can name several schemas. The Eyeball delivery has the short names rdf, rdfs, owl, and dc for the corresponding schemas (and mirror versions of those schemas so that they don't need to be downloaded each time Eyeball is run).
eye:shortNames (supplied on the command line). Each such property value must be a plain string literal whose value is the full name of the Inspector class to load and run; see the Javadoc of Inspector for details.
An inspector resource may refer to other inspector resources to include their inspectors, using either of the two properties eye:include or eye:includeByName. The value of an include property should be another inspector resource; the value of an includeByName property should be the shortName of an inspector resource.
[Two inspector resources may refer to each other, in which case they are equivalent.]
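One way to see why mutually referring resources end up equivalent is to follow the includes with a visited set, so that a cycle is expanded only once. The sketch below models only includeByName (a direct eye:include would be followed the same way); InspectorResource is a hypothetical stand-in for a resource in the config model, and none of this is taken from the Eyeball source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: expands inspector includes, tolerating cycles. */
final class IncludeResolver {
    /** Hypothetical stand-in for an inspector resource in the config model. */
    static final class InspectorResource {
        final List<String> classNames = new ArrayList<>();
        final List<String> includeByName = new ArrayList<>();
    }

    static Set<String> resolve(Map<String, InspectorResource> byShortName, String start) {
        Set<String> classes = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            if (!visited.add(name)) continue;   // already expanded: this breaks cycles
            InspectorResource resource = byShortName.get(name);
            if (resource == null) continue;     // unknown shortname: nothing to add
            classes.addAll(resource.classNames);
            for (String included : resource.includeByName) pending.push(included);
        }
        return classes;
    }

    public static void main(String[] args) {
        Map<String, InspectorResource> config = new HashMap<>();
        InspectorResource a = new InspectorResource();
        InspectorResource b = new InspectorResource();
        a.classNames.add("example.A");          // hypothetical class names
        b.classNames.add("example.B");
        a.includeByName.add("b");
        b.includeByName.add("a");               // a and b refer to each other
        config.put("a", a);
        config.put("b", b);
        // prints true: both names yield the same set of classes
        System.out.println(resolve(config, "a").equals(resolve(config, "b")));
    }
}
```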
The inspectors provided in the Eyeball distribution are:
class leafname | eye:shortName | description |
LiteralInspector | literal | Checks literals for syntactically correct language codes. |
PredicateInspector | predicate | Checks that every predicate used is "declared" in some provided schema. |
PrefixInspector | prefix | Checks that prefix namespaces are well-formed and that well-known prefixes have their well-known URIs. |
JenaPrefixInspector | jena-prefix | Checks for namespace prefixes generated by Jena, ie, those of the form j.N+. |
URIInspector | URI | Checks that every URI in the model is well-formed. |
VocabularyInspector | vocabulary | Checks that every URI in the model whose namespace is mentioned in some schema is one of the URIs declared in the schema. [This one needs a bit of work ...] |
AllTypedInspector | all-typed | Checks that all URI and bnode resources in the model have an rdf:type property in the model or the schema(s). If there is a statement in the configuration with property eye:checkLiteralTypes and value eye:true, also checks that every literal has a type or a language. Not in the default set of inspectors. |
ConsistentTypeInspector | consistent-type | Checks that every subject in the model can be given a type which is the intersection of the subclasses of all its "attached" types. See below. |
PresumedClassInspector | presumed-class | Checks that every resource in the model that appears as the object of an rdf:type, rdfs:domain, or rdfs:range statement, or as the subject or object of an rdfs:subClassOf statement, has been declared as a Class in the schemas or the model under test. Note: OWLification coming soon. |
CardinalityInspector | cardinality | Looks for classes C that are subclasses of cardinality restrictions on some property P with cardinality range min to max. For any X of rdf:type C, it checks that the number of values of P is in the range min..max and generates a report if it isn't. (Doesn't account for owl:sameAs in the 1.2 release.) |
For example, if the model contains three types Top, Left, and Right, with Left and Right both being subtypes of Top and with no other subclass statements, then some S with rdf:types Left and Right would generate this warning.
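The Top/Left/Right example can be worked through with a small plain-Java sketch that intersects the subclass sets of a subject's attached types. This is a simplification under stated assumptions: it uses only direct subclass statements supplied in a map, performs no RDFS or union/intersection reasoning, and is not how Eyeball implements the check. All names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a subject is consistent if its attached types share a subclass. */
final class TypeConsistency {
    /** Returns the type itself plus everything declared, directly or indirectly, as its subclass. */
    static Set<String> subclassesOf(String type, Map<String, Set<String>> superclassesOf) {
        Set<String> result = new HashSet<>();
        result.add(type);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> entry : superclassesOf.entrySet()) {
                boolean isSubclass = !Collections.disjoint(entry.getValue(), result);
                if (isSubclass && result.add(entry.getKey())) changed = true;
            }
        }
        return result;
    }

    static boolean hasConsistentType(Collection<String> attachedTypes, Map<String, Set<String>> superclassesOf) {
        Set<String> common = null;
        for (String type : attachedTypes) {
            Set<String> subclasses = subclassesOf(type, superclassesOf);
            if (common == null) common = subclasses;
            else common.retainAll(subclasses);
        }
        // A subject with no attached types is treated as unproblematic in this sketch.
        return common == null || !common.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> superclassesOf = new HashMap<>();
        superclassesOf.put("Left", new HashSet<>(Arrays.asList("Top")));
        superclassesOf.put("Right", new HashSet<>(Arrays.asList("Top")));
        System.out.println(hasConsistentType(Arrays.asList("Left", "Right"), superclassesOf)); // prints false
        System.out.println(hasConsistentType(Arrays.asList("Top", "Left"), superclassesOf));   // prints true
    }
}
```

Left and Right have no common subclass, so the first call reports an inconsistency; Top and Left share Left, so the second does not.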
The ConsistentTypeInspector must do at least some type inference. This release of Eyeball compromises by doing RDFS inference augmented by (very) limited union and intersection reasoning, as described in the Jena rules in etc/owl-like.rules. Even so, doing type inference over a large model is costly; you may wish to suppress it with -exclude until any other warnings are dealt with.
While, technically, a resource with no attached types at all is automatically inconsistent, Eyeball quietly ignores such resources, since they turn up quite often in simple RDF models.
Implementation note: The ConsistentTypeInspector's inferencing is done entirely by forward rules, triggered on the first subject to inspect. Once the rules have run to completion, further subjects are cheap. Using backward rules, the initial closure of the model was somewhat cheaper, but each new subject in a biggish model took a long time - a second or so - to process.
eye:schemePattern; their objects must be strings which describe a legal (Java regex) pattern for a URI. The scheme parts of those patterns form the set of known URI schemes: a URI that has that scheme, but does not match any of the patterns for that scheme, generates an eye:uriFailsPattern report.
Eyeball forms a single |-separated Java regular expression from all the patterns sharing the same scheme part.
The currently shipped config file restricts the type-id part of a URN to containing letters, digits, and hyphens, and to start with a letter.
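The grouping of patterns by scheme can be sketched with java.util.regex as follows. This is an illustration of the described technique, not Eyeball's code: it assumes each pattern starts with a literal scheme followed by a colon, and the two patterns in the example (including the URN one) are written for this sketch rather than copied from the shipped config file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative only: one |-separated regular expression per URI scheme. */
final class SchemePatterns {
    private final Map<String, Pattern> byScheme = new HashMap<>();

    SchemePatterns(List<String> patterns) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String pattern : patterns) {
            int colon = pattern.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("pattern has no scheme part: " + pattern);
            // Assumption: the text before the first ':' is the literal scheme name.
            grouped.computeIfAbsent(pattern.substring(0, colon), k -> new ArrayList<>()).add(pattern);
        }
        grouped.forEach((scheme, group) -> byScheme.put(scheme, Pattern.compile(String.join("|", group))));
    }

    /** True when the URI's scheme is known but none of that scheme's patterns match. */
    boolean failsPattern(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) return false;
        Pattern combined = byScheme.get(uri.substring(0, colon));
        return combined != null && !combined.matcher(uri).matches();
    }

    public static void main(String[] args) {
        SchemePatterns patterns = new SchemePatterns(Arrays.asList(
            "urn:[a-zA-Z][-a-zA-Z0-9]*:.+",   // hypothetical URN pattern
            "http://.+"));                     // hypothetical http pattern
        System.out.println(patterns.failsPattern("urn:isbn:0451450523")); // prints false
        System.out.println(patterns.failsPattern("urn:9bad:x"));          // prints true
    }
}
```

A URI whose scheme has no patterns at all is not reported by failsPattern here; in the table of reasons that case corresponds to a different report, eye:unrecognisedScheme.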
[] eye:renderer FullClassName ; eye:shortName ShortClassHandle
FullClassName is a string literal giving the full class name of the rendering class. That class must implement the Renderer interface and have a constructor that takes a single Model (the configuring model) as an argument. The ShortClassHandle is a string literal giving the short name used to refer to the class. The default short name used is default. There should be no more than one eye:shortName statement with the same ShortClassHandle in the configuration file, but the same class can have many different short names.