Gloze Documentation

gloze : to make explanatory notes or glosses on a text

Gloze is a tool for mapping between XML and RDF, where the RDF describes the content of an XML document. It may be used to:

generate an RDF description of an XML document (using the XML schema)
generate an XML document from its RDF description (using the XML schema)
generate an OWL ontology from the XML schema.

Gloze provides an alternative to hand-crafted XSLTs for translating XML into RDF/XML. Furthermore, the Gloze mapping is reversible, so an XML document may be mapped into RDF and then back into XML with minimal loss of information. Key to this mapping is the XML schema, which provides additional type and compositional information not available in the source XML.

The concept behind Gloze is to make this mapping as unsurprising as possible, while avoiding the introduction of new vocabulary. Put simply, XML schema types, both simple and complex, map to OWL classes, while elements and their attributes map to properties. Any given instance of an XML attribute or element maps to an RDF statement. The content of the element or attribute is the object of the statement, which may be a literal or an RDF resource with its own properties.
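The core idea can be illustrated with a minimal sketch that turns every attribute or element occurrence into one (subject, property, object) triple. Simple content becomes a literal object. Complex content becomes a fresh blank node carrying its own properties. This is not Gloze's implementation, and it makes several simplifying assumptions:

- The helper `xml_to_triples` is hypothetical.
- The sketch starts from the document element's node, named by the base URI, and does not emit a statement for the document element itself.
- Unqualified names are placed in a caller-supplied default namespace.
- Schema datatypes and xs:ID are ignored.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_triples(xml_text, base, default_ns):
    """Hypothetical sketch: one (s, p, o) triple per attribute or element occurrence."""
    root = ET.fromstring(xml_text)
    triples = []
    bnode_ids = count(1)

    def prop_uri(tag):
        # ElementTree writes qualified names as "{namespace}local".
        if tag.startswith("{"):
            ns, local = tag[1:].split("}", 1)
            return ns + local
        # Unqualified names need a supplied default namespace to form an absolute URI.
        return default_ns + tag

    def visit(elem, subject):
        for name, value in elem.attrib.items():
            triples.append((subject, prop_uri(name), value))
        for child in elem:
            if len(child) == 0 and not child.attrib:
                # Simple content: the object of the statement is a literal.
                triples.append((subject, prop_uri(child.tag), child.text or ""))
            else:
                # Complex content (including simple content with attributes, whose
                # text this sketch drops): the object is a fresh blank node.
                node = f"_:b{next(bnode_ids)}"
                triples.append((subject, prop_uri(child.tag), node))
                visit(child, node)

    # Assumption: the document element's node is identified by the document base URI.
    visit(root, base)
    return triples
```

For example, `<po xmlns="http://example.org/po#" id="1"><item><sku>A1</sku></item></po>` yields a literal for `id` and `sku`, and a blank node for `item`.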

The translation of schema into OWL is simplified by the fact that OWL uses most of XML Schema's predefined data-types. The RDF Semantics recommendation identifies the subset of these data-types that is suitable for use in RDF typed literals. Exceptions include anySimpleType, duration, ENTITY, ENTITIES, ID, IDREF, IDREFS, NMTOKENS, NOTATION and QName. The remaining data-types are translated as-is. The tree structure of the XML translates almost directly into RDF. Intermediate nodes are bnodes unless they represent the document element (whose URI is the document base) or carry an xs:ID identifier.
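The two rules above can be sketched as small helpers. The first passes a built-in type through as an XSD datatype URI, unless it is one of the excluded types, which need the special treatment described elsewhere. The second chooses a node identifier. The function names are hypothetical. Naming an xs:ID node as a fragment of the document base is also an assumption made for illustration, not something confirmed by this page.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Built-in types listed above as unsuitable for RDF typed literals.
EXCLUDED = {"anySimpleType", "duration", "ENTITY", "ENTITIES", "ID", "IDREF",
            "IDREFS", "NMTOKENS", "NOTATION", "QName"}


def literal_datatype(xsd_local_name):
    """Return the datatype URI to use as-is, or None if the type needs special handling."""
    if xsd_local_name in EXCLUDED:
        return None
    return XSD + xsd_local_name


def node_id(is_document_element, xs_id, base, fresh_bnode):
    """Hypothetical sketch of how an intermediate node is identified."""
    if is_document_element:
        return base                  # the document element takes the document base
    if xs_id is not None:
        return base + "#" + xs_id    # assumption: ID used as a fragment of the base
    return fresh_bnode()             # everything else is a bnode
```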

Qualified attributes and elements are declared in the target namespace of the schema; unqualified ones are placed in a default namespace. All types are defined in the target namespace of the schema or, in the case of a no-namespace schema, in a default namespace that we must supply. An element, attribute or type is unqualified if no target namespace is defined, or if it is defined locally and its form is unqualified. Unqualified attributes and elements are not defined in the target namespace, but the translation requires an absolute URI for every property, so this additional default namespace must be supplied. In XML Schema, attributes and elements have their own symbol spaces, distinct from each other and from the types. If names clash across these symbol spaces, it is advisable to introduce extra symbolic prefixes, appended to the target namespace, to keep them distinct. We must also be careful to distinguish the name of a type from the identity of the schema component that defines it. If we need to refer to the schema component, to use its additional definitional machinery, we may use its id (relative to the schema base) or a schema component identifier.
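The naming rules above can be summarised in one function. This is a sketch under stated assumptions:

- `component_uri` and its parameters are hypothetical and not part of any Gloze API.
- `kind` is one of `"element"`, `"attribute"` or `"type"`.
- The optional symbol-space prefixes are caller-chosen strings, for example `"type/"`.
- For simplicity, the sketch appends the prefix to whichever namespace is selected.

```python
def component_uri(name, kind, target_ns, default_ns, is_global=True,
                  form="qualified", symbol_prefixes=None):
    """Hypothetical sketch of the namespace rules for elements, attributes and types."""
    if kind == "type":
        # Types live in the target namespace whenever the schema has one.
        qualified = target_ns is not None
    else:
        # Global declarations are qualified; local ones follow their form.
        qualified = target_ns is not None and (is_global or form == "qualified")
    ns = target_ns if qualified else default_ns
    if ns is None:
        raise ValueError("an absolute URI is required; supply a default namespace")
    # Optional extra prefix keeps the element, attribute and type symbol spaces apart.
    prefix = (symbol_prefixes or {}).get(kind, "")
    return ns + prefix + name
```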

The names of locally defined attributes and elements can be recycled in different type definitions (different particles), each time with a different type. From the perspective of XML schema these are different properties, but rather than construct an elaborate and obscure naming scheme to keep them apart, we take the view that they are different uses of the same property. This means that a property may refer to a data-type in one type, but to an object in another. There is therefore no guarantee that the translation will be in OWL DL, because a given property could be both an object property and a data-type property. If OWL DL is a desirable outcome, it is up to the schema author to come up with a clean design (union types raise similar design challenges). Nor can we assume that the schema as a whole prescribes all uses of a name, because it can always be included in another, unidentified schema that recycles the name again. We assume that globally defined attributes and elements are not recycled in this way (name clashes notwithstanding), and that all their occurrences conform to the global declaration. This enables us to derive property ranges directly from global attribute and element declarations.
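A schema author aiming for OWL DL might check for mixed usage along these lines. The `classify_properties` function below is a hypothetical illustration, not a Gloze feature. It only sees the uses passed to it, so, as noted above, a clean result does not rule out another schema recycling the same name.

```python
from collections import defaultdict


def classify_properties(uses):
    """uses: iterable of (property_uri, range_kind), range_kind 'literal' or 'resource'."""
    kinds = defaultdict(set)
    for prop, kind in uses:
        kinds[prop].add(kind)
    # A property seen with both kinds of value cannot be typed consistently in OWL DL.
    return {prop: ("owl:DatatypeProperty" if k == {"literal"}
                   else "owl:ObjectProperty" if k == {"resource"}
                   else "mixed (not OWL DL)")
            for prop, k in kinds.items()}
```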

Where the lexical ordering of the children is significant, it is captured in an RDF sequence of reified statements. This sequence is simply overlaid on the existing tree structure, so you can take it or leave it: queries to the RDF that aren't interested in ordering can simply ignore it. For the purposes of OWL modelling, sequencing is regarded as a data-structuring issue rather than one of ontological significance.
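The overlay idea can be sketched in plain RDF terms. An rdf:Seq whose members are reified rdf:Statement nodes records the order of statements that already exist in the tree. The helper `ordering_triples` is hypothetical, and this page does not specify how Gloze attaches the sequence to its parent node, so the sketch simply takes the sequence node as a parameter.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def ordering_triples(seq_node, statements, fresh_node):
    """Overlay document order on existing statements as an rdf:Seq of reified statements."""
    out = [(seq_node, RDF + "type", RDF + "Seq")]
    for i, (s, p, o) in enumerate(statements, start=1):
        st = fresh_node()
        out += [(seq_node, f"{RDF}_{i}", st),           # rdf:_1, rdf:_2, ... give the order
                (st, RDF + "type", RDF + "Statement"),  # reification of the original triple
                (st, RDF + "subject", s),
                (st, RDF + "predicate", p),
                (st, RDF + "object", o)]
    # The original statements are left untouched; queries can ignore these extra triples.
    return out
```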

Element groups and attribute groups are treated as syntactic sugar rather than fully fledged classes, so they are flattened out of the OWL mapping. So too are the sequence, choice and all compositors, which are used only to calculate the cardinalities of their respective content; they add no structure of their own.
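One way to flatten compositors into per-element cardinalities is sketched below. It rests on the following assumptions:

- The `flatten` function and the tuple encoding of particles are hypothetical.
- Occurrence bounds multiply down the particle tree.
- A branch of a choice with alternatives may be skipped, so its minimum drops to 0.
- `math.inf` stands for `maxOccurs="unbounded"`.
- Summing repeated names is exact within a sequence but only an upper bound when the same name appears in alternative branches of a choice.

```python
import math
from collections import defaultdict


def mul(a, b):
    # 0 times unbounded must stay 0 (maxOccurs="0" suppresses the particle).
    return 0 if a == 0 or b == 0 else a * b


def flatten(particle, lo=1, hi=1, acc=None):
    """Return {element_name: (min, max)} with sequence/choice/all flattened away.

    particle: ("element", name, min, max) or (compositor, min, max, children),
    where compositor is "sequence", "choice" or "all".
    """
    if acc is None:
        acc = defaultdict(lambda: [0, 0])
    kind = particle[0]
    if kind == "element":
        _, name, pmin, pmax = particle
        acc[name][0] += mul(lo, pmin)
        acc[name][1] += mul(hi, pmax)
    else:
        _, pmin, pmax, children = particle
        for child in children:
            # In a choice with alternatives, any one branch may be skipped entirely.
            child_lo = 0 if kind == "choice" and len(children) > 1 else mul(lo, pmin)
            flatten(child, child_lo, mul(hi, pmax), acc)
    return {n: tuple(v) for n, v in acc.items()}
```

For a `sequence` of `name`, a `choice` of `phone` or `email` (maxOccurs unbounded), and an optional `note`, the sketch gives `name` (1, 1), `phone` (0, 1), `email` (0, inf) and `note` (0, 1).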

There are, of course, a few wrinkles caused by mixed and nillable content, as well as the question of how the non-recommended data-types map into RDF. These issues are covered in the relevant documentation.

Generated on Mon Jun 18 16:02:37 2007 for Gloze by Doxygen 1.5.0