Xerces2 Architecture

Overview

The Xerces Native Interface (XNI) is a framework for communicating a "streaming" document information set and constructing generic parser configurations. XNI is part of the Xerces2 development effort, but the Xerces2 parser is just a standards-compliant reference implementation of XNI. Other parsers can be written that conform to XNI without conforming to any particular standard.

Document Information

An XML parser can be viewed as a pipeline in which information flows from a scanner to a validator to the parser. In this pipeline, one component (the scanner) acts as a source of events; the final component (the parser) is the final target of the events; and any components between the source and target are known as filters. Filter components are both targets for the information sent by the previous component in the pipeline and sources for the information that the filter chooses to propagate to the next component in the pipeline. The following diagram illustrates the layout of the pipeline in this kind of parser.

    XML Document --> Scanner --> Validator --> Parser --> Application API
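
The source, filter, and target roles described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: EventHandler, ToyScanner, TrimFilter, and RecordingParser are hypothetical names invented for this example, and the handler interface is far smaller than the real XNI interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified handler; NOT the real XNI XMLDocumentHandler.
interface EventHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Source: produces events and sends them to the registered handler.
class ToyScanner {
    private EventHandler handler;

    void setHandler(EventHandler handler) { this.handler = handler; }

    // Emits a fixed event sequence instead of scanning real XML.
    void scan() {
        handler.startElement("greeting");
        handler.characters("  hello  ");
        handler.endElement("greeting");
    }
}

// Filter: a target for the previous stage and a source for the next one.
class TrimFilter implements EventHandler {
    private EventHandler next;

    void setHandler(EventHandler next) { this.next = next; }

    public void startElement(String name) { next.startElement(name); }
    // The filter decides what to propagate; here it trims character data.
    public void characters(String text) { next.characters(text.trim()); }
    public void endElement(String name) { next.endElement(name); }
}

// Final target: records the events it receives.
class RecordingParser implements EventHandler {
    final List<String> log = new ArrayList<>();

    public void startElement(String name) { log.add("start:" + name); }
    public void characters(String text) { log.add("chars:" + text); }
    public void endElement(String name) { log.add("end:" + name); }
}

class PipelineDemo {
    // Wires scanner --> filter --> parser and returns what the parser saw.
    static List<String> run() {
        ToyScanner scanner = new ToyScanner();
        TrimFilter filter = new TrimFilter();
        RecordingParser parser = new RecordingParser();
        scanner.setHandler(filter);
        filter.setHandler(parser);
        scanner.scan();
        return parser.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because every stage talks only to the handler interface, a filter can be added or dropped by changing the two setHandler calls and nothing else.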

Parsing of DTDs can also be viewed as a pipeline. Since the DTD is referenced in the document instance by XML syntax (the DOCTYPE declaration), the DTD pipeline is triggered by the document scanner. This contrasts with XML Schema because there is no XML syntax that associates a Schema grammar with a document; a special attribute in the document instance is used as a hint to the location of the grammar. The following diagram illustrates the layout of the DTD pipeline.

    DTD Document --> DTD Scanner --> Validator --> Parser --> Application API
                                         |
                                         +--> DTD Grammar

Note that the DTD scanner communicates directly with the validator. The validator receives the callbacks from the DTD scanner in order to create and populate the DTD grammar object. In this way, the validator acts as a "tee", propagating the DTD events to both the next stage in the pipeline and the DTD grammar object. This allows the validation stage in the pipeline to be completely removed from the parser configuration, if needed.
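
As a rough illustration of the "tee" behavior, the sketch below forwards each DTD event to a grammar object and to the next stage. ToyDtdHandler, ToyDtdGrammar, and TeeValidator are hypothetical names for this example; the real XMLDTDHandler interface has many more callbacks and different signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-callback stand-in for a DTD handler; NOT the real XMLDTDHandler.
interface ToyDtdHandler {
    void elementDecl(String name, String contentModel);
}

// Hypothetical grammar object populated from DTD events.
class ToyDtdGrammar {
    final Map<String, String> elements = new LinkedHashMap<>();

    void addElement(String name, String contentModel) {
        elements.put(name, contentModel);
    }
}

// The "tee": each event goes to the grammar AND to the next stage.
class TeeValidator implements ToyDtdHandler {
    final ToyDtdGrammar grammar = new ToyDtdGrammar();
    private ToyDtdHandler next; // may be null if nothing follows

    void setNext(ToyDtdHandler next) { this.next = next; }

    public void elementDecl(String name, String contentModel) {
        grammar.addElement(name, contentModel);      // branch 1: build the grammar
        if (next != null) {
            next.elementDecl(name, contentModel);    // branch 2: continue the pipeline
        }
    }
}

class TeeDemo {
    static String run() {
        List<String> seenByParser = new ArrayList<>();
        TeeValidator validator = new TeeValidator();
        validator.setNext((name, model) -> seenByParser.add(name));

        // Stand-in for the DTD scanner issuing callbacks.
        validator.elementDecl("note", "(to,body)");
        validator.elementDecl("to", "(#PCDATA)");

        return validator.grammar.elements.keySet() + " " + seenByParser;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch, removing validation amounts to handing the scanner the parser's handler directly instead of the TeeValidator.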

The XML document information is defined by the XMLDocumentHandler interface, and the DTD information is defined by the XMLDTDHandler and XMLDTDContentModelHandler interfaces. (Note: As of 10 Apr 2001, the DTD interfaces are subject to change based on user feedback.) Together, these interfaces and their supporting interfaces and classes make up the XNI Core. However, the XNI Core defines only what document and DTD information is communicated; it does not define the semantics for configuring the parser pipeline.

Parser Configuration

In the XNI world, a parser object used by an application is merely an API generator (e.g. building DOM trees or calling SAX handlers). The components and configuration information for that parser are defined within a parser configuration object. With this approach, different parser configurations can be used with the existing parser instances without duplicating code.

The parser configuration object used by the application is defined by the XMLParserConfiguration interface and is composed of a series of components. The parser configuration assembles the parsing pipeline components, transmits settings to each component, and controls their actions. The following diagram shows a general parser configuration and its components. (No ordering or direct connection between components should be implied.)

    Parser Configuration
      Symbol Table
      Grammar Pool
      Datatype Validator Factory
      Error Reporter
      Entity Manager
      Document Scanner
      DTD Scanner
      Validator

The workings of the parser configuration object are unknown to the parser. The parser is only able to set features and properties on the configuration, set the XNI handlers to receive the document information, and initiate a parse. Typically the parser object itself will be registered as the target of XNI events produced from the parser configuration when a document is parsed, but it doesn't have to be. The following diagram illustrates this situation.

    Parser Configuration                        Parser
      Pipeline: Scanner --> Validator --+--> DOM Parser
                                        +--> SAX Parser
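
The division of labor between parser and configuration might look roughly like the following. Everything here is a hypothetical toy: ToyParserConfiguration is a cut-down stand-in that is not the real XMLParserConfiguration interface, the feature name "toy:upper-case" is invented, and the "pipeline" merely splits text on whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, cut-down contract; NOT the real XMLParserConfiguration.
interface ToyParserConfiguration {
    void setFeature(String id, boolean state);
    void setDocumentHandler(Consumer<String> handler);
    void parse(String input);
}

// The configuration owns the pipeline; its workings are hidden from the parser.
class ToyConfiguration implements ToyParserConfiguration {
    private final Map<String, Boolean> features = new HashMap<>();
    private Consumer<String> handler;

    public void setFeature(String id, boolean state) { features.put(id, state); }

    public void setDocumentHandler(Consumer<String> handler) { this.handler = handler; }

    public void parse(String input) {
        // Toy "pipeline": split on whitespace, optionally upper-case each token.
        boolean upper = features.getOrDefault("toy:upper-case", false);
        for (String token : input.trim().split("\\s+")) {
            handler.accept(upper ? token.toUpperCase(Locale.ROOT) : token);
        }
    }
}

// The parser is only an API generator: here it builds a List from the events.
class ToyListParser {
    private final ToyParserConfiguration config;
    private final List<String> result = new ArrayList<>();

    ToyListParser(ToyParserConfiguration config) {
        this.config = config;
        // Register this parser as the target of the configuration's events.
        config.setDocumentHandler(result::add);
    }

    void setFeature(String id, boolean state) { config.setFeature(id, state); }

    List<String> parse(String input) {
        result.clear();
        config.parse(input); // initiate the parse; the pipeline itself stays opaque
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ToyListParser parser = new ToyListParser(new ToyConfiguration());
        parser.setFeature("toy:upper-case", true);
        System.out.println(parser.parse("a b"));
    }
}
```

ToyListParser depends only on the interface, so a different configuration could be substituted without touching the parser code.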

Features & Properties

Features and properties are provided via the extensible mechanism found in SAX2. Features are boolean settings on the parser configuration, while properties are object settings. SAX2 defines a number of core features and properties, but XNI parser components are free to define new ones. In every case, the features and properties are managed by the parser configuration.
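
For reference, the SAX2 mechanism itself looks like this when used through the SAX parser bundled with the JDK. The example sets features and properties on a SAX XMLReader rather than on an XNI parser configuration, so it shows only the shape of the mechanism; the two identifiers used are standard SAX2 ones.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class FeaturePropertyDemo {
    // Standard SAX2 identifiers: a boolean feature and an object-valued property.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    // Sets one feature and one property, then reads the feature back.
    static boolean run() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            // A feature is a boolean setting identified by a URI.
            reader.setFeature(NAMESPACES, true);

            // A property is an object setting identified by a URI.
            // DefaultHandler2 implements LexicalHandler, the type this property expects.
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());

            return reader.getFeature(NAMESPACES);
        } catch (Exception e) {
            // Unrecognized or unsupported identifiers surface as SAX exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because identifiers are plain URIs, a component can introduce a new feature or property without any change to the setter and getter methods.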

TODO: Expand on how features and properties are set, when, and by whom.

Settings Management

The parser configuration implements the XMLComponentManager interface and each component implements the XMLComponent interface. For this configuration system to work, the parser configuration must adhere to the following guidelines:


Author: Andy Clark
Last modified: $Date$