Migrating to earlier Releases
 

Migrating from Xerces-C++ 3.1.3 to Xerces-C++ 3.1.4
 

Xerces-C++ 3.1.4 is a bugfix-only release and is binary-compatible with Xerces-C++ 3.1.3.


Migrating from Xerces-C++ 3.1.2 to Xerces-C++ 3.1.3
 

Xerces-C++ 3.1.3 is a bugfix-only release and is binary-compatible with Xerces-C++ 3.1.2.


Migrating from Xerces-C++ 3.1.1 to Xerces-C++ 3.1.2
 

Xerces-C++ 3.1.2 is a bugfix-only release and is binary-compatible with Xerces-C++ 3.1.1.


Migrating from Xerces-C++ 3.1.0 to Xerces-C++ 3.1.1
 

Xerces-C++ 3.1.1 is a bugfix-only release and is binary-compatible with Xerces-C++ 3.1.0.


Migrating from Xerces-C++ 3.0.1 to Xerces-C++ 3.1.0
 

The following section is a discussion of the technical differences between Xerces-C++ 3.0.1 and Xerces-C++ 3.1.0.

Topics discussed are:

New features in Xerces-C++ 3.1.0
 
  • Working multi-import support. The support for handling multiple import declarations with the same target namespaces has been improved and thoroughly tested. Furthermore, the same logic was extended to loadGrammar and the schemaLocation attributes so that you can load several schemas with the same namespace and/or "add" more declarations with the schemaLocation attributes. To enable this feature, set the XMLUni::fgXercesHandleMultipleImports feature/parameter to true. Starting with this release all the tests and examples have multi-import support enabled by default.
  • New property, XMLUni::fgXercesLowWaterMark, allows you to configure the parser buffer low water mark. In particular, setting this value to 0 disables data caching in the parser, which can be useful if you want the SAX events to be dispatched as soon as the data is available.
  • DOMLSParser::parseWithContext implementation. In particular, this functionality allows one to parse a document fragment with missing namespace declarations as long as the context document provides them.
  • Improved performance and reduced memory footprint when validating with large maxOccurs values. If available, the SSE2 instructions are used to further speed up this case.
  • Improved scalability of the XML Schema identity checking (key, keyref, and unique).
  • Multiple XML Schema conformance fixes.
  • More robust external library detection (libcurl and ICU). In particular, the build system no longer tries to inject any additional paths such as /usr or /usr/local.
  • Compilation of the ICU message loader resources no longer depends on the ICU implementation details.

Public API Changes
 

Xerces-C++ 3.1.0 is a minor release and does not include any public API changes that would preclude applications using the previous version of Xerces-C++ from building successfully with this version.



Migrating from Xerces-C++ 3.0.0 to Xerces-C++ 3.0.1
 

Xerces-C++ 3.0.1 is a bugfix-only release and is binary-compatible with Xerces-C++ 3.0.0.


Migrating from Xerces-C++ 2.8.0 to Xerces-C++ 3.0.0
 

The following section is a discussion of the technical differences between Xerces-C++ 2.8.0 and Xerces-C++ 3.0.0.

Topics discussed are:

New features in Xerces-C++ 3.0.0
 
  • Autotools-based build system for the UNIX/Linux/Mac OS X platforms
  • Project files for VC++ 9
  • Support for the ICU transcoder in VC++ 7.1, 8, and 9 project files
  • libcurl-based net accessor
  • Support for XInclude in DOM
  • Support for both XPath 1 and XPath 2 models in the DOM XPath interface
  • Support for the XML Schema subset of XPath 1 in DOM
  • Conformance to the final DOM Level 3 interface specification
  • Ability to provide custom DOM memory manager as well as tune the global DOM heap parameters
  • All public and widely used interfaces as well as a large portion of the implementation were converted to be 64-bit safe.
  • Various XML Schema fixes including the fix for the large maxOccurs and minOccurs bug as well as for the changed ##other interpretation
  • Reviewed and cleaned up diagnostics messages
  • Optimizations for SAX/SAX2 and DOM parsing as well as XML Schema validation

Public API Changes
 

Xerces-C++ 3.0.0 is a major release and includes a number of application-breaking interface changes compared to Xerces-C++ 2 series. The following sub-sections provide an overview of the public API changes between Xerces-C++ 2 series and this release.

New Public APIs
 
  • XMLGrammarPoolImpl implementation has been moved to framework/ and is now publicly accessible
  • DOM XPath interfaces now support XPath 2 model
  • A number of DOM interfaces (DOMLSInput, DOMLSOutput, DOMLSParser, DOMLSSerializer, DOMConfiguration, etc.) were added as part of the final DOM Level 3 specification conformance work

Modified Public APIs
 

A large number of public APIs have been modified. Consult individual interface documentation for details. The following list gives an overview of major changes:

  • Several DOM interfaces have been adjusted to conform to the final DOM Level 3 specification
  • DOM XPath interfaces have been adjusted to support both XPath 1 and XPath 2
  • Many public interfaces that used int/long types to represent memory-related sizes, counts, indexes, etc., have been modified to use the 64-bit safe XMLSize_t type instead
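In practice, the XMLSize_t change means that application loops and size variables written with int or long should move to the same unsigned size type. The following is only a minimal sketch: the typedef is a stand-in that assumes XMLSize_t is as wide as std::size_t on the target platform (real code would take the definition from the Xerces-C++ headers), and countNonEmpty is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in so this sketch compiles without the Xerces-C++ headers.
// Assumption: XMLSize_t is an unsigned type as wide as std::size_t.
typedef std::size_t XMLSize_t;

// Hypothetical helper: counts the non-empty strings in a container,
// indexing with XMLSize_t rather than int so the index has the same
// width and signedness as the size it is compared against.
XMLSize_t countNonEmpty(const std::vector<std::string>& values) {
    XMLSize_t count = 0;
    for (XMLSize_t i = 0; i < values.size(); ++i) {
        if (!values[i].empty()) {
            ++count;
        }
    }
    // Invariant: the count can never exceed the number of elements.
    assert(count <= values.size());
    return count;
}
```

The same pattern applies when iterating over any interface whose length accessor now returns XMLSize_t: declare the index with that type instead of casting the length back to int.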

Deprecated/Removed Public APIs
 

All APIs marked as deprecated in Xerces-C++ 2 series have been removed in this release. In particular, the deprecated DOM (depdom) and COM support have been removed.

Furthermore, a number of DOM interfaces (DOMBuilder, DOMWriter, DOMInputSource, etc.) were replaced as part of the final DOM Level 3 specification conformance work.




Migrating from Xerces-C++ 2.7.0 to Xerces-C++ 2.8.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.7.0 and Xerces-C++ 2.8.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.8.0
 
  • Exponential growth of the memory blocks (from 16KB to 128KB) that are allocated by the DOM heap.
  • The NODE_CLONED notification is now sent to each node's user data handler when cloning the entire DOMDocument.
  • On Windows extract the registry code page from MIME\Database\Charset\<encoding>\@InternetEncoding instead of MIME\Database\Charset\<encoding>\@Codepage.
  • Allow whitespace-only nodes to be added as children of a DOMDocument.
  • When a node is cloned or imported the type information (PSVI) is also copied.
  • When using SAX2, including the XMLReaderFactory header in order to call createXMLReader no longer includes xercesc/parsers/SAX2XMLReaderImpl.hpp. If you need to cast the SAX2XMLReader to SAX2XMLReaderImpl, you need to include this header yourself.


Migrating from Xerces-C++ 2.6.0 to Xerces-C++ 2.7.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.6.0 and Xerces-C++ 2.7.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.7.0
 
  • Feature to not generate XML Schema annotations. That is, not to add them to the grammar. If you don't need annotations you may want to turn on this feature to avoid memory bloat for XML Schemas that use annotations heavily.
  • Option to skip regenerating the XML Schema annotations when deserializing a grammar. If you don't need annotations you may want to turn on this option to avoid memory bloat for XML Schemas that use annotations heavily.
  • Feature to not perform default entity resolution. When the entityResolver returns NULL, the parser doesn't try to resolve the entity externally.
  • Feature to do schema-only validation even if there is a DTD.

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.6.0 and the Xerces-C++ 2.7.0 releases of the parser.

New Public API
 
  • XMLString: subString, compareIStringASCII, lowercaseASCII, uppercaseASCII
  • RefHash2KeysTableOf: rehashing support
  • XMemory: placement new and delete
  • SAX2XMLFilter

Modified Public API
 

Deprecated/Removed Public API
 



Migrating from Xerces-C++ 2.5.0 to Xerces-C++ 2.6.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.5.0 and Xerces-C++ 2.6.0 code bases.

Topics discussed are:

  • New features in Xerces-C++ 2.6.0
  • Public API Changes
    • New Public API
    • Modified Public API
    • Deprecated/Removed Public API
    • Please note the following source code incompatibility: Rename VALUE_CONSTRAINT enumeration values in XSConstants.hpp due to conflict with system header. New values are VALUE_CONSTRAINT_NONE, VALUE_CONSTRAINT_DEFAULT and VALUE_CONSTRAINT_FIXED. Old values are VC_NONE, VC_DEFAULT and VC_FIXED.
    • Also note that if you have written your own XMLGrammarPool implementation, the original getXSModel has been marked deprecated and a new method of the same name that takes a bool parameter has been added. This new getXSModel must always return an XSModel.
New features in Xerces-C++ 2.6.0
 
  • Reduce footprint of DLL by building the deprecated DOM as a separate library
  • Improve packaging scripts
  • Enable IDs to work on all kinds of schema components
  • Add messages to DOMExceptions along with the error code
  • Improve annotation error-reporting capabilities
  • Make grammar caching work with DTD internal subsets
  • Bring parser up to the XML 1.0 3rd Edition
  • Update to the XML 1.1 recommendation
  • Add new method to DOMDocument so that DOM level-2 style DOMDocumentTypes (which have a DOMDocument to own them) can be created
  • Feature for disabling identity constraints
  • Update schema errata
  • Provide means to get actual values out of PSVI/schema component model
  • Synthesize annotation components for non-schema attributes
  • Expose partial PSVIElement information at the start element call
  • Externalize validation, actual/canonical value production for arbitrary strings
  • Laxly validate schema annotations
  • Upgrade to ICU 3.0
  • Handle elements with a large number of attributes more efficiently

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.5.0 and the Xerces-C++ 2.6.0 releases of the parser.

New Public API
 
  • XSValue
  • IdentityConstraintHandler
  • XMLBufferFullHandler
  • XMLString: removeChar, isValidNOTATION
  • XMLUri: normalizeUri
  • PSVIHandler: handlePartialElementPSVI
  • RefHash family of classes: getHashModulus
  • XSAnnotation: setline/col and systemid
  • XMLReader: handleEOL
  • XMLChar: isValidNmToken
  • XMLBigDecimal: parseDecimal, getIntVal
  • HexBin: getCanonicalRepresentation, decode
  • Base64: getCanonicalRepresentation, decode
  • XMLBigInteger: compareValues
  • XMLAbstractDoubleFloat: isDataConverted, getValue, isDataOverFlowed
  • PSVIItem: getActualValue
  • XSSimpleTypeDefinition: getDatatypeValidator
  • RefHash2KeysTableOf: transferElement
  • XMLGrammarPool: getXSModel

Modified Public API
 
  • XSerializeEngine constructor
  • MACUnicodeConverters

Deprecated/Removed Public API
 
  • XSerializeEngine constructor
  • DTDAttDef: getDOMTypeInfoName, getDOMTypeInfoUri
  • DTDElementDecl: getDOMTypeInfoName, getDOMTypeInfoUri
  • SchemaAttDef: setAnyDatatypeValidator
  • UnionDatatypeValidator: getMemberTypeName, getMemberTypeUri, getMemberTypeAnonymous, getMemberTypeValidator
  • XMLAttr: getValidatingTypeURI, getValidatingTypeName, setDatatypeValidator, setSchemaValidated
  • ComplexTypeInfo: setContentModel
  • XMLGrammarPool: getXSModel
  • SAXParser, mark this class deprecated



Migrating from Xerces-C++ 2.4.0 to Xerces-C++ 2.5.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.4.0 and Xerces-C++ 2.5.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.5.0
 
  • Fix duplicate attribute detection when namespaces are disabled
  • Stricter use of static memory manager for static data only
  • PSVI bug fixes and enhancements
  • ThreadTest with grammar caching
  • Re-pluggable Panic Handler
  • Enhanced mutex creation to improve thread safety
  • Intrinsic transcoding support for 390.
  • Canonical Representation Support
  • New sample SCMPrint
  • New sample PSVIWriter
  • New test XSerializerTest

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.4.0 and the Xerces-C++ 2.5.0 releases of the parser.

New Public API
 

Modified Public API
 

Deprecated/Removed Public API
 



Migrating from Xerces-C++ 2.3.0 to Xerces-C++ 2.4.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.3.0 and Xerces-C++ 2.4.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.4.0
 
  • PSVI
  • Performance enhancement
  • Stateless Grammar
  • Grammar Serialization/Deserialization

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.3.0 and the Xerces-C++ 2.4.0 releases of the parser.

New Public API
 
  • PSVI related
  • Grammar serialization/deserialization related

Modified Public API
 

Deprecated/Removed Public API
 
  • XMLAttDef: getProvided, getDOMTypeInfoUri, getDOMTypeInfoName, setProvided
  • XMLAttDefList: hasMoreElements, nextElement, Reset
  • DTDAttDefList: hasMoreElements, nextElement, Reset
  • SchemaAttDefList: hasMoreElements, nextElement, Reset
  • XMLElementDecl: LookupOpts
  • XMLNumber family: toString
  • ENTITYDatatypeValidator: setEntityDeclPool
  • IDDatatypeValidator: setIDRefList
  • IDREFDatatypeValidator: setIDRefList
  • GeneralAttributeCheck: setIDRefList
  • SchemaGrammar: getIDRefList
  • SchemaElementDecl: all non thread safe methods
  • SchemaAttDef: getters
  • DTDGrammar: getRootElemId



Migrating from Xerces-C++ 2.2.0 to Xerces-C++ 2.3.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.2.0 and Xerces-C++ 2.3.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.3.0
 
  • Experimental Implementation of Namespaces in XML 1.1
  • Experimental Implementation of XML 1.1: in DOMWriter
  • More Schema 1.0 Errata Implementation
  • More DOM L3 Core Support
    • DOMConfiguration
    • Document Normalization
  • Pluggable Memory Manager
  • Pluggable Security Manager
  • Pluggable Panic Handler
  • Logical Path Resolution

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.2.0 and the Xerces-C++ 2.3.0 releases of the parser.

New Public API
 
  • To support additional DOM L3 functions, the following are added:
  • DOMDocument: getDOMConfiguration
  • DOMConfiguration class for document normalization.

Modified Public API
 

Deprecated/Removed Public API
 
  • DOMDocument canSetNormalizationFeature, setNormalizationFeature, getNormalizationFeature, getErrorHandler, setErrorHandler removed



Migrating from Xerces-C++ 2.1.0 to Xerces-C++ 2.2.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.1.0 and Xerces-C++ 2.2.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.2.0
 
  • C++ Namespace Support
  • Schema 1.0 Errata Implementation
  • Experimental Implementation of XML 1.1
  • More DOM L3 Core Support:
    • DOMNode: baseURI
    • DOMAttr: isId, getTypeInfo
    • DOMElement: setIdAttribute, setIdAttributeNS, setIdAttributeNode, getTypeInfo
  • DOM Message: make use of the non-standard extension DOMImplementation::loadDOMExceptionMsg to load the default error text message for the corresponding Exception Code.
  • New feature XMLPlatformUtils::Initialize(const char* const locale) to set the locale for message loader. See Specify locale for Message Loader for details
  • Support Build with ICU Message Loader, or Message Catalog Message Loader
  • RPM for Linux
  • 390: Uniconv390 support
  • 390: support record-oriented MVS datasets with the DOM Level 3 serialization APIs
  • Support for Linux/390
  • Performance: Break Scanner for different functionalities and many other performance improvements
  • New feature, "http://apache.org/xml/features/dom/byte-order-mark", allows the user to enable DOMWriter to write a Byte-Order-Mark in the output XML stream. See Xercesc Feature: Byte Order Mark for details

Using C++ Namespace
 

Xerces-C++ 2.2.0 now supports C++ Namespace. All Xerces-C++ classes, data and variables are defined in the xercesc namespace if C++ Namespace support is ENABLED.

All the binary distributions of Xerces-C++ 2.2.0 are now built with C++ Namespace enabled. Therefore, user applications that link with the distributed binary packages must namespace-qualify all the Xerces-C++ classes, data and variables.

See the Programming Guide Using C++ Namespace for details.


Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.1.0 and the Xerces-C++ 2.2.0 releases of the parser.

New Public API
 
  • To support additional DOM L3 functions, the following are added:
    • DOMAttr: isId, getTypeInfo
    • DOMElement: setIdAttribute, setIdAttributeNS, setIdAttributeNode, getTypeInfo
    • Added DOMTypeInfo class for getTypeInfo class in DOMElement and DOMAttr
    • Added getDOMTypeInfoUri, getDOMTypeInfoName to XMLAttDef and XMLElementDecl for use in building DOMTypeInfo
  • Added a non-standard extension DOMImplementation::loadDOMExceptionMsg to load the default error message for the corresponding DOMException code.
  • XMLAttr: Added a constructor and a set method to allow creating/setting of XMLAttr using a rawname.
  • Added XMLUri::getUriText to return the URI as a string specification.
  • Add XMLString::fixURI to transform an absolute path filename to standard URI form.
  • Added XMLString::equals for faster string comparison.
  • To allow users to tell the parser to force standard uri conformance, the following are added:
    • XercesDOMParser/DOMParser/SAXParser: get/setStandardUriConformant
    • and DOMBuilder/SAX2XMLReader will recognize the feature http://apache.org/xml/features/standard-uri-conformant
  • Add XMLURL::hasInvalidChar() to indicate if the URL has invalid char as per RFC standard
  • To allow users to enable/disable src offset calculation, the following are added:
    • XercesDOMParser/DOMParser/SAXParser: get/setCalculateSrcOfs
    • and DOMBuilder/SAX2XMLReader will recognize the feature http://apache.org/xml/features/calculate-src-ofst
  • To allow users to select the scanner when scanning XML documents, the following are added:
    • XercesDOMParser/DOMParser/SAXParser: useScanner
    • and DOMBuilder/SAX2XMLReader will recognize the property http://apache.org/xml/properties/scannerName
  • Added getSrcOffset to XercesDOMParser/DOMParser/SAXParser/DOMBuilder/SAX2XMLReader to allow users to get the current src offset within the input source.

Modified Public API
 
  • A const modifier has been added to the following DOM functions.
    • DOMImplementation::hasFeature
    • DOMNode: isSameNode, isEqualNode, compareTreePosition
  • XMLPlatformUtils::Initialize() takes a parameter specifying locale for message loader, with default value "en_US".
  • To fix [Bug 13641], the QName copy constructor is corrected to take a reference as parameter, i.e. QName(const QName& qname).
  • To fix [Bug 12232], a const modifier has been added to the QName operator==.
  • Move XMLUri copy constructor and operator = as public.
  • Move XMLUri::isURIString as public.
  • For validation purposes, added two more default parameters to XMLValidator::validateAttrValue.
  • To fix [Bug 15802], a const modifier has been added to the getURIText of DOMParser/XercesDOMParser/SAXParser/SAX2XMLReader.

Deprecated/Removed Public API
 
  • No Deprecated Public API in this release.



Migrating from Xerces-C++ 2.0.0 to Xerces-C++ 2.1.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 2.0.0 and Xerces-C++ 2.1.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.1.0
 
  • 64 bit binaries distribution on Windows IA64 and Linux IA64
  • Support for Cygwin environment
  • DOM Level 3 DOMNode: compareTreePosition, lookupNamespaceURI, lookupNamespacePrefix and isDefaultNamespace
  • plus many more bug fixes

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.0.0 and the Xerces-C++ 2.1.0 releases of the parser.

New Public API
 
  • To fix bug 7087, a virtual destructor has been added to XMLEnumerator.
  • To fix bug 11448, XMLNotationDecl::get/setBaseURI and XMLEntityDecl::get/setBaseURI have been added.

Modified Public API
 
  • DOMNodeList: a const modifier has been added to item and getLength.
  • DOMNode: a const modifier has been added to lookupNamespacePrefix, isDefaultNamespace, and lookupNamespaceURI.

Deprecated/Removed Public API
 
  • No Deprecated Public API in this release.



Migrating from Xerces-C++ 1.7.0 to Xerces-C++ 2.0.0
 

The following section is a discussion of the technical differences between the Xerces-C++ 1.7.0 and Xerces-C++ 2.0.0 code bases.

Topics discussed are:

New features in Xerces-C++ 2.0.0
 
  • 64 bit binaries distribution
  • Follow Unix Shared Library Naming Convention
  • Apache Recommended DOM C++ Binding
  • Experimental DOM Level 3 subset support, including DOMWriter and DOMBuilder
  • Grammar preparsing and Grammar caching
  • Optionally ignore loading of external DTD
  • Project files for Microsoft Visual C++ .Net
  • Codewarrior 8 support
  • Option to enable/disable strict IANA encoding name checking
  • plus many more bug fixes and performance enhancements

Unix Library Name Change
 

The Xerces-C++ UNIX Library now follows the Unix Shared Library Naming Convention (libname.so.soname).


DOM Reorganization
 

1. The old Java-like DOM is now deprecated, and all the associated files, including the headers and DOMParser files are moved to src/xercesc/dom/deprecated. Users of the old Java-like DOM are required to change all their #include lines to pick up the headers. For example

//old code
#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOM_Document.hpp>
#include <xercesc/parsers/DOMParser.hpp>

void test(char* xmlFile) {
    DOMParser parser;
    parser.parse(xmlFile);
    DOM_Document doc = parser.getDocument();
    :
    return;
}

should now change to

//new code
#include <xercesc/dom/deprecated/DOM.hpp>          //<==== change this include line
#include <xercesc/dom/deprecated/DOM_Document.hpp> //<==== change this include line
#include <xercesc/dom/deprecated/DOMParser.hpp>    //<==== change this include line

// the rest is the same
void test(char* xmlFile) {
    DOMParser parser;
    parser.parse(xmlFile);
    DOM_Document doc = parser.getDocument();
    :
    return;
}

2. The Experimental IDOM is now renamed, and becomes the Apache Recommended DOM C++ Binding. The following changes are made:

  • class names are renamed from IDOM_XXXX to DOMXXXX, e.g. IDOM_Document to DOMDocument
  • and thus header files are renamed from IDOM_XXXX.hpp to DOMXXXX.hpp and are moved to src/xercesc/dom
  • the IDOMParser is renamed to XercesDOMParser. And thus the header file is renamed as well
  • the rest is the same, see Apache Recommended DOM C++ binding and DOM Programming Guide for more programming information

Users of IDOM are required to change all their #include lines and do a global rename of IDOMParser to XercesDOMParser, and IDOM_XXXX to DOMXXXX. For example

//old code
#include <xercesc/idom/IDOM.hpp>
#include <xercesc/idom/IDOM_Document.hpp>
#include <xercesc/parsers/IDOMParser.hpp>

void test(char* xmlFile) {
    IDOMParser parser;
    parser.parse(xmlFile);
    IDOM_Document* doc = parser.getDocument();
    :
    return;
}

should now change to

//new code
#include <xercesc/dom/DOM.hpp>                  //<==== change this include line
#include <xercesc/dom/DOMDocument.hpp>          //<==== change this include line
#include <xercesc/parsers/XercesDOMParser.hpp>  //<==== change this include line

void test(char* xmlFile) {
    XercesDOMParser parser;                           //<==== rename the IDOMParser
    parser.parse(xmlFile);
    DOMDocument* doc = parser.getDocument();          //<==== rename the IDOM_XXXX
    :
    return;
}

Reuse Grammar becomes Grammar Caching
 

Xerces-C++ 2.0.0 extends the "Reuse Grammar" support by replacing it with a new feature called "Grammar Caching", which provides more flexibility in reusing grammars. Users who used to do the following:


      XercesDOMParser parser;

      // this is the first parse, just usual code as you do normal parse
      // "firstXmlFile" has a grammar (schema or DTD) specified.
      parser.parse(firstXmlFile);

      // this is the second parse, by setting second parameter to true,
      // the parser will reuse the grammar in the last parse
      // (i.e. the one in  "firstXmlFile")
      // to validate the second "anotherXmlFile".  Any grammar that is
      // specified in anotherXmlFile is IGNORED.
      //
      // Note: The anotherXmlFile cannot have any DTD internal subset.
      parser.parse(anotherXmlFile, true);

should now use the cacheGrammarFromParse and useCachedGrammarInParse options:

      XercesDOMParser parser;

      // By setting cacheGrammarFromParse to true,
      // the parser will cache any grammars encountered in the
      // follow-on xml files, if not cached already
      parser.cacheGrammarFromParse(true);

      parser.parse(firstXmlFile);

      // By setting useCachedGrammarInParse to true,
      // the parser will use all the previous cached grammars
      // to validate the follow-on xml files if the cached
      // grammar matches the one specified in anotherXmlFile.
      //
      // Note: The follow-on xml files cannot have any DTD internal subset.
      parser.useCachedGrammarInParse(true);

      parser.parse(anotherXmlFile);

      // This will flush the cached grammar pool
      parser.resetCachedGrammarPool();

Note that there are a number of differences between "Reuse Grammar" and "Grammar Caching":

  1. "Reuse Grammar" ignores any grammar specified in anotherXmlFile and simply reuses whatever was stored during the previous parse, while "Grammar Caching" uses the cached grammar only if it matches the one specified in anotherXmlFile. If it does not match, the new grammar is parsed.
  2. "Reuse Grammar" can only reuse the grammar from the previous parse, while "Grammar Caching" can selectively cache many grammars from different parses and collect them all in a pool indexed by targetNamespace (for Schema) or system id (for DTD).
  3. "Grammar Caching" also offers further functionality beyond the above (such as "Pre-parsing Grammar"). Please refer to Preparsing Grammar and Grammar Caching for more programming details.
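The lookup behaviour described in points 1 and 2 can be illustrated with a toy model. This is only a sketch of the idea, not the Xerces-C++ implementation: ToyGrammarPool and all of its members are hypothetical names, a grammar is reduced to its key (a targetNamespace or system id), and details such as validation and DTD internal subsets are ignored.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of a grammar pool keyed by targetNamespace or system id.
// All names here are hypothetical and exist only for illustration.
class ToyGrammarPool {
public:
    ToyGrammarPool() : cacheFromParse_(false), useCachedInParse_(false), loads_(0) {}

    void setCacheFromParse(bool on) { cacheFromParse_ = on; }
    void setUseCachedInParse(bool on) { useCachedInParse_ = on; }

    // Flushes every cached grammar.
    void reset() { pool_.clear(); }

    // Simulates parsing a document that references the grammar `key`.
    // Returns true when a matching cached grammar was used.
    bool parse(const std::string& key) {
        if (useCachedInParse_ && pool_.count(key) != 0) {
            return true;  // match found: the cached grammar is reused
        }
        ++loads_;         // no match: the grammar is parsed from scratch
        if (cacheFromParse_) {
            pool_.insert(key);  // remember it for later parses
        }
        return false;
    }

    // Number of grammars that had to be parsed from scratch so far.
    int loads() const { return loads_; }

private:
    bool cacheFromParse_;
    bool useCachedInParse_;
    int loads_;
    std::set<std::string> pool_;
};
```

In this model, a second parse that references the same key is served from the pool, a parse that references a different key triggers a fresh load and adds a second entry, and reset() empties the pool. The old "Reuse Grammar" behaviour would instead correspond to always using the single grammar from the previous parse, whatever the second document specified.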

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 1.7.0 and the Xerces-C++ 2.0.0 releases of the parser.

New Public API
 
  • To support DOM Level 3, the following are added (see the API documentation page for details).
    • DOMNode functions set/getUserData, isSameNode, isEqualNode.
    • DOMDocument functions renameNode, get/setActualEncoding, get/setEncoding, get/setVersion, get/setStandalone, get/setDocumentURI.
    • DOMEntity functions get/setActualEncoding, get/setEncoding, get/setVersion.
    • classes AbstractDOMParser, DOMError, DOMErrorHandler, and DOMLocator.
    • classes DOMUserDataHandler, DOMImplementationRegistry and DOMImplementationSource.
    • classes DOMBuilder, DOMEntityResolver, DOMImplementationLS, DOMInputSource, Wrapper4DOMInputSource and Wrapper4InputSource.
    • classes DOMWriter, DOMWriterFilter, LocalFileFormatTarget, StdOutFormatTarget, and MemBufFormatTarget
  • To support DOMWriter, the following PlatformUtils functions are added
    • openFileToWrite, writeBufferToFile
  • To have Apache Recommended DOM C++ Binding, the following are added (see Apache Recommended DOM C++ binding).
    • function release() to fix Memory Management problem
    • classes DOMDocumentRange and DOMDocumentTraversal
    • XMLSize_t is used to represent unsigned integral type in DOM
    • IDOM_XXXX classes are renamed to DOMXXXX, and IDOMParser is renamed to XercesDOMParser as described in DOM Reorganization
    • XercesDOMParser::adoptDocument is added so that document can optionally live outside the parser.
  • To support optionally loading the external DTD, the following are added:
    • XercesDOMParser::set/getLoadExternalDTD
    • DOMParser::set/getLoadExternalDTD
    • SAXParser::set/getLoadExternalDTD
    • and SAX2XMLReader will recognize the feature http://apache.org/xml/features/nonvalidating/load-external-dtd
  • To support Preparsing Grammar and Grammar Caching, the following are added:
    • XercesDOMParser/DOMParser/SAXParser functions loadGrammar, resetCachedGrammarPool, cacheGrammarFromParse, isCachingGrammarFromParse, useCachedGrammarInParse, isUsingCachedGrammarInParse.
    • SAX2XMLReader functions loadGrammar, resetCachedGrammarPool, and will recognize the features http://apache.org/xml/features/validation/cache-grammarFromParse and http://apache.org/xml/features/validation/use-cachedGrammarInParse.
  • To support access to Grammar info, the following are added:
    • XercesDOMParser/DOMParser/SAXParser/SAX2XMLReader functions getRootGrammar, getGrammar, getURIText.
  • To support strict IANA encoding name checking, the following are added:
    • class EncodingValidator.
    • PlatformUtils functions strictIANAEncoding, isStrictIANAEncoding.
    • XMLTransService functions strictIANAEncoding, isStrictIANAEncoding.

Modified Public API
 
  • SAXParser::getScanner() is moved from public to protected.
  • A const modifier has been added to Grammar::getGrammarType.
  • Xerces features are renamed from XMLUni::fgSAX2XercesXXXX to XMLUni::fgXercesXXXX so that they can be shared with DOM parser.
  • With the new Grammar Caching introduced, the last parameter "reuseGrammar" in the following API is dropped. Users should now use the "Grammar Caching" feature as described in Reuse Grammar becomes Grammar Caching.
    • (in Parser, SAXParser, DOMParser, and XercesDOMParser)
    • parse(const InputSource& source, const bool reuseGrammar = false);
    • parse(const XMLCh* const systemId, const bool reuseGrammar = false);
    • parse(const char* const systemId, const bool reuseGrammar = false);
    • (in SAXParser, DOMParser, and XercesDOMParser)
    • parseFirst(const InputSource& source, XMLPScanToken& toFill, const bool reuseGrammar = false);
    • parseFirst(const XMLCh* const systemId, XMLPScanToken& toFill, const bool reuseGrammar = false);
    • parseFirst(const char* const systemId, XMLPScanToken& toFill, const bool reuseGrammar = false);

Deprecated/Removed Public API
 
  • The old Java-like DOM is now deprecated as described in DOM Reorganization
  • SAX2XMLReader::setValidationConstraint. For consistency, SAX2XMLReader users should set the feature "http://apache.org/xml/features/validation-error-as-fatal" instead.
  • SAX2XMLReader::setExitOnFirstFatalError. For consistency, SAX2XMLReader users should set the feature "http://apache.org/xml/features/continue-after-fatal-error" instead.
  • With the new Grammar Caching introduced, the following features will not be recognized by the SAX2XMLReader:
    • http://apache.org/xml/features/validation/reuse-grammar
    • http://apache.org/xml/features/validation/reuse-validator



Migrating from Xerces-C++ 1.6.0 to 1.7.0
 

The following section is a discussion of the technical differences between Xerces-C++ 1.6.0 code base and the Xerces-C++ 1.7.0 code base.

New features in Xerces-C++ 1.7.0
 
  • Support SAX2-ext's DeclHandler.
  • Directory sane_include reorganization: add sub-directory 'xercesc' to src / include folder. See "Directory change in Xerces-C++ 1.7.0" below for detail.
  • More IDOM test cases - port IDOMMemTest, and merge ThreadTest and IThreadTest.
  • Support IconvFBSD in multi-threading environment.
  • Use IDOM in schema processing for faster performance.
  • Add Project files for BCB6.
  • Port to Caldera (SCO) OpenServer.
  • Support building with new MacOSURLAccessCF NetAccessor that doesn't require Carbon but can allow Xerces to live solely within CoreServices layer.

Directory change in Xerces-C++ 1.7.0
 
  • A new directory, src/xercesc is created to be the new parent directory of all src's direct subdirectories.
  • And in the binary package, all the headers are distributed in include/xercesc directory.
  • Migration considerations:
    • Windows application,
      either change the include directories setting to "..\..\..\..\..\src\xercesc" (Projects->settings->C/C++->Preprocessor),
      or
      change the relevant #include instances in the source/header files, accordingly, eg
      #include <util/XMLString.hpp> be changed to
      #include <xercesc/util/XMLString.hpp>
    • Unix application,
      either change the include search path in the Makefile to " -I <installroot>/include/xercesc",
      or
      change the relevant #include instances in the source/header files as shown above.
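
If you choose to change the #include lines rather than the search path, the rewrite is mechanical and can be scripted. The following is a minimal sketch of such a rewrite for a single source line. The function name migrateIncludeLine is hypothetical, and the list of top-level directories is an assumption about the source tree, so adjust it to match your installation.

```cpp
#include <string>

// Rewrites a line such as
//   #include <util/XMLString.hpp>
// to
//   #include <xercesc/util/XMLString.hpp>
// Any other line is returned unchanged. Note that a project header whose
// path happens to start with one of these directories is rewritten too.
std::string migrateIncludeLine(const std::string& line) {
    // Assumed list of directories that moved under xercesc/.
    static const char* const kDirs[] = {
        "dom/", "framework/", "idom/", "internal/", "parsers/",
        "sax/", "sax2/", "util/", "validators/"
    };
    const std::string::size_type npos = std::string::npos;

    // Locate '#', then the "include" keyword, allowing blanks between them.
    const std::string::size_type hash = line.find_first_not_of(" \t");
    if (hash == npos || line[hash] != '#') return line;
    const std::string::size_type kw = line.find_first_not_of(" \t", hash + 1);
    if (kw == npos || line.compare(kw, 7, "include") != 0) return line;

    // The path starts right after the opening '<' or '"'.
    const std::string::size_type open = line.find_first_of("<\"", kw + 7);
    if (open == npos) return line;
    const std::string::size_type pathStart = open + 1;

    for (const char* dir : kDirs) {
        const std::string d(dir);
        if (line.compare(pathStart, d.size(), d) == 0) {
            std::string out = line;
            out.insert(pathStart, "xercesc/");
            return out;
        }
    }
    return line;  // Already migrated, or not a Xerces-C++ include.
}
```

Because a path that already begins with xercesc/ matches none of the listed directories, running the rewrite twice leaves the line unchanged.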

Public API Changes in Xerces-C++ 1.7.0
 

The following lists the public API changes between the Xerces-C++ 1.6.0 and the Xerces-C++ 1.7.0 releases of the parser.

New Public API
 
  • Added SAX2-ext's DeclHandler class. See the API documentation page for details.
  • To support SAX2-ext's DeclHandler, the following new methods are added in classes DefaultHandler and SAX2XMLReader:
    • void DefaultHandler::elementDecl(const XMLCh* const name, const XMLCh* const model)
    • void DefaultHandler::attributeDecl(const XMLCh* const eName, const XMLCh* const aName, const XMLCh* const type, const XMLCh* const mode, const XMLCh* const value)
    • void DefaultHandler::internalEntityDecl(const XMLCh* const name, const XMLCh* const value)
    • void DefaultHandler::externalEntityDecl(const XMLCh* const name, const XMLCh* const publicId, const XMLCh* const systemId)
    • DeclHandler* SAX2XMLReader::getDeclarationHandler() const
    • void SAX2XMLReader::setDeclarationHandler(DeclHandler* const handler)
  • To conform to DOM Level 2 specification, the following methods are added:
    • DOM_Node DOM_NodeIterator::getRoot()
    • DOM_Node DOM_TreeWalker::getRoot()
    • bool DOM_Node::hasAttributes() const
    • bool DOM_Element::hasAttribute(const DOMString &name) const
    • bool DOM_Element::hasAttributeNS(const DOMString &namespaceURI, const DOMString &localName) const
    • IDOM_Node* IDOM_NodeIterator::getRoot()
    • IDOM_Node* IDOM_TreeWalker::getRoot()
    • bool IDOM_Node::hasAttributes() const
    • bool IDOM_Element::hasAttribute(const XMLCh* name) const
    • bool IDOM_Element::hasAttributeNS(const XMLCh* namespaceURI, const XMLCh* localName) const
  • To fix [Bug 5570], a copy constructor is added to DOM_Range

Modified Public API
 
  • To conform to the SAX2 specification, the namespace-prefixes feature in SAX2 is now off by default.
  • To fix [Bug 6330], the Base64::encode and Base64::decode have been modified as follows
    • static XMLByte* Base64::encode(const XMLByte* const inputData, const unsigned int inputLength, unsigned int* outputLength);
    • static XMLByte* Base64::decode(const XMLByte* const inputData, unsigned int* outputLength);
    • static XMLCh* Base64::decode(const XMLCh* const inputData, unsigned int* outputLength);
  • To conform to DOM Level 2 specification, the DOM_Node::supports and IDOM_Node::supports are modified to
    • bool DOM_Node::isSupported(const DOMString &feature, const DOMString &version) const
    • bool IDOM_Node::isSupported(const XMLCh* feature, const XMLCh* version) const

Deprecated Public API
 
  • No Deprecated Public API in this release.



Migrating from Xerces-C++ 1.5.2 to 1.6.0
 

The following section is a discussion of the technical differences between Xerces-C++ 1.5.2 code base and the Xerces-C++ 1.6.0 code base.

New features in Xerces-C++ 1.6.0
 
  • Full Schema support is available in this release. See the Schema page for details.
  • New sample SEnumVal to show how to enumerate the markup decls in a Schema Grammar is added.

Public API Changes in Xerces-C++ 1.6.0
 

The following lists the public API changes between the Xerces-C++ 1.5.2 and the Xerces-C++ 1.6.0 releases of the parser.

New Public API
 
  • It should not be a fatal error if a schema InputSource is not found. Add the following new methods:
    • const bool InputSource::getIssueFatalErrorIfNotFound() const
    • void InputSource::setIssueFatalErrorIfNotFound(const bool flag)
  • Allow code to take advantage of the fact that the length of the prefix and local name are known when constructing the QName. Add the following new methods:
    • void QName::setNPrefix(const XMLCh*, const unsigned int)
    • void QName::setNLocalPart(const XMLCh*, const unsigned int)
  • To support schemaLocation and noNamespaceSchemaLocation to be specified outside the instance document, the following new methods are added:
    • XMLCh* DOMParser::getExternalSchemaLocation() const
    • XMLCh* DOMParser::getExternalNoNamespaceSchemaLocation() const
    • void DOMParser::setExternalSchemaLocation(const XMLCh* const schemaLocation)
    • void DOMParser::setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation)
    • XMLCh* IDOMParser::getExternalSchemaLocation() const
    • XMLCh* IDOMParser::getExternalNoNamespaceSchemaLocation() const
    • void IDOMParser::setExternalSchemaLocation(const XMLCh* const schemaLocation)
    • void IDOMParser::setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation)
    • XMLCh* SAXParser::getExternalSchemaLocation() const
    • XMLCh* SAXParser::getExternalNoNamespaceSchemaLocation() const
    • void SAXParser::setExternalSchemaLocation(const XMLCh* const schemaLocation)
    • void SAXParser::setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation)
    • and the following properties are recognized by SAX2XMLReader:
      • http://apache.org/xml/properties/schema/external-schemaLocation
      • http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation
  • To support identity constraints, the following new method is added:
    • QName* XMLAttr::getAttName() const

Modified Public API
 
  • To support attribute constraint checking, the constant values in XMLAttDef::DefAttTypes have been re-ordered.

Deprecated Public API
 
  • Root Element check is moved from XMLValidator to XMLScanner. Thus XMLValidator::checkRootElement() is deprecated.



Migrating from Xerces-C++ 1.4.0 to 1.5.2
 

The following section is a discussion of the technical differences between Xerces-C++ 1.4.0 code base and the Xerces-C++ 1.5.2 code base.

New features in Xerces-C++ 1.5.2
 

Schema subset support and an experimental IDOM are available in this release.

Schema Subset Support
 
  • New function "setDoSchema" is added to DOM/SAX parser.
  • New feature "http://apache.org/xml/features/validation/schema" is recognized by SAX2XMLReader.
  • New classes such as SchemaValidator, TraverseSchema ... are added.
  • The Scanner is enhanced to process schema.
  • New sample data files personal-schema.xml and personal.xsd.
  • New command line option "-s" for samples.

See the Schema page for details.


Experimental IDOM
 

The experimental IDOM API is a new design of the C++ DOM API. Please note that this experimental IDOM API is only a prototype and is subject to change.



Changes required to migrate to Xerces-C++ 1.5.2
 

There are some architectural changes between the Xerces-C++ 1.4.0 and the Xerces-C++ 1.5.2 releases of the parser, and as a result, some code has undergone restructuring as shown below.

Validator directory Reorganization
 
  • common content model files such as DFAContentModel ... are moved to a new directory called src/validators/common
  • DTD related files are moved to a new directory called src/validators/DTD
  • new directory src/validators/Datatype is created to store all datatype validators
  • new directory src/validators/schema is created to store Schema related files

DTDValidator
 

DTDValidator was designed to scan, validate and store the DTD in Xerces-C++ 1.4.0 or earlier. In Xerces-C++ 1.5.2, this process is broken down into three components:

  • new class DTDScanner - to scan the DTD
  • new class DTDGrammar - to store the DTD Grammar
  • DTDValidator - to validate the DTD only



Migrating from XML4C 2.x to Xerces-C++ 1.4.0
 

The following section is a discussion of the technical differences between XML4C 2.x code base and the new Xerces-C++ 1.4.0 code base.

Summary of changes required to migrate from XML4C 2.x to Xerces-C++ 1.4.0
 

There are some major architectural changes between the 2.3.x and Xerces-C++ 1.4.0 releases of the parser, and as a result the code has undergone significant restructuring. The list below mentions the public APIs which existed in 2.3.x and no longer exist in Xerces-C++ 1.4.0. It also mentions the Xerces-C++ 1.4.0 API which will give you the same functionality. Note: This list is not exhaustive. The API docs (and ultimately the header files) supplement this information.

  • parsers/[Non]Validating[DOM/SAX]parser.hpp
    These files/classes have all been consolidated in the new version to just two files/classes: [DOM/SAX]Parser.hpp. Validation is now a property which may be set before invoking the parse. Now, the setDoValidation() method controls the validation processing.
  • The framework/XMLDocumentTypeHandler.hpp has been replaced with validators/DTD/DocTypeHandler.hpp.
  • The following methods now have different set of parameters because the underlying base class methods have changed in the 3.x release. These methods belong to one of XMLDocumentHandler, XMLErrorReporter or DocTypeHandler interfaces.
    • [Non]Validating[DOM/SAX]Parser::docComment
    • [Non]Validating[DOM/SAX]Parser::doctypePI
    • [Non]ValidatingSAXParser::elementDecl
    • [Non]ValidatingSAXParser::endAttList
    • [Non]ValidatingSAXParser::entityDecl
    • [Non]ValidatingSAXParser::notationDecl
    • [Non]ValidatingSAXParser::startAttList
    • [Non]ValidatingSAXParser::TextDecl
    • [Non]ValidatingSAXParser::docComment
    • [Non]ValidatingSAXParser::docPI
    • [Non]Validating[DOM/SAX]Parser::endElement
    • [Non]Validating[DOM/SAX]Parser::startElement
    • [Non]Validating[DOM/SAX]Parser::XMLDecl
    • [Non]Validating[DOM/SAX]Parser::error
  • The following methods/data members changed visibility from protected in 2.3.x to private (with public setters and getters, as appropriate).
    • [Non]ValidatingDOMParser::fDocument
    • [Non]ValidatingDOMParser::fCurrentParent
    • [Non]ValidatingDOMParser::fCurrentNode
    • [Non]ValidatingDOMParser::fNodeStack
  • The following files have moved, possibly requiring changes in the #include statements.
    • MemBufInputSource.hpp
    • StdInInputSource.hpp
    • URLInputSource.hpp
  • All the DTD validator code was moved from internal to a separate validators/DTD directory.
  • The error code definitions which were earlier in internal/ErrorCodes.hpp are now split up into the following files:
    • framework/XMLErrorCodes.hpp - Core XML errors
    • framework/XMLValidityCodes.hpp - DTD validity errors
    • util/XMLExceptMsgs.hpp - C++ specific exception codes.

The Samples
 

The sample programs no longer use any of the unsupported util/xxx classes. They only existed to allow us to write portable samples. But, since we feel that the wide character APIs are supported on a lot of platforms these days, it was decided to go ahead and just write the samples in terms of these. If your system does not support these APIs, you will not be able to build and run the samples. On some platforms, these APIs might perhaps be optional packages or require runtime updates or some such action.

More samples have been added as well. These highlight some of the new functionality introduced in the new code base. And the existing ones have been cleaned up as well.

The new samples are:

  1. PParse - Demonstrates 'progressive parse' (see below)
  2. StdInParse - Demonstrates use of the standard in input source
  3. EnumVal - Shows how to enumerate the markup decls in a DTD Validator

Parser Classes
 

In the XML4C 2.x code base, there were the following parser classes (in the src/parsers/ source directory): NonValidatingSAXParser, ValidatingSAXParser, NonValidatingDOMParser, ValidatingDOMParser. The non-validating ones were the base classes and the validating ones just derived from them and turned on the validation. This was deemed a little bit overblown, considering the tiny amount of code required to turn on validation and the fact that it makes people use a pointer to the parser in most cases (if they needed to support either validating or non-validating versions.)

The new code base just has SAXParser and DOMParser classes. These are capable of handling both validating and non-validating modes, according to the state of a flag that you can set on them. For instance, here is a code snippet that shows this in action.

void ParseThis(const  XMLCh* const fileToParse,
               const bool validate)
{
  //
  // Create a SAXParser. It can now just be
  // created by value on the stack if we want
  // to parse something within this scope.
  //
  SAXParser myParser;

  // Tell it whether to validate or not
  myParser.setDoValidation(validate);

  // Parse and catch exceptions...
  try
  {
    myParser.parse(fileToParse);
  }
  catch (const XMLException&)
  {
    // ... report or handle the error here ...
  }
}

We feel that this is a simpler architecture, and that it makes things easier for you. In the above example, for instance, the parser will be cleaned up for you automatically upon exit since you don't have to allocate it anymore.


Moved Classes to src/framework
 

Some of the classes previously in the src/internal/ directory have been moved to their more correct location in the src/framework/ directory. These are classes used by the outside world and should have been framework classes to begin with. Also, to avoid name clashes in the absence of C++ namespace support, some of these classes have been renamed to make them more XML specific and less likely to clash. More classes might end up being moved to framework as well.

So you might have to change a few include statements to find these classes in their new locations. And you might have to rename some of the names of the classes, if you used any of the ones whose names were changed.


Util directory Reorganization
 

The src/util directory was becoming somewhat of a dumping ground of platform and compiler stuff. So we reworked that directory to better spread things out. The new scheme is:

util - The platform independent utility stuff
 
  • MsgLoaders - Holds the msg loader implementations
    1. ICU
    2. InMemory
    3. MsgCatalog
    4. Win32
  • Compilers - All the compiler specific files
  • Transcoders - Holds the transcoder implementations
    1. Iconv
    2. ICU
    3. Win32
  • Platforms
    1. AIX
    2. HP-UX
    3. Linux
    4. Solaris
    5. ....
    6. Win32

This organization makes things much easier to understand. And it makes it easier to find which files you need and which are optional. Note that only per-platform files have any hard coded references to specific message loaders or transcoders. So if you don't include the ICU implementations of these services, you don't need to link in ICU or use any ICU headers. The rest of the system works only in terms of the abstraction APIs.
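
The layering described above can be pictured with a small sketch: the core code sees only an abstract interface, and a single per-platform function is the only place that names a concrete implementation. All class and function names below (Transcoder, IconvTranscoder, makePlatformTranscoder, describeTranscoder) are hypothetical and are not the library's actual classes.

```cpp
#include <memory>
#include <string>

// Abstraction API: the only transcoder type the core code refers to.
class Transcoder {
public:
    virtual ~Transcoder() {}
    virtual std::string name() const = 0;
};

// One concrete implementation, of the kind that would live under
// util/Transcoders/Iconv.
class IconvTranscoder : public Transcoder {
public:
    std::string name() const override { return "Iconv"; }
};

// Per-platform code, of the kind that would live under util/Platforms:
// the single place with a hard coded reference to a concrete transcoder.
std::unique_ptr<Transcoder> makePlatformTranscoder() {
    return std::unique_ptr<Transcoder>(new IconvTranscoder());
}

// Core code: depends on the abstraction only, so swapping the
// implementation does not touch it.
std::string describeTranscoder(const Transcoder& transcoder) {
    return "using transcoder: " + transcoder.name();
}
```

In this arrangement, building without the ICU implementation only means that no per-platform file refers to it, so nothing else needs the ICU headers or libraries.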




Copyright © 1999-2017 The Apache Software Foundation. All Rights Reserved.