Migration Archive
 

For migration information to Xerces-C++ 1.7.0 or earlier, please refer to Migration Archive.


Migrating from Xerces-C++ 1.7.0 to Xerces-C++ 2.0.0
 

This document discusses the technical differences between the Xerces-C++ 1.7.0 code base and the Xerces-C++ 2.0.0 code base.

Topics discussed are:

New features in Xerces-C++ 2.0.0
 
  • 64 bit binaries distribution
  • Follow Unix Shared Library Naming Convention
  • Apache Recommended DOM C++ Binding
  • Experimental DOM Level 3 subset support, including DOMWriter and DOMBuilder
  • Grammar preparsing and Grammar caching
  • Optionally ignore loading of external DTD
  • Project files for Microsoft Visual C++ .Net
  • Codewarrior 8 support
  • Option to enable/disable strict IANA encoding name checking
  • plus many more bug fixes and performance enhancements

Unix Library Name Change
 

The Xerces-C++ UNIX Library now follows the Unix Shared Library Naming Convention (libname.so.soname). It is now called:

  • AIX
    • libxerces-c20.0.so
    • symbolic link: libxerces-c.so ----> libxerces-c20.so
    • symbolic link: libxerces-c20.so ----> libxerces-c20.0.so
  • Solaris / Linux
    • libxerces-c.so.20.0
    • symbolic link: libxerces-c.so ----> libxerces-c.so.20
    • symbolic link: libxerces-c.so.20 ----> libxerces-c.so.20.0
  • HP-UX
    • libxerces-c.sl.20.0
    • symbolic link: libxerces-c.sl ----> libxerces-c.sl.20
    • symbolic link: libxerces-c.sl.20 ----> libxerces-c.sl.20.0
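
The naming pattern above can be expressed as a small helper, which may be useful in build or packaging scripts. This is only a sketch: the names Platform, LibraryLayout and xercesLibraryLayout are hypothetical and not part of Xerces-C++, and the numbers 20 and 0 are simply taken from the file names listed above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Platforms listed in the section above (hypothetical enum, not a Xerces type).
enum class Platform { AIX, SolarisLinux, HPUX };

struct LibraryLayout {
    std::string realFile;  // the file that actually contains the library
    // Each pair is (symbolic link name, link target).
    std::vector<std::pair<std::string, std::string>> links;
};

// Builds the file and link names for a version pair,
// e.g. major = 20 and minor = 0 for the names shown above.
LibraryLayout xercesLibraryLayout(Platform platform, int major, int minor) {
    const std::string base = "libxerces-c";
    const std::string majorStr = std::to_string(major);
    const std::string minorStr = std::to_string(minor);

    LibraryLayout layout;
    if (platform == Platform::AIX) {
        // AIX keeps the ".so" suffix last: libxerces-c20.0.so
        const std::string versioned = base + majorStr + ".so";
        layout.realFile = base + majorStr + "." + minorStr + ".so";
        layout.links.emplace_back(base + ".so", versioned);
        layout.links.emplace_back(versioned, layout.realFile);
        return layout;
    }
    // Solaris, Linux and HP-UX append the version after the suffix.
    const std::string ext = (platform == Platform::HPUX) ? ".sl" : ".so";
    const std::string versioned = base + ext + "." + majorStr;
    layout.realFile = versioned + "." + minorStr;
    layout.links.emplace_back(base + ext, versioned);
    layout.links.emplace_back(versioned, layout.realFile);
    return layout;
}
```

Calling xercesLibraryLayout(Platform::SolarisLinux, 20, 0) yields the real file libxerces-c.so.20.0 and the two links listed for Solaris / Linux.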

DOM Reorganization
 

1. The old Java-like DOM is now deprecated, and all the associated files, including the headers and the DOMParser files, have been moved to src/xercesc/dom/deprecated. Users of the old Java-like DOM must change all their #include lines to pick up the headers from the new location. For example:

//old code
#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOM_Document.hpp>
#include <xercesc/parsers/DOMParser.hpp>

void test(char* xmlFile) {
    DOMParser parser;
    parser.parse(xmlFile);
    DOM_Document doc = parser.getDocument();
    :
    return;
}

should now change to

//new code
#include <xercesc/dom/deprecated/DOM.hpp>          //<==== change this include line
#include <xercesc/dom/deprecated/DOM_Document.hpp> //<==== change this include line
#include <xercesc/dom/deprecated/DOMParser.hpp>    //<==== change this include line

// the rest is the same
void test(char* xmlFile) {
    DOMParser parser;
    parser.parse(xmlFile);
    DOM_Document doc = parser.getDocument();
    :
    return;
}

2. The Experimental IDOM is now renamed, and becomes the Apache Recommended DOM C++ Binding. The following changes are made:

  • class names are renamed from IDOM_XXXX to DOMXXXX, e.g. IDOM_Document to DOMDocument
  • and thus header files are renamed from IDOM_XXXX.hpp to DOMXXXX.hpp and are moved to src/xercesc/dom
  • the IDOMParser is renamed to XercesDOMParser, and its header file is renamed accordingly
  • the rest is the same, see Apache Recommended DOM C++ binding and DOM Programming Guide for more programming information

Users of IDOM are required to change all their #include lines and do a global rename of IDOMParser to XercesDOMParser, and IDOM_XXXX to DOMXXXX. For example

//old code
#include <xercesc/idom/IDOM.hpp>
#include <xercesc/idom/IDOM_Document.hpp>
#include <xercesc/parsers/IDOMParser.hpp>

void test(char* xmlFile) {
    IDOMParser parser;
    parser.parse(xmlFile);
    IDOM_Document* doc = parser.getDocument();
    :
    return;
}

should now change to

//new code
#include <xercesc/dom/DOM.hpp>                  //<==== change this include line
#include <xercesc/dom/DOMDocument.hpp>          //<==== change this include line
#include <xercesc/parsers/XercesDOMParser.hpp>  //<==== change this include line

void test(char* xmlFile) {
    XercesDOMParser parser;                           //<==== rename the IDOMParser
    parser.parse(xmlFile);
    DOMDocument* doc = parser.getDocument();          //<==== rename the IDOM_XXXX
    :
    return;
}
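
For a larger code base, the global rename described above could be scripted. The following is a minimal sketch using std::regex, under the assumption that the identifiers IDOMParser and IDOM_XXXX appear only where they refer to the old IDOM classes. The function name migrateIdomSource is hypothetical. The sketch does not distinguish code from comments or string literals, so its output should be reviewed by hand.

```cpp
#include <regex>
#include <string>

// Applies the IDOM -> DOM renames described above to a piece of source text.
std::string migrateIdomSource(const std::string& source) {
    std::string out = source;
    // IDOMParser -> XercesDOMParser (this also covers the header file name).
    out = std::regex_replace(out, std::regex(R"(\bIDOMParser\b)"), "XercesDOMParser");
    // Headers moved from xercesc/idom to xercesc/dom.
    out = std::regex_replace(out, std::regex(R"(xercesc/idom/)"), "xercesc/dom/");
    // IDOM_XXXX -> DOMXXXX for class names and header names.
    out = std::regex_replace(out, std::regex(R"(\bIDOM_(\w+))"), "DOM$1");
    // The umbrella header IDOM.hpp -> DOM.hpp.
    out = std::regex_replace(out, std::regex(R"(\bIDOM\.hpp\b)"), "DOM.hpp");
    return out;
}
```

Applied line by line to the old code above, this produces the include lines and declarations shown in the new code.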

Reuse Grammar becomes Grammar Caching
 

Xerces-C++ 2.0.0 extends the "Reuse Grammar" support by replacing it with a new feature called "Grammar Caching", which provides more flexibility in reusing grammars. Users who used to do the following:


      XercesDOMParser parser;

      // this is the first parse, just usual code as you do normal parse
      // "firstXmlFile" has a grammar (schema or DTD) specified.
      parser.parse(firstXmlFile);

      // this is the second parse, by setting second parameter to true,
      // the parser will reuse the grammar in the last parse
      // (i.e. the one in  "firstXmlFile")
      // to validate the second "anotherXmlFile".  Any grammar that is
      // specified in anotherXmlFile is IGNORED.
      //
      // Note: The anotherXmlFile cannot have any DTD internal subset.
      parser.parse(anotherXmlFile, true);

should now use the features cacheGrammarFromParse and useCachedGrammarInParse:

      XercesDOMParser parser;

      // By setting cacheGrammarFromParse to true,
      // the parser will cache any grammars encountered in the
      // follow-on xml files, if not cached already
      parser.cacheGrammarFromParse(true);

      parser.parse(firstXmlFile);

      // By setting useCachedGrammarInParse to true,
      // the parser will use all the previous cached grammars
      // to validate the follow-on xml files if the cached
      // grammar matches the one specified in anotherXmlFile.
      //
      // Note: The follow-on xml files cannot have any DTD internal subset.
      parser.useCachedGrammarInParse(true);

      parser.parse(anotherXmlFile);

      // This will flush the cached grammar pool
      parser.resetCachedGrammarPool();

Note that there are a number of differences between "Reuse Grammar" and "Grammar Caching":

  1. "Reuse Grammar" ignores any grammar that is specified in anotherXmlFile and simply reuses whatever was stored in the previous parse, while "Grammar Caching" uses the cached grammar only if it matches the one specified in anotherXmlFile. If it does not match, the new grammar is parsed.
  2. "Reuse Grammar" can only reuse the grammar from the previous parse, while "Grammar Caching" can selectively cache many grammars from different parses and collect them all in a pool indexed by targetNamespace (for Schema) or system id (for DTD).
  3. "Grammar Caching" also offers more functionality than described above (such as "Pre-parsing Grammar"). Please refer to Preparsing Grammar and Grammar Caching for more programming details.
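
The first two differences can be illustrated with a toy model of a grammar pool. This is not how Xerces-C++ implements caching: ToyParser and everything inside it are hypothetical stand-ins written with the standard library only, and a "document" is reduced to the key of the grammar it declares (a targetNamespace or a system id). Only the three method names mirror the real parser API listed in this document.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy stand-in for a parsed grammar; it only records its key.
struct ToyGrammar {
    std::string key;
};

class ToyParser {
public:
    void cacheGrammarFromParse(bool on) { cache_ = on; }
    void useCachedGrammarInParse(bool on) { useCached_ = on; }
    void resetCachedGrammarPool() { pool_.clear(); }

    // "Parses" a document that declares the grammar identified by grammarKey.
    // Returns true when a matching cached grammar was used, and false when
    // the grammar had to be loaded again.
    bool parse(const std::string& grammarKey) {
        if (useCached_ && pool_.count(grammarKey) != 0) {
            return true;  // the cached grammar matches the declared one
        }
        ++grammarsLoaded_;  // no match: the declared grammar is loaded
        if (cache_ && pool_.count(grammarKey) == 0) {
            pool_[grammarKey] = ToyGrammar{grammarKey};
        }
        return false;
    }

    int grammarsLoaded() const { return grammarsLoaded_; }
    std::size_t poolSize() const { return pool_.size(); }

private:
    bool cache_ = false;
    bool useCached_ = false;
    int grammarsLoaded_ = 0;
    std::map<std::string, ToyGrammar> pool_;  // indexed by namespace or system id
};
```

In this model, a second document declaring the same key reuses the pooled grammar, a document declaring a different key causes a new grammar to be loaded and added to the pool, and resetCachedGrammarPool empties the pool.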

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 1.7.0 and Xerces-C++ 2.0.0 releases of the parser.

New Public API
 
  • To support DOM Level 3, the following are added (see the API documentation page for details).
    • DOMNode functions set/getUserData, isSameNode, isEqualNode.
    • DOMDocument functions renameNode, get/setActualEncoding, get/setEncoding, get/setVersion, get/setStandalone, get/setDocumentURI.
    • DOMEntity functions get/setActualEncoding, get/setEncoding, get/setVersion.
    • classes AbstractDOMParser, DOMError, DOMErrorHandler, and DOMLocator.
    • classes DOMUserDataHandler, DOMImplementationRegistry and DOMImplementationSource.
    • classes DOMBuilder, DOMEntityResolver, DOMImplementationLS, DOMInputSource, Wrapper4DOMInputSource and Wrapper4InputSource.
    • classes DOMWriter, DOMWriterFilter, LocalFileFormatTarget, StdOutFormatTarget, and MemBufFormatTarget
  • To support DOMWriter, the following PlatformUtils functions are added
    • openFileToWrite, writeBufferToFile
  • To have Apache Recommended DOM C++ Binding, the following are added (see Apache Recommended DOM C++ binding).
    • function release() to fix the memory management problem
    • classes DOMDocumentRange and DOMDocumentTraversal
    • XMLSize_t is used to represent unsigned integral type in DOM
    • IDOM_XXXX classes are renamed to DOMXXXX, and IDOMParser is renamed to XercesDOMParser as described in DOM Reorganization
    • XercesDOMParser::adoptDocument is added so that document can optionally live outside the parser.
  • To support optional loading of the external DTD, the following are added:
    • XercesDOMParser::set/getLoadExternalDTD
    • DOMParser::set/getLoadExternalDTD
    • SAXParser::set/getLoadExternalDTD
    • and SAX2XMLReader will recognize the feature http://apache.org/xml/features/nonvalidating/load-external-dtd
  • To support Preparsing Grammar and Grammar Caching, the following are added:
    • XercesDOMParser/DOMParser/SAXParser functions loadGrammar, resetCachedGrammarPool, cacheGrammarFromParse, isCachingGrammarFromParse, useCachedGrammarInParse, isUsingCachedGrammarInParse.
    • SAX2XMLReader functions loadGrammar, resetCachedGrammarPool, and will recognize the features http://apache.org/xml/features/validation/cache-grammarFromParse and http://apache.org/xml/features/validation/use-cachedGrammarInParse.
  • To support access to Grammar info, the following are added:
    • XercesDOMParser/DOMParser/SAXParser/SAX2XMLReader functions getRootGrammar, getGrammar, getURIText.
  • To support strict IANA encoding name checking, the following are added:
    • class EncodingValidator.
    • PlatformUtils functions strictIANAEncoding, isStrictIANAEncoding.
    • XMLTransService functions strictIANAEncoding, isStrictIANAEncoding.

Modified Public API
 
  • SAXParser::getScanner() is moved from public to protected.
  • A const modifier has been added to Grammar::getGrammarType.
  • Xerces features are renamed from XMLUni::fgSAX2XercesXXXX to XMLUni::fgXercesXXXX so that they can be shared with DOM parser.
  • With the introduction of Grammar Caching, the last parameter "reuseGrammar" in the following API is dropped. Users should now use the "Grammar Caching" feature as described in Reuse Grammar becomes Grammar Caching.
    • (in Parser, SAXParser, DOMParser, and XercesDOMParser)
    • parse(const InputSource& source, const bool reuseGrammar = false);
    • parse(const XMLCh* const systemId, const bool reuseGrammar = false);
    • parse(const char* const systemId, const bool reuseGrammar = false);
    • (in SAXParser, DOMParser, and XercesDOMParser)
    • parseFirst(const InputSource& source, XMLPScanToken& toFill, const bool reuseGrammar = false);
    • parseFirst(const XMLCh* const systemId, XMLPScanToken& toFill, const bool reuseGrammar = false);
    • parseFirst(const char* const systemId, XMLPScanToken& toFill, const bool reuseGrammar = false);

Deprecated/Removed Public API
 
  • The old Java-like DOM is now deprecated as described in DOM Reorganization
  • SAX2XMLReader::setValidationConstraint. For consistency, SAX2XMLReader users should set the feature "http://apache.org/xml/features/validation-error-as-fatal" instead.
  • SAX2XMLReader::setExitOnFirstFatalError. For consistency, SAX2XMLReader users should set the feature "http://apache.org/xml/features/continue-after-fatal-error" instead.
  • With the new Grammar Caching introduced, the following features will not be recognized by the SAX2XMLReader:
    • http://apache.org/xml/features/validation/reuse-grammar
    • http://apache.org/xml/features/validation/reuse-validator
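
When replacing the two removed SAX2XMLReader setters, it may help to map each old call to a (feature URI, value) pair first. The sketch below does that with the standard library only. The function names are hypothetical, and the value mapping is an assumption based on the feature names: "exit on first fatal error" is treated as the logical opposite of "continue after fatal error". Please check this against the SAX2XMLReader feature documentation before relying on it.

```cpp
#include <string>
#include <utility>

// A feature URI together with the boolean value to set for it.
using FeatureSetting = std::pair<std::string, bool>;

// Feature URIs quoted from the list above.
const char* const kValidationErrorAsFatal =
    "http://apache.org/xml/features/validation-error-as-fatal";
const char* const kContinueAfterFatalError =
    "http://apache.org/xml/features/continue-after-fatal-error";

// Replacement for the removed validation-constraint setter:
// the boolean is assumed to carry over unchanged.
FeatureSetting featureForValidationConstraint(bool treatAsFatal) {
    return FeatureSetting(std::string(kValidationErrorAsFatal), treatAsFatal);
}

// Replacement for the removed setExitOnFirstFatalError:
// assumed to be the inverse of "continue after fatal error".
FeatureSetting featureForExitOnFirstFatalError(bool exitOnFirst) {
    return FeatureSetting(std::string(kContinueAfterFatalError), !exitOnFirst);
}
```

The resulting pair would then be passed to the reader's feature-setting call, after converting the URI to the string type that call expects.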




Copyright © 2001 The Apache Software Foundation. All Rights Reserved.