Java API for XML Processing
Release Notes
Version: 1.1ea
This document contains notes that may help you use this library
more effectively.
XSLT Support
Parser
- There are two factory classes for making parsers pluggable. If
you write to the JAXP API in the javax.xml.parsers, org.xml.sax, and
org.w3c.dom packages, you can use the library in a manner independent
of the underlying implementing parser.
- To be notified of validation errors in an XML document, two
things must happen.
  - Validation must be turned on. See the setValidating methods of
    javax.xml.parsers.DocumentBuilderFactory or
    javax.xml.parsers.SAXParserFactory.
  - An application-defined ErrorHandler must be set. See the
    setErrorHandler methods of javax.xml.parsers.DocumentBuilder or
    org.xml.sax.XMLReader.
The methods referenced above are only some of the ways to get
notification of validation errors.
- Whenever you work with text encodings other than UTF-8 and
UTF-16, you should put an encoding declaration at the very beginning of
all your XML files (including DTDs). If you don't do this, the
parser will not be able to determine the encoding being used, and
will probably be unable to parse your document. A text declaration
like <?xml version='1.0' encoding='euc-jp'?> says that the document
uses the "euc-jp" encoding.
- The parser currently reports warnings, rather than errors,
in cases where the declared and actual text encodings don't match.
It may give those same warnings in the common case where the encoding
name used internally to Java is not the one used in the document.
If the declared encoding is truly an error, you'll usually see other
errors (not warnings) being reported by the parser.
- The parser currently does not report an error for content
models which are not deterministic. Accordingly it may not behave
well when given data which matches an "ambiguous" content model
such as ((a,b)|(a,c)). DTDs with such models are in
error, and must be restructured to be unambiguous. (In the example,
(a,(b|c)) is an equivalent legal content model.)
- If you are using JDK 1.1 with large numbers of symbols
(more than can be counted in sixteen bits) you might encounter
the message "panic: 16-bit string hash table overflow"
as the Java VM aborts. The Java 2 SDK does not have this limitation.
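The factory, validation, and encoding notes in this section can be
sketched together as follows. This is a minimal illustration, not part
of the release: the class name ValidatingParseSketch and the method
validate are hypothetical, the sample document is invented, and
ISO-8859-1 stands in for "euc-jp" because every Java platform is
required to support it. The inline DTD uses the deterministic content
model (a,(b|c)) from the note above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, named for illustration only.
final class ValidatingParseSketch {

    /** Parses the bytes with validation on and returns every warning or error reported. */
    static List<String> validate(byte[] xmlBytes)
            throws ParserConfigurationException, SAXException, IOException {
        final List<String> problems = new ArrayList<String>();

        // Only javax.xml.parsers and org.xml.sax types are named here, so the
        // underlying parser implementation stays pluggable.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // step 1: turn validation on

        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() { // step 2: application-defined handler
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // the document is not well-formed; give up
            }
        });

        // Passing raw bytes lets the parser read the encoding declaration itself.
        builder.parse(new ByteArrayInputStream(xmlBytes));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String head = "<?xml version='1.0' encoding='ISO-8859-1'?>";
        String dtd = "<!DOCTYPE r [<!ELEMENT r (a,(b|c))><!ELEMENT a EMPTY>"
                   + "<!ELEMENT b EMPTY><!ELEMENT c EMPTY>]>";
        byte[] valid = (head + dtd + "<r><a/><c/></r>").getBytes("ISO-8859-1");
        byte[] invalid = (head + dtd + "<r><b/></r>").getBytes("ISO-8859-1");
        System.out.println("valid document:   " + validate(valid));
        System.out.println("invalid document: " + validate(invalid));
    }
}
```

Collecting messages in a list is only one possible design. An
application could just as well log them or throw from error() to stop
at the first validation problem.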
Object Model
- Conforming to the XML specification, the parser reports all
whitespace to the DOM, even if it's meaningless. Many applications
do not want to see such whitespace. You can remove it by invoking
the Element.normalize method, which merges adjacent text
nodes and also canonicalizes adjacent whitespace into a single space
(unless the xml:space="preserve" attribute prevents it).
- Currently, attribute nodes may not have children. Access their
values as strings instead of enumerating children.
- Currently, when documents are cloned, the clone will not have a
clone of the associated ElementFactory or DocumentType.
- The in-memory representation of text nodes has not been tuned
to be efficient with respect to space utilization.
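The notes above on Element.normalize and on reading attribute values as
strings can be sketched as follows. The class name DomNotesSketch, the
method buildNormalizedPara, and the element and attribute names are
hypothetical. The sketch demonstrates only the merging of adjacent text
nodes; it does not check the whitespace canonicalization described
above, which may differ between DOM implementations.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, named for illustration only.
final class DomNotesSketch {

    /** Builds a para element holding two adjacent text nodes, then normalizes it. */
    static Element buildNormalizedPara() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element para = doc.createElement("para");
        doc.appendChild(para);
        para.setAttribute("lang", "en");

        // Two separate text nodes, as can happen when a tree is built in pieces.
        para.appendChild(doc.createTextNode("Hello, "));
        para.appendChild(doc.createTextNode("world"));

        para.normalize(); // merges the adjacent text nodes into one
        return para;
    }

    public static void main(String[] args) throws Exception {
        Element para = buildNormalizedPara();
        System.out.println("child nodes: " + para.getChildNodes().getLength());
        System.out.println("text:        " + para.getFirstChild().getNodeValue());
        // Read the attribute value as a string rather than walking child nodes.
        System.out.println("lang:        " + para.getAttribute("lang"));
    }
}
```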
Other Issues
- This software is a "Java Optional Package" for
XML processing.
- If you recompile the DOM implementation using versions of "javac"
older than the Java 2 SDK version 1.2 you may run into a compiler bug.
The symptom is a report of illegal access violations for some of the
private classes inside the DOM implementation. This is because of
incorrect code generated by the compiler. You should only compile
these class files with a compiler that does not have this bug; you may
also use the pre-compiled version in this release. There is no
bytecode dependency on the Java 2 runtime; you may use these classes on
JDK 1.1 systems also.
- The Microsoft SDK 3.2 for Java (and presumably all earlier
versions) has bugs similar to the one noted above. There are
both compiler and JVM bugs; the JVM bugs prevent the correct
byte codes (as produced by the Java 2 SDK) from working. This
means that you can't compile or use this DOM code with Microsoft
implementations of Java until Microsoft fixes these bugs, which
have been reported to Microsoft.
Changes since JAXP RI (Reference Implementation) version 1.0.1
- All previous releases (version 1.0.1 and earlier) used a
parser implementation with a package hierarchy beginning with
com.sun.xml. Between version 1.0.1 and the current
release, the parser was donated to the Apache Software Foundation
under the name "Crimson" and the packages were correspondingly
renamed to org.apache.crimson. Migration from
previous releases may involve renaming packages in your
application. In addition, if your application uses SAX1 then you
may either convert it to use the preferred SAX2
org.xml.sax.XMLReader or obtain a SAX1 org.xml.sax.Parser from the
javax.xml.parsers.SAXParser.getParser() method.
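For applications converting to SAX2, the following is a minimal sketch
of obtaining an org.xml.sax.XMLReader through the JAXP factory instead
of naming an implementation class. It uses the getXMLReader() method of
javax.xml.parsers.SAXParser, which the note above does not mention. The
class name Sax2ReaderSketch and the method elementNames are
hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for illustration only.
final class Sax2ReaderSketch {

    /** Returns the element names of the document in the order their start tags occur. */
    static List<String> elementNames(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        final List<String> names = new ArrayList<String>();

        // The factory picks the parser, so no com.sun.xml or
        // org.apache.crimson class appears in application code.
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

        reader.setContentHandler(new DefaultHandler() {
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c/></a>"));
    }
}
```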