General FAQs


	Questions

Validating Schemas
Bugzilla
Extracting code from CVS
Revalidation of DOM document in Memory
Schema/DTD caching
New Features?
Validation
International Encodings


	Answers


	I have written a schema and I want to use Xerces to validate it. How do I do this?

The best way to do this is to write a simple, valid instance document and use one of the sample programs that accompany Xerces (such as sax.SAXCount or dom.DOMCount) to validate the instance document. While validating the instance document, Xerces will also validate the corresponding schema. We hope to introduce functionality in Xerces 2 that permits schemas to be validated independently of instance documents.
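The approach above can be sketched in code. This is a minimal illustration rather than the sample programs themselves: it uses the JAXP classes bundled with the JDK so that it runs without a Xerces jar (with Xerces 1 you would instantiate org.apache.xerces.parsers.SAXParser directly), and it assumes the parser recognizes the Xerces feature http://apache.org/xml/features/validation/schema. The class name, file names, and sample schema are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: checks a schema by validating a tiny instance against it.
class SchemaViaInstanceSketch {
    static final String GOOD_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/></xs:schema>";

    // The instance points at the schema through a relative location hint.
    static final String INSTANCE =
        "<note xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
      + " xsi:noNamespaceSchemaLocation='note.xsd'>hello</note>";

    /** Writes the schema and instance to a temp directory and returns the reported errors. */
    static List<String> check(String schemaText)
            throws ParserConfigurationException, SAXException, IOException {
        Path dir = Files.createTempDirectory("schema-check");
        Files.write(dir.resolve("note.xsd"), schemaText.getBytes(StandardCharsets.UTF_8));
        Path instance = dir.resolve("note.xml");
        Files.write(instance, INSTANCE.getBytes(StandardCharsets.UTF_8));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation needs namespace processing
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        // Assumption: a Xerces-derived parser that recognizes this feature.
        parser.setFeature("http://apache.org/xml/features/validation/schema", true);

        final List<String> errors = new ArrayList<>();
        parser.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // schema and instance errors both arrive here
            }
        });
        parser.parse(instance.toUri().toString());
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("good schema: " + check(GOOD_SCHEMA));
        // A reference to an undefined type makes the schema itself invalid.
        System.out.println("bad schema:  " + check(GOOD_SCHEMA.replace("xs:string", "xs:noSuchType")));
    }
}
```

Any problem in the schema is reported through the same error handler as problems in the instance, so an empty list for a deliberately simple instance suggests the schema loaded cleanly.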


	How do I use Bugzilla to report bugs?

Please report bugs against the newest release.
If you are unsure whether the behaviour in question is a bug or a feature, please post a message to the xerces-j-user list for clarification.
To help eliminate duplicate bug reports, first query the Bugzilla database to see whether the bug has already been reported (and perhaps fixed). Then check out the code from CVS and build Xerces-J locally to verify that the bug still exists.

For more information visit the following links:


	How do I extract code from CVS?

set CVSROOT=:pserver:anoncvs@cvs.apache.org:/home/cvspublic
cvs login (password: anoncvs)
cvs checkout -d xerces_j xml-xerces/java


	I have used the DOMParser to convert an XML document into a DOM tree. Then I made some changes to the DOM tree. How do I make sure the document still conforms to my schema or DTD?

DOM revalidation is not supported by Xerces 1. Ken Rawlings has been trying to build a revalidating DOMParser based on code which was dropped from Xerces because it was no longer being maintained. The current code is at: http://www.vervet.com/~krawling/RevalidatingDOMParser.java. We hope that Xerces 2 will include this capability.


	I have a DTD or schema that I will use to validate many XML documents. How can I avoid having to recompile it every time I want to validate a new document?

Xerces 1 does not currently support grammar caching. We expect that Xerces 2 will support this functionality. Some users have reported success by registering an EntityResolver that reads the grammar from disk once, stores it in an efficient in-memory form (usually a byte array), and then hands the parser a reader over the stored copy. While this does not avoid recompiling the grammar for each instance document, it does avoid repeated disk accesses.
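The EntityResolver workaround might look something like the following. This is a minimal sketch under stated assumptions, not code from Xerces: the class name and the loadCount helper are hypothetical, and it hands the parser a byte stream rather than a character reader so that the parser can still detect the grammar's encoding itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: keeps each external entity (DTD or schema) in memory after the first read.
class CachingEntityResolver implements EntityResolver {
    private final Map<String, byte[]> cache = new HashMap<>();
    private int loads = 0; // how many times an entity was actually fetched

    @Override
    public synchronized InputSource resolveEntity(String publicId, String systemId)
            throws IOException {
        if (systemId == null) {
            return null; // let the parser fall back to its default behaviour
        }
        byte[] bytes = cache.get(systemId);
        if (bytes == null) {
            bytes = readAll(systemId);
            cache.put(systemId, bytes);
            loads++;
        }
        // A fresh stream over the cached bytes; the parser still recompiles the grammar.
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    synchronized int loadCount() {
        return loads;
    }

    // Reads the whole resource behind an absolute system identifier into memory.
    private static byte[] readAll(String systemId) throws IOException {
        try (InputStream in = URI.create(systemId).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

To use it, keep one resolver instance alive across parses and register it with parser.setEntityResolver(resolver) before each call to parse.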


	What are the new features?

Here are some of the new features in Xerces-J:

Additional support for W3C XML Schema Language.
DOS filenames no longer work. See the Common Problems section of the FAQ.


	How do I turn on validation?

You can turn validation on and off via methods available on the SAX2 XMLReader interface. While only the SAXParser implements the XMLReader interface, the methods required for turning on validation are available to both parser classes, DOM and SAX.
The code snippet below shows how to turn validation on -- assume that parser is an instance of either org.apache.xerces.parsers.SAXParser or org.apache.xerces.parsers.DOMParser.

parser.setFeature("http://xml.org/sax/features/validation", true);

IMPORTANT! Simply turning on validation will not make Xerces actually report the errors that it detects. For this, you need to implement the org.xml.sax.ErrorHandler interface and register your implementation with the parser using the setErrorHandler method.
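Putting the two steps together, a minimal sketch could look like this. It obtains the parser through the JAXP factory bundled with the JDK so that it runs without a Xerces jar; with Xerces 1 you would create an org.apache.xerces.parsers.SAXParser directly and call the same setFeature and setErrorHandler methods. The class name, handler name, and sample DTD are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidationSketch {
    // Hypothetical internal DTD used by the examples below.
    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    /** Records validation errors instead of letting them go unreported. */
    static class CollectingHandler implements ErrorHandler {
        final List<String> errors = new ArrayList<>();

        public void warning(SAXParseException e) {
            System.err.println("warning: " + e.getMessage());
        }

        public void error(SAXParseException e) {
            // Validity errors arrive here; parsing continues afterwards.
            errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
        }

        public void fatalError(SAXParseException e) throws SAXException {
            throw e; // well-formedness errors are not recoverable
        }
    }

    /** Parses the document with validation on and returns the validity errors found. */
    static List<String> validate(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        CollectingHandler handler = new CollectingHandler();
        parser.setErrorHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<note><to>Tove</to></note>"));
        System.out.println(validate(DTD + "<note><from>Jani</from></note>"));
    }
}
```

Rethrowing from fatalError stops the parse on malformed input, while error only records the problem, which lets one run collect every validity error in the document.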


	What international encodings are supported by Xerces-J?

In general, the parser supports all IANA encodings and aliases (see http://www.iana.org/assignments/character-sets) that have clear mappings to Java encodings (see here for details). Some of the more common encodings are:

UTF-8
UTF-16 Big Endian, UTF-16 Little Endian
IBM-1208
ISO Latin-1 (ISO-8859-1)
ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian]
ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
ISO Latin-4 (ISO-8859-4)
ISO Latin Cyrillic (ISO-8859-5)
ISO Latin Arabic (ISO-8859-6)
ISO Latin Greek (ISO-8859-7)
ISO Latin Hebrew (ISO-8859-8)
ISO Latin-5 (ISO-8859-9) [Turkish]
Extended Unix Code, packed for Japanese (euc-jp, eucjis)
Japanese Shift JIS (shift-jis)
Chinese (big5)
Chinese for PRC (mixed 1/2 byte) (gb2312)
Japanese ISO-2022-JP (iso-2022-jp)
Cyrillic (koi8-r)
Extended Unix Code, packed for Korean (euc-kr)
Russian Unix, Cyrillic (koi8-r)
Windows Thai (cp874)
Latin 1 Windows (cp1252)
cp858
EBCDIC encodings:

EBCDIC US (ebcdic-cp-us)
EBCDIC Canada (ebcdic-cp-ca)
EBCDIC Netherlands (ebcdic-cp-nl)
EBCDIC Denmark (ebcdic-cp-dk)
EBCDIC Norway (ebcdic-cp-no)
EBCDIC Finland (ebcdic-cp-fi)
EBCDIC Sweden (ebcdic-cp-se)
EBCDIC Italy (ebcdic-cp-it)
EBCDIC Spain, Latin America (ebcdic-cp-es)
EBCDIC Great Britain (ebcdic-cp-gb)
EBCDIC France (ebcdic-cp-fr)
EBCDIC Hebrew (ebcdic-cp-he)
EBCDIC Switzerland (ebcdic-cp-ch)
EBCDIC Roece (ebcdic-cp-roece)
EBCDIC Yugoslavia (ebcdic-cp-yu)
EBCDIC Iceland (ebcdic-cp-is)
EBCDIC Urdu (ebcdic-cp-ar2)
Latin 0 EBCDIC
EBCDIC Arabic (ebcdic-cp-ar1)

Please also look at the documentation for the feature "http://apache.org/xml/features/allow-java-encodings" which provides a mechanism for using the encoding names recognized directly by Java.
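As a small illustration of how the encoding declaration drives decoding, the sketch below parses a document stored as ISO-8859-1 bytes and also shows where the allow-java-encodings feature would be switched on. It uses the JAXP parser bundled with the JDK rather than Xerces 1 itself, and the class name is hypothetical. The sample document uses an IANA name, so it does not depend on the feature; the try/catch reflects the assumption that a parser not derived from Xerces may not recognize it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EncodingSketch {
    /** Parses raw bytes and returns the character data the parser decoded. */
    static String parseText(byte[] xmlBytes)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            // Lets encoding declarations use Java encoding names as well as IANA names.
            parser.setFeature("http://apache.org/xml/features/allow-java-encodings", true);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Assumption: only Xerces-derived parsers know this feature; carry on without it.
        }
        final StringBuilder text = new StringBuilder();
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        // A byte stream (not a Reader) makes the parser honour the encoding declaration.
        parser.parse(new InputSource(new ByteArrayInputStream(xmlBytes)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00e9</word>";
        System.out.println(parseText(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Passing bytes instead of a Reader matters here: with a Reader the application has already decoded the characters, and the encoding declaration in the document is not used for decoding.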