|
| | | | How do I create a DOM parser? | | | | |
| |
| | | | import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import java.io.IOException;
...
String xmlFile = "file:///xerces-1_4_4/data/personal.xml";
DOMParser parser = new DOMParser();
try {
    parser.parse(xmlFile);
} catch (SAXException se) {
    se.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
Document document = parser.getDocument(); | | | | |
|
| | | | How do I create a SAX parser? | | | | |
| |
| | | | import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.Parser;
import org.xml.sax.helpers.ParserFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
...
String xmlFile = "file:///xerces-1_4_4/data/personal.xml";
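String parserClass = "org.apache.xerces.parsers.SAXParser";
try {
    // makeParser can throw ClassNotFoundException, IllegalAccessException
    // and InstantiationException, so it is called inside the try block.
    Parser parser = ParserFactory.makeParser(parserClass);
    parser.parse(xmlFile);
} catch (SAXException se) {
    se.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
} | | | | |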
|
| | When you create a parser instance, the default error handler does nothing.
This means that your program will fail silently when it encounters an error.
You should register an error handler with the parser by supplying a class
which implements the org.xml.sax.ErrorHandler
interface. This is true regardless of whether your parser is
DOM-based or SAX-based.
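As a minimal sketch of such a handler, using only the standard org.xml.sax interfaces: the class name ReportingErrorHandler and its policy (print warnings, rethrow errors and fatal errors) are illustrative assumptions, not something Xerces supplies.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: prints every problem, and turns errors and
// fatal errors into exceptions so that the parse stops visibly.
class ReportingErrorHandler implements ErrorHandler {

    // Formats the location and message of a parse problem.
    static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber()
                + " of " + e.getSystemId() + ": " + e.getMessage();
    }

    public void warning(SAXParseException e) {
        System.err.println(describe("Warning", e));
    }

    public void error(SAXParseException e) throws SAXException {
        System.err.println(describe("Error", e));
        throw e;
    }

    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println(describe("Fatal error", e));
        throw e;
    }
}
```

Assuming the parser exposes the usual setErrorHandler method, registration would look like parser.setErrorHandler(new ReportingErrorHandler()) before the call to parse.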
|
| | | | How do I access the DOM Level 2 functionality? | | | | |
| | The DOM Level 2
specification is at the
"Candidate Recommendation" (CR) stage, which allows feedback from implementors
before it becomes a "Recommendation". It comprises "core"
functionality, which is mainly the DOM
Namespaces implementation,
and a number of optional modules (called Chapters in the spec).
Please refer to
http://www.w3.org/TR/DOM-Level-2/ for the
latest DOM Level 2 specification.
The following DOM Level 2 modules are fully implemented in Xerces-J:
-
Chapter 1: Core - most of these enhancements are for
Namespaces, and can be accessed through additional methods which
have been added directly to the org.w3c.dom.* interfaces.
-
Chapter 6: Events - The org.w3c.dom.events.EventTarget
interface is implemented by all
Nodes of the DOM.
The Xerces-J DOM implementation handles all of the event
triggering, capture and flow.
-
Chapter 7: Traversal - The Traversal module interfaces
are located in org.w3c.dom.traversal.
The
NodeIterator, TreeWalker, and
NodeFilter interfaces have been supplied to allow
traversal of the DOM at a higher level. Our DOM Document
implementation class, DocumentImpl, now
implements DocumentTraversal, which supplies the
factory methods to create the iterators and tree walkers.
-
Chapter 8: Range - The Range module interfaces are
located in org.w3c.dom.range. The Range interface
allows you to specify ranges or selections using boundary
points in the DOM, along with functions (like delete,
clone, extract...) that can be performed on these ranges.
Our DOM Document implementation class,
DocumentImpl,
now implements DocumentRange, which supplies
the factory method to create a Range.
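The Core, Traversal and Range modules listed above can be sketched together as follows. This is an illustration under stated assumptions: the document is created through JAXP (DocumentBuilderFactory) rather than through Xerces-J classes directly, the Range interfaces are imported from org.w3c.dom.ranges, the package name used by current JDKs (the text above names org.w3c.dom.range), and the class name Dom2Sketch and the namespace urn:example:personnel are hypothetical. The casts assume a Document implementation that supports both modules.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

class Dom2Sketch {

    // Core: builds a small document with the namespace-aware factory methods.
    static Document build() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS("urn:example:personnel", "p:personnel");
        doc.appendChild(root);
        for (String name : new String[] {"Ann", "Bob"}) {
            Element person = doc.createElementNS("urn:example:personnel", "p:person");
            person.appendChild(doc.createTextNode(name));
            root.appendChild(person);
        }
        return doc;
    }

    // Traversal: counts element nodes, the root included, with a NodeIterator.
    static int countElements(Document doc) {
        DocumentTraversal traversal = (DocumentTraversal) doc;
        NodeIterator it = traversal.createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        int count = 0;
        while (it.nextNode() != null) {
            count++;
        }
        it.detach();
        return count;
    }

    // Range: selects the contents of a node, returns their text, then deletes them.
    static String cutContents(Document doc, Node node) {
        Range range = ((DocumentRange) doc).createRange();
        range.selectNodeContents(node);
        String text = range.toString();
        range.deleteContents();
        range.detach();
        return text;
    }
}
```

The casts to DocumentTraversal and DocumentRange are the point of the sketch: the factory methods live on the Document implementation, not on the org.w3c.dom.Document interface itself.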
| Since the DOM Level 2 is still in the CR phase, some changes
to these specs are still possible. The purpose of this phase is to
provide feedback to the W3C, so that the specs can be clarified and
implementation concerns can be addressed. |
|
| | | | How do I read data from a stream as it arrives? | | | | |
| | There are three problems you have to deal with:
- The Apache parsers read the entire data stream into a buffer before they start
parsing; you need to change this behaviour, so that they analyse "on the fly"
- The Apache parsers terminate when they reach end-of-file; with a data stream,
unless the sender drops the socket, you have no end-of-file, so you need to
terminate in some other way
- The Apache parsers close the input stream on termination, and this closes the
socket; you normally don't want this, because you'll want to send an ack to the
data stream source, and you may want to have further exchanges on the socket
anyway.
Preventing the buffering
To do this, create a subclass of org.apache.xerces.readers.DefaultReaderFactory
and override createCharReader and createUTF8Reader as shown below.
| | | |
package org.apache.xerces.readers;
import org.apache.xerces.framework.XMLErrorReporter;
import org.apache.xerces.utils.ChunkyByteArray;
import org.apache.xerces.utils.StringPool;
import org.xml.sax.InputSource;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.util.Stack;
public class StreamingCharFactory extends org.apache.xerces.readers.DefaultReaderFactory {
    public XMLEntityHandler.EntityReader createCharReader(XMLEntityHandler entityHandler,
                                                          XMLErrorReporter errorReporter,
                                                          boolean sendCharDataAsCharArray,
                                                          Reader reader,
                                                          StringPool stringPool)
        throws Exception
    {
        return new org.apache.xerces.readers.StreamingCharReader(entityHandler,
                                                                 errorReporter,
                                                                 sendCharDataAsCharArray,
                                                                 reader,
                                                                 stringPool);
    }
    public XMLEntityHandler.EntityReader createUTF8Reader(XMLEntityHandler entityHandler,
                                                          XMLErrorReporter errorReporter,
                                                          boolean sendCharDataAsCharArray,
                                                          InputStream data,
                                                          StringPool stringPool)
        throws Exception
    {
        XMLEntityHandler.EntityReader reader;
        reader = new org.apache.xerces.readers.StreamingCharReader(entityHandler,
                                                                   errorReporter,
                                                                   sendCharDataAsCharArray,
                                                                   new InputStreamReader(data, "UTF8"),
                                                                   stringPool);
        return reader;
    }
}
| | | | |
In your program, after you instantiate a parser class, replace the
DefaultReaderFactory with StreamingCharFactory. You'll need to instantiate your
parser as a SAXParser, rather than simply as an XMLReader, because the XMLReader
interface doesn't have the setReaderFactory method. Be sure to wrap the
InputStream that you are reading from with an InputStreamReader.
| | | |
try {
    SAXParser parser =
        (SAXParser)Class.forName("org.apache.xerces.parsers.SAXParser").newInstance();
    parser.setContentHandler(new DocProcessor(out));
    parser.setReaderFactory(new StreamingCharFactory());
    parser.parse(new InputSource(bufferedReader));
} catch (Exception ex) {
    // Report the failure instead of swallowing it silently.
    ex.printStackTrace();
}
| | | | |
Terminating the parse
One way that works for SAX is to throw an exception when you detect the logical
end-of-document.
For instance, in your class extending DefaultHandler, you can have:
| | | |
public class DocProcessor extends DefaultHandler {
    private int level;
    .
    .
    // Track the element depth so that the end of the root element can be detected.
    public void startElement(String uri,
                             String localName,
                             String raw,
                             Attributes attrs) throws SAXException
    {
        ++level;
    }
    public void endElement(String namespaceURI,
                           String localName,
                           String qName) throws SAXException
    {
        // Depth 0 means the root element has just closed: stop the parse.
        if ((--level) == 0) {
            throw new SAXException("Finished");
        }
    }
}
| | | | |
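The calling side then has to tell this deliberate termination apart from a genuine parse error. One possible arrangement, sketched here with the standard JAXP SAXParserFactory rather than the Xerces-specific classes used above, is a dedicated exception subclass. The names EndOfDocumentException, DepthTrackingHandler and TerminatingParse are hypothetical, and the streaming reader factory is left out to keep the sketch self-contained.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical marker exception: signals the logical end of the document.
class EndOfDocumentException extends SAXException {
    EndOfDocumentException() {
        super("Finished");
    }
}

// Same depth-counting idea as DocProcessor, with a counter added for illustration.
class DepthTrackingHandler extends DefaultHandler {
    private int level;
    int elementsSeen;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        ++level;
        ++elementsSeen;
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (--level == 0) {
            throw new EndOfDocumentException();
        }
    }
}

class TerminatingParse {
    // Returns true when the parse was stopped at the root element's end tag.
    // Any other exception is a real error and propagates to the caller.
    static boolean parseOneDocument(InputSource in, DepthTrackingHandler handler)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(in, handler);
            return false;
        } catch (EndOfDocumentException expected) {
            return true;
        }
    }
}
```

With a plain SAXException("Finished"), the caller would have to compare message strings to recognise the expected case; a subclass lets the catch clause do that job.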
Preventing the parser from closing the socket
One way is to subclass BufferedReader to provide an empty close method.
So, invoke the parser as follows:
| | | |
Socket socket;
// code to set the socket
parser.parse(new InputSource(new MyBufferedReader(new InputStreamReader(socket.getInputStream()))));
.
.
class MyBufferedReader extends BufferedReader
{
    public MyBufferedReader(InputStreamReader i) {
        super(i);
    }
    // Intentionally empty, so that the parser cannot close the underlying socket stream.
    public void close() {
    }
}
| | | | |
|
|