public class XPathRecordReader extends Object
A streaming xpath parser which uses StAX for XML parsing. It supports only a subset of xpath syntax.
/a/b/subject[@qualifier='fullTitle']
/a/b/subject[@qualifier=]/subtag
/a/b/subject/@qualifier
//a
//a/b...
/a//b
/a//b...
/a/b/c

A record is a Map<String,Object>. The key is the provided name and the value is a String or a List<String>.

This class is thread-safe for parsing xml. But adding fields is not thread-safe. The recommended usage is to addField() in one thread and then share the instance across threads.
This API is experimental and may change in the future.
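
Because a record value is either a String or a List<String>, code that consumes records usually has to handle both shapes. The following is a minimal sketch of one way to do that; the class name RecordValues, the helper valuesOf, and the field names in the usage note are hypothetical and are not part of XPathRecordReader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RecordValues {
    /**
     * Hypothetical helper: returns the values stored under the given name as a
     * list, whether the record holds a single String or a List of Strings.
     */
    static List<String> valuesOf(Map<String, Object> record, String name) {
        Object value = record.get(name);
        if (value == null) {
            return Collections.emptyList();          // field absent from this record
        }
        if (value instanceof List) {
            List<String> out = new ArrayList<>();    // multi-valued field
            for (Object item : (List<?>) value) {
                out.add(String.valueOf(item));
            }
            return out;
        }
        return Collections.singletonList(value.toString()); // single-valued field
    }
}
```

With a record holding "title" as a String and "author" as a List<String>, valuesOf returns a one-element list for the first and the full list for the second, so callers can iterate without type checks of their own.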
Modifier and Type | Class and Description |
---|---|
static interface | XPathRecordReader.Handler - Implement this interface to stream records as and when one is found. |
Modifier and Type | Field and Description |
---|---|
static int | FLATTEN - The FLATTEN flag indicates that all text and CDATA under a specific tag should be recursively fetched and appended to the current Node's value. |
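
The effect described for FLATTEN (recursively gathering all text and CDATA below a tag) can be illustrated with a small standalone StAX loop. This is a simplified sketch using only the JDK's javax.xml.stream API, not the class's actual implementation; the class name FlattenSketch and the method flatten are hypothetical, and the target is matched by element name rather than by an xpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class FlattenSketch {
    /** Concatenates all text and CDATA found anywhere below the first element named tag. */
    static String flatten(String xml, String tag) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0; // 0 means we are outside the target element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Enter the target, or go one level deeper inside it.
                    if (depth > 0 || reader.getLocalName().equals(tag)) {
                        depth++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth > 0 && --depth == 0) {
                        break; // the target element has closed
                    }
                } else if (depth > 0 && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.SPACE)) {
                    text.append(reader.getText()); // text at any nesting level
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return text.toString();
    }
}
```

Without this kind of recursion, only text directly under the matched tag would be collected and the content of nested child elements would be lost.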
Constructor and Description |
---|
XPathRecordReader(String forEachXpath) - A constructor called with a '\|' separated list of Xpath expressions which define sub sections of the XML stream that are to be emitted as separate records. |
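
To make the idea of emitting sub sections of the stream as separate records concrete, here is a simplified sketch built on the JDK's StAX API. It is not the class's implementation: the names RecordEmitSketch and parse are hypothetical, and elements are matched by plain tag name instead of xpath. It shows fields inside the record element being cleared after each emit, while fields collected outside it are carried into every record.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RecordEmitSketch {
    static List<Map<String, Object>> parse(String xml, String recordTag, Set<String> fieldTags) {
        List<Map<String, Object>> records = new ArrayList<>();
        Map<String, Object> outer = new LinkedHashMap<>(); // fields seen outside a record; kept
        Map<String, Object> inner = new LinkedHashMap<>(); // fields seen inside a record; cleared on emit
        boolean inRecord = false;
        String currentField = null;
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals(recordTag)) {
                        inRecord = true;             // start collecting for a new record
                    } else if (fieldTags.contains(name)) {
                        currentField = name;         // start capturing this field's text
                        text.setLength(0);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    if (currentField != null) {
                        text.append(r.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals(currentField)) {
                        (inRecord ? inner : outer).put(currentField, text.toString());
                        currentField = null;
                    } else if (name.equals(recordTag)) {
                        Map<String, Object> record = new LinkedHashMap<>(outer);
                        record.putAll(inner);
                        records.add(record);         // emit at the close of the record tag
                        inner.clear();               // record-level fields are reset
                        inRecord = false;
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return records;
    }
}
```

For an input with one source element followed by two item elements, each emitted record would contain the shared source value plus its own title.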
Modifier and Type | Method and Description |
---|---|
XPathRecordReader | addField(String name, String xpath, boolean multiValued) - A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName. |
XPathRecordReader | addField(String name, String xpath, boolean multiValued, int flags) - A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName. |
List<Map<String,Object>> | getAllRecords(Reader r) - Uses streamRecords to parse the XML source but with a handler that collects all the emitted records into a single List which is returned upon completion. |
void | streamRecords(Reader r, XPathRecordReader.Handler handler) - Creates an XML stream reader on top of whatever reader has been configured. |
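
The relationship between the two parsing methods (a streaming call that pushes records to a handler, and a collect-all call that reuses it with a list-building handler) can be sketched as follows. Everything here is a hypothetical stand-in: RecordHandler is not the actual XPathRecordReader.Handler signature, and an Iterable takes the place of the XML stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HandlerSketch {
    /** Hypothetical callback type; assumed for illustration only. */
    interface RecordHandler {
        void handle(Map<String, Object> record);
    }

    /** Stand-in for a streaming parse: each record goes to the handler as soon as it is available. */
    static void stream(Iterable<Map<String, Object>> source, RecordHandler handler) {
        for (Map<String, Object> record : source) {
            handler.handle(record);
        }
    }

    /** Collect-all variant built on the streaming one, using a handler that appends to a list. */
    static List<Map<String, Object>> collectAll(Iterable<Map<String, Object>> source) {
        List<Map<String, Object>> all = new ArrayList<>();
        stream(source, all::add);
        return all;
    }
}
```

The streaming form keeps memory use bounded by one record at a time, while the collect-all form is more convenient when the whole result is known to be small.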
public static final int FLATTEN
public XPathRecordReader(String forEachXpath)
forEachXpath - The XPATH for which a record is emitted. Once the
xpath tag is encountered, the Node.parse method starts collecting wanted
fields and at the close of the tag, a record is emitted containing all
fields collected since the tag start. Once
emitted the collected fields are cleared. Any fields collected in the
parent tag or above will also be included in the record, but these are
not cleared after emitting the record.
It uses the ' | ' syntax of XPATH to pass in multiple xpaths.

public XPathRecordReader addField(String name, String xpath, boolean multiValued)
A wrapper around addField0 to create a series of
Nodes based on the supplied Xpath and a given fieldName. The created
nodes are inserted into a Node tree.
name - The name for this field in the emitted record
xpath - The xpath expression for this field
multiValued - If 'true' then the emitted record will have values in
a List<String>

public XPathRecordReader addField(String name, String xpath, boolean multiValued, int flags)
A wrapper around addField0 to create a series of
Nodes based on the supplied Xpath and a given fieldName. The created
nodes are inserted into a Node tree.
name - The name for this field in the emitted record
xpath - The xpath expression for this field
multiValued - If 'true' then the emitted record will have values in
a List<String>
flags - FLATTEN: Recursively combine text from all child XML elements

public List<Map<String,Object>> getAllRecords(Reader r)
Uses streamRecords to parse the XML source but with
a handler that collects all the emitted records into a single List which
is returned upon completion.
r - the stream reader

public void streamRecords(Reader r, XPathRecordReader.Handler handler)
r - the stream reader
handler - The callback instance

Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.