public class DOMContentUtils extends Object
Constructor and Description |
---|
DOMContentUtils(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
void | getOutlinks(URL base, ArrayList<Outlink> outlinks, Node node) |
void | getText(StringBuffer sb, Node node) This is a convenience method, equivalent to getText(sb, node, false). |
boolean | getTitle(StringBuffer sb, Node node) This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer. |
void | setConf(Configuration conf) |
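The getTitle contract summarized above (append the text beneath the first title node to a caller-supplied StringBuffer) can be illustrated with the JDK's own DOM API. This is a minimal sketch, not Nutch's implementation: the class TitleSketch and its methods parse and appendTitle are hypothetical names, the input is assumed to be well-formed XHTML so the JDK XML parser can read it, and the boolean result is assumed to signal whether a title node was found.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Nutch.
final class TitleSketch {

  // Parses well-formed XHTML with the JDK XML parser (an assumption made
  // to keep the sketch self-contained).
  static Document parse(String xhtml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("input is not well-formed", e);
    }
  }

  // Appends the trimmed text beneath the first <title> element (in document
  // order) to sb. Returns true if a title element was found (assumed meaning).
  static boolean appendTitle(StringBuffer sb, Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE
        && "title".equalsIgnoreCase(node.getNodeName())) {
      sb.append(node.getTextContent().trim());
      return true;
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      if (appendTitle(sb, children.item(i))) {
        return true; // stop at the first title; later ones are ignored
      }
    }
    return false;
  }
}
```

Calling TitleSketch.appendTitle(sb, TitleSketch.parse(xhtml)) leaves sb unchanged and returns false when the document has no title element.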
public DOMContentUtils(Configuration conf)
public void setConf(Configuration conf)
public void getText(StringBuffer sb, Node node)
This is a convenience method, equivalent to getText(sb, node, false).
public boolean getTitle(StringBuffer sb, Node node)
This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
public void getOutlinks(URL base, ArrayList<Outlink> outlinks, Node node)
This method finds all anchors below the supplied DOM node, creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
Links without inner structure (tags, text, etc.) are discarded, as are links that contain only single nested links and empty text nodes (this is a common DOM-fixup artifact, at least with nekohtml).
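The link collection described above can be sketched with JDK classes only. This is a hedged illustration rather than Nutch's algorithm: OutlinkSketch, parse, and collectLinks are hypothetical names, resolved URLs are stored as plain strings instead of Outlink records, the base is a java.net.URI rather than a URL, and only the simplest discard rule (anchors with no child nodes at all) is implemented. The nested-link fixup case is left out.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Nutch.
final class OutlinkSketch {

  // Parses well-formed XHTML with the JDK XML parser (an assumption made
  // to keep the sketch self-contained).
  static Node parse(String xhtml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("input is not well-formed", e);
    }
  }

  // Walks the tree under node and, for every <a href="..."> that has inner
  // structure, adds the href resolved against base to out.
  static void collectLinks(URI base, List<String> out, Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE
        && "a".equalsIgnoreCase(node.getNodeName())) {
      Element anchor = (Element) node;
      String href = anchor.getAttribute("href");
      // Simplified discard rule: skip anchors with no child nodes.
      if (!href.isEmpty() && anchor.hasChildNodes()) {
        try {
          out.add(base.resolve(href).toString());
        } catch (IllegalArgumentException e) {
          // Malformed href: skipped in this sketch.
        }
      }
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      collectLinks(base, out, children.item(i));
    }
  }
}
```

Relative hrefs such as b.html or /top are resolved against the base, absolute hrefs pass through unchanged, and an empty anchor element contributes nothing to the list.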
Copyright © 2015 The Apache Software Foundation