java.lang.Object
  org.apache.any23.extractor.html.DomUtils

public class DomUtils

This class provides utility methods for DOM manipulation. It is separated from HTMLDocument so that its methods can be run on single DOM nodes without having to wrap them into an HTMLDocument. We use a mix of XPath and DOM manipulation.
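To make the mix of XPath and DOM manipulation concrete, the following is a minimal sketch of how XPath-based lookups like find and findAll could be written with only the JDK's javax.xml.xpath API. The class name XPathDomSketch and the parse helper are hypothetical, and this is not taken from the Any23 sources.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class XPathDomSketch {

    // Evaluates the expression relative to the node and returns its string value.
    static String find(Node node, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, node);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }

    // Evaluates the expression as a node set and copies the matches into a List.
    static List<Node> findAll(Node node, String xpath) {
        try {
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, node, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(matches.item(i));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Node div = parse("<div><p>one</p><p>two</p></div>").getDocumentElement();
        System.out.println(find(div, "p[2]"));        // prints "two"
        System.out.println(findAll(div, "p").size()); // prints 2
    }
}
```

Both methods take a plain org.w3c.dom.Node, which mirrors the design goal stated above: no wrapper document is needed to run a query.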
Method Summary

Modifier and Type | Method | Description |
---|---|---|
static String | find(Node node, String xpath) | Gets the string value of an XPath expression. |
static List<Node> | findAll(Node node, String xpath) | Returns a list of all the nodes that match an XPath expression, which must be valid. |
static List<Node> | findAllByAttributeName(Node root, String attrName) | Finds all nodes that have a declared attribute. |
static List<Node> | findAllByClassName(Node root, String className) | Finds all nodes that have a declared class. |
static List<Node> | findAllByTag(Node root, String tagName) | |
static List<Node> | findAllByTagAndClassName(Node root, String tagName, String className) | |
static Node | findNodeById(Node root, String id) | Mimics the JS DOM API, or Prototype's $(). |
static int | getIndexInParent(Node n) | Given a node, returns the index of that node within the list of children of its parent node. |
static int[] | getNodeLocation(Node n) | Returns the row/col location of the given node. |
static String | getXPathForNode(Node node) | Walks the DOM tree in reverse to generate a unique XPath expression leading to this node. |
static String[] | getXPathListForNode(Node n) | Returns a list of tag names representing the path from the document root to the given node n. |
static boolean | hasAttribute(Node node, String attributeName) | Checks the presence of an attribute in the given node. |
static boolean | hasAttribute(Node node, String attributeName, String className) | Checks the presence of an attribute value in attributes that contain whitespace-separated lists of values. |
static boolean | hasClassName(Node node, String className) | Tells whether an element has a class name, without checking its ancestors in the hierarchy, mimicking the CSS .foo match. |
static boolean | isAncestorOf(Node candidateAncestor, Node candidateSibling) | Checks whether a node is an ancestor of, or the same as, another node. |
static boolean | isAncestorOf(Node candidateAncestor, Node candidateSibling, boolean strict) | Checks whether a node is an ancestor of, or the same as, another node. |
static boolean | isElementNode(Node target) | Verifies whether the given target node is an element. |
static String | readAttribute(Node node, String attribute) | Reads the value of an attribute, returning the empty string if not present. |
static String | readAttribute(Node node, String attribute, String defaultValue) | Reads the value of the specified attribute, returning the defaultValue string if not present. |
static String | readAttributeWithPrefix(Node node, String attributePrefix, String defaultValue) | Reads the value of the first attribute whose name matches the specified attributePrefix. |
static String | serializeToXML(Node node, boolean indent) | Given a DOM Node, produces the XML serialization, omitting the XML declaration. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
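public static int getIndexInParent(Node n)

Given a node, returns the index of that node within the list of children of its parent node.

Parameters:
n - the node whose index is returned.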
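As an illustration of the index lookup described for getIndexInParent, here is a minimal sketch using only the JDK DOM API. The class IndexInParentSketch is hypothetical, and two details are assumptions rather than documented behavior: all child node types (including text nodes) are counted, and -1 is returned when the node has no parent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class IndexInParentSketch {

    // Returns the position of n among all children of its parent.
    // Assumption: -1 signals "no parent" or "not found".
    static int indexInParent(Node n) {
        Node parent = n.getParentNode();
        if (parent == null) {
            return -1;
        }
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).isSameNode(n)) {
                return i;
            }
        }
        return -1;
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<ul><li>a</li><li>b</li></ul>");
        Node second = doc.getElementsByTagName("li").item(1);
        System.out.println(indexInParent(second)); // prints 1
    }
}
```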
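public static String getXPathForNode(Node node)

Walks the DOM tree in reverse to generate a unique XPath expression leading to this node.

Parameters:
node - the input node.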
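The reverse walk can be sketched as follows. This is a hedged illustration only: the class XPathForNodeSketch is hypothetical, and the output shape /name[position] per step is an assumption, since the exact expression format produced by getXPathForNode is not stated here. The sketch expects an element node and returns an empty string for anything else.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class XPathForNodeSketch {

    // Climbs from the node to the root, prepending one "/name[position]" step per element.
    static String xpathFor(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node cur = node;
             cur != null && cur.getNodeType() == Node.ELEMENT_NODE;
             cur = cur.getParentNode()) {
            // XPath positions are 1-based and count only same-named element siblings.
            int position = 1;
            for (Node sib = cur.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
                if (sib.getNodeType() == Node.ELEMENT_NODE
                        && sib.getNodeName().equals(cur.getNodeName())) {
                    position++;
                }
            }
            path.insert(0, "/" + cur.getNodeName() + "[" + position + "]");
        }
        return path.toString();
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>a</p><p>b</p></body></html>");
        Node secondP = doc.getElementsByTagName("p").item(1);
        System.out.println(xpathFor(secondP)); // prints /html[1]/body[1]/p[2]
    }
}
```

Including a positional predicate on every step is what makes the expression unique: without it, the two p elements in the demo would share the same path.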
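public static String[] getXPathListForNode(Node n)

Returns a list of tag names representing the path from the document root to the given node n.

Parameters:
n - the node for which to retrieve the path.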
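A path of tag names like the one described above could be collected as in the sketch below. The class PathNamesSketch is hypothetical, and whether the real method includes the node's own tag name in the result is an assumption made here for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class PathNamesSketch {

    // Walks up through the element ancestors, adding each name at the front so that
    // the result reads from the document root down to n (n itself included, by assumption).
    static String[] pathNames(Node n) {
        Deque<String> names = new ArrayDeque<>();
        for (Node cur = n;
             cur != null && cur.getNodeType() == Node.ELEMENT_NODE;
             cur = cur.getParentNode()) {
            names.addFirst(cur.getNodeName());
        }
        return names.toArray(new String[0]);
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>a</p></body></html>");
        Node p = doc.getElementsByTagName("p").item(0);
        System.out.println(String.join("/", pathNames(p))); // prints html/body/p
    }
}
```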
public static int[] getNodeLocation(Node n)

Returns the row/col location of the given node.

Parameters:
n - input node.
Returns:
an array [<begin-row>, <begin-col>, <end-row>, <end-col>], or null if it is not possible to extract such data.

public static boolean isAncestorOf(Node candidateAncestor, Node candidateSibling, boolean strict)
Checks whether a node is an ancestor of, or the same as, another node.

Parameters:
candidateAncestor - the candidate ancestor node.
candidateSibling - the candidate sibling node.
strict - if true, the ancestor and the sibling are not allowed to be the same node.
Returns:
true if candidateAncestor is an ancestor of candidateSibling, false otherwise.

public static boolean isAncestorOf(Node candidateAncestor, Node candidateSibling)
Checks whether a node is an ancestor of, or the same as, another node. Equivalent to isAncestorOf(org.w3c.dom.Node, org.w3c.dom.Node, boolean) with strict=false.

Parameters:
candidateAncestor - the candidate ancestor node.
candidateSibling - the candidate sibling node.
Returns:
true if candidateAncestor is an ancestor of candidateSibling, false otherwise.

public static List<Node> findAllByClassName(Node root, String className)
Finds all nodes that have a declared class.

Parameters:
root - the root node from which to start searching.
className - the name of the filtered class.
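A class-name search of this kind might look like the sketch below, which treats the class attribute as a whitespace-separated token list, in the spirit of the CSS .foo match. The class ClassNameSketch is hypothetical, and including the root itself among the candidates is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class ClassNameSketch {

    // True if the node is an element whose class attribute contains className as a whole token.
    static boolean hasClassName(Node node, String className) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        String classes = ((Element) node).getAttribute("class"); // "" when absent
        for (String token : classes.trim().split("\\s+")) {
            if (token.equals(className)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first, document-order search starting at root (root included, by assumption).
    static List<Node> findAllByClassName(Node root, String className) {
        List<Node> result = new ArrayList<>();
        collect(root, className, result);
        return result;
    }

    private static void collect(Node node, String className, List<Node> out) {
        if (hasClassName(node, className)) {
            out.add(node);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, className, out);
        }
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p class=\"vcard big\">a</p><p class=\"vcardx\">b</p></div>");
        System.out.println(findAllByClassName(doc, "vcard").size()); // prints 1
    }
}
```

Matching whole tokens rather than substrings is what keeps class="vcardx" from counting as a vcard match.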
public static List<Node> findAllByAttributeName(Node root, String attrName)

Finds all nodes that have a declared attribute.

Parameters:
root - the root node from which to start searching.
attrName - the name of the filtered attribute.
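For comparison, an attribute-based search can be sketched with an explicit stack instead of recursion. The class AttributeNameSketch is hypothetical, and both the traversal strategy and the inclusion of the root among the candidates are assumptions, not the documented behavior of findAllByAttributeName.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class AttributeNameSketch {

    // Collects, in document order, every element under root that declares attrName.
    static List<Node> findAllByAttributeName(Node root, String attrName) {
        List<Node> result = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node cur = stack.pop();
            if (cur.getNodeType() == Node.ELEMENT_NODE && ((Element) cur).hasAttribute(attrName)) {
                result.add(cur);
            }
            // Push children last-to-first so the first child is visited next.
            for (Node child = cur.getLastChild(); child != null; child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return result;
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p itemprop=\"name\">a</p><p>b</p><span itemprop=\"url\">c</span></div>");
        System.out.println(findAllByAttributeName(doc, "itemprop").size()); // prints 2
    }
}
```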
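public static List<Node> findAllByTag(Node root, String tagName)

public static List<Node> findAllByTagAndClassName(Node root, String tagName, String className)

public static Node findNodeById(Node root, String id)

Mimics the JS DOM API, or Prototype's $().

public static List<Node> findAll(Node node, String xpath)

Returns a list of all the nodes that match an XPath expression, which must be valid.

public static String find(Node node, String xpath)

Gets the string value of an XPath expression.

public static boolean hasClassName(Node node, String className)

Tells whether an element has a class name, without checking its ancestors in the hierarchy, mimicking the CSS .foo match.

public static boolean hasAttribute(Node node, String attributeName, String className)

Checks the presence of an attribute value in attributes that contain whitespace-separated lists of values.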
public static boolean hasAttribute(Node node, String attributeName)

Checks the presence of an attribute in the given node.

Parameters:
node - the node container.
attributeName - the name of the attribute.

public static boolean isElementNode(Node target)
Verifies whether the given target node is an element.

Parameters:
target - the node to verify.
Returns:
true if the node is an element, false otherwise.

public static String readAttribute(Node node, String attribute, String defaultValue)
Reads the value of the specified attribute, returning the defaultValue string if not present.

Parameters:
node - the node from which to read the attribute.
attribute - attribute name.
defaultValue - the default value to return if the attribute is not found.
Returns:
the attribute value, or defaultValue if not found.

public static String readAttributeWithPrefix(Node node, String attributePrefix, String defaultValue)
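Reads the value of the first attribute whose name matches the specified attributePrefix. Returns the defaultValue if not found.

Parameters:
node - the node in which to look for attributes.
attributePrefix - attribute prefix.
defaultValue - the default returned value.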
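A prefix-based attribute read could be sketched as below. The class AttributePrefixSketch is hypothetical, and interpreting "matches the prefix" as String.startsWith on the attribute name is an assumption. Note also that the DOM does not guarantee any order for attributes in a NamedNodeMap, so "first" here simply means first in iteration order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class AttributePrefixSketch {

    // Returns the value of the first attribute (in iteration order) whose name
    // starts with attributePrefix, or defaultValue if there is none.
    static String readAttributeWithPrefix(Node node, String attributePrefix, String defaultValue) {
        NamedNodeMap attributes = node.getAttributes(); // null for non-element nodes
        if (attributes == null) {
            return defaultValue;
        }
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attr = attributes.item(i);
            if (attr.getNodeName().startsWith(attributePrefix)) {
                return attr.getNodeValue();
            }
        }
        return defaultValue;
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Node div = parse("<div data-id=\"42\" class=\"c\"/>").getDocumentElement();
        System.out.println(readAttributeWithPrefix(div, "data-", "none")); // prints 42
        System.out.println(readAttributeWithPrefix(div, "aria-", "none")); // prints none
    }
}
```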
public static String readAttribute(Node node, String attribute)

Reads the value of an attribute, returning the empty string if not present.

Parameters:
node - the node from which to read the attribute.
attribute - attribute name.
Returns:
the attribute value, or "" if not found.

public static String serializeToXML(Node node, boolean indent) throws TransformerException, IOException
Given a DOM Node, produces the XML serialization, omitting the XML declaration.

Parameters:
node - node to be serialized.
indent - if true, the output is indented.
Throws:
TransformerException - if an error occurs during the serializer initialization and activation.
IOException
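Serialization without the XML declaration can be achieved with the JDK's javax.xml.transform API, as in the sketch below. The class SerializeSketch is hypothetical, and using an identity Transformer is an assumption about one reasonable approach, not a statement of how serializeToXML is implemented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the Any23 implementation.
final class SerializeSketch {

    // Serializes the node with an identity transform, suppressing the <?xml ...?> declaration.
    static String serializeToXML(Node node, boolean indent) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Helper for the demo only: parses a well-formed markup string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b>x</b></a>");
        System.out.println(serializeToXML(doc.getDocumentElement(), false));
    }
}
```

The exact whitespace produced with indent set to true depends on the Transformer implementation in use, so callers that compare output as strings may prefer indent set to false.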