org.apache.any23.extractor.microdata
Class MicrodataParser

java.lang.Object
  extended by org.apache.any23.extractor.microdata.MicrodataParser

public class MicrodataParser
extends Object

This class provides utility methods for handling Microdata nodes contained within a DOM document.

Author:
Michele Mostarda (mostarda@fbk.eu)

Field Summary
static Set<String> HREF_TAGS
          List of tags providing the href property.
static String ITEMPROP_ATTRIBUTE
           
static String ITEMSCOPE_ATTRIBUTE
           
static Set<String> SRC_TAGS
          List of tags providing the src property.
 
Constructor Summary
MicrodataParser(Document document)
           
 
Method Summary
 ItemProp[] deferProperties(String... refs)
          Given a list of itemprop references, returns the corresponding itemprops from the document.
 org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode getErrorMode()
           
 MicrodataParserException[] getErrors()
           
static List<Node> getItemPropNodes(Node node)
          Returns all the itemProps detected within the given root node.
 List<ItemProp> getItemProps(Node node, boolean skipRoot)
          Returns all the itemprops for the given itemscope node.
 ItemScope getItemScope(Node node)
          Returns the ItemScope instance described within the specified node.
static List<Node> getItemScopeNodes(Node node)
          Returns all the itemScopes detected within the given root node.
static MicrodataParserReport getMicrodata(Document document)
          Returns all the Microdata items detected within the given document, working in full report mode.
static MicrodataParserReport getMicrodata(Document document, org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode)
          Returns all the Microdata items detected within the given document.
static void getMicrodataAsJSON(Document document, PrintStream ps)
          Writes a JSON representation of all extracted Microdata to the given print stream, as described in the Microdata JSON Specification.
 ItemPropValue getPropertyValue(Node node)
          Reads the value of an itemprop node.
static List<Node> getTopLevelItemScopeNodes(Node node)
          Returns only the itemScopes that are top level items.
static boolean isItemProp(Node node)
          Checks whether a node is an itemProp.
static boolean isItemScope(Node node)
          Checks whether a node is an itemScope.
 void setErrorMode(org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ITEMSCOPE_ATTRIBUTE

public static final String ITEMSCOPE_ATTRIBUTE
See Also:
Constant Field Values

ITEMPROP_ATTRIBUTE

public static final String ITEMPROP_ATTRIBUTE
See Also:
Constant Field Values

SRC_TAGS

public static final Set<String> SRC_TAGS
List of tags providing the src property.


HREF_TAGS

public static final Set<String> HREF_TAGS
List of tags providing the href property.

Constructor Detail

MicrodataParser

public MicrodataParser(Document document)
Method Detail

getItemScopeNodes

public static List<Node> getItemScopeNodes(Node node)
Returns all the itemScopes detected within the given root node.

Parameters:
node - root node to search in.
Returns:
list of detected items.

isItemScope

public static boolean isItemScope(Node node)
Checks whether a node is an itemScope.

Parameters:
node - node to check.
Returns:
true if the node is an itemScope, false otherwise.
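
As a rough illustration of getItemScopeNodes and isItemScope, the sketch below walks a DOM tree and collects every element carrying the itemscope attribute. It is not the Any23 implementation: the class ItemScopeSketch, the parse helper, and the literal attribute name "itemscope" (standing in for ITEMSCOPE_ATTRIBUTE, whose value this page does not show) are assumptions, and only the JDK DOM API is used. Whether the library also tests the root node itself is likewise assumed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemScopeSketch {
    // Assumed attribute name, standing in for MicrodataParser.ITEMSCOPE_ATTRIBUTE.
    static final String ITEMSCOPE = "itemscope";

    // An element counts as an itemscope when the attribute is present, whatever its value.
    static boolean isItemScope(Node node) {
        return node instanceof Element && ((Element) node).hasAttribute(ITEMSCOPE);
    }

    // Depth-first walk collecting the root (if it qualifies) and all qualifying descendants.
    static List<Node> getItemScopeNodes(Node root) {
        List<Node> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, List<Node> out) {
        if (isItemScope(node)) {
            out.add(node);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Hypothetical helper: parses well-formed XHTML so the sketch needs no HTML parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p itemscope=\"\"><span itemprop=\"name\">x</span></p><p/></div>");
        System.out.println(getItemScopeNodes(doc).size() + " itemscope node(s) found");
    }
}
```

The input uses itemscope="" because the JDK XML parser rejects valueless attributes; an HTML-aware parser would accept the bare form.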

getItemPropNodes

public static List<Node> getItemPropNodes(Node node)
Returns all the itemProps detected within the given root node.

Parameters:
node - root node to search in.
Returns:
list of detected items.

isItemProp

public static boolean isItemProp(Node node)
Checks whether a node is an itemProp.

Parameters:
node - node to check.
Returns:
true if the node is an itemProp, false otherwise.

getTopLevelItemScopeNodes

public static List<Node> getTopLevelItemScopeNodes(Node node)
Returns only the itemScopes that are top level items.

Parameters:
node - root node to search in.
Returns:
list of detected top item scopes.
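
One way to picture the top-level filter is the rule from the HTML Microdata specification: an item is top level when its element has itemscope but no itemprop attribute. The sketch below applies that rule with the JDK DOM API. The class TopLevelSketch and its helpers are hypothetical, and the library's actual filtering logic may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TopLevelSketch {
    // Returns the tag names of all top-level itemscope elements, in document order.
    static List<String> topLevelItemScopeTags(String xhtml) {
        List<String> tags = new ArrayList<>();
        collect(parseRoot(xhtml), tags);
        return tags;
    }

    private static void collect(Node node, List<String> out) {
        if (node instanceof Element) {
            Element e = (Element) node;
            // Assumed rule: itemscope present and itemprop absent means "top level".
            if (e.hasAttribute("itemscope") && !e.hasAttribute("itemprop")) {
                out.add(e.getTagName());
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Hypothetical helper: parses well-formed XHTML and returns its root element.
    private static Element parseRoot(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<body><div itemscope=\"\"><p itemprop=\"a\" itemscope=\"\">x</p></div></body>";
        System.out.println(topLevelItemScopeTags(page));
    }
}
```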

getMicrodata

public static MicrodataParserReport getMicrodata(Document document,
                                                 org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode)
                                          throws MicrodataParserException
Returns all the Microdata items detected within the given document.

Parameters:
document - document to be processed.
errorMode - error management policy.
Returns:
report containing the detected itemscope items.
Throws:
MicrodataParserException - if errorMode == ErrorMode#StopAtFirstError and an error occurs.
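
The difference between the two error policies can be sketched independently of the library. In the example below, everything is hypothetical: the enum mirrors the StopAtFirstError constant named above and assumes a FullReport counterpart (inferred from the "full report mode" wording on this page), Report is a stand-in for MicrodataParserReport, and parseItem is a toy step that fails on blank input.

```java
import java.util.ArrayList;
import java.util.List;

class ErrorModeSketch {
    // StopAtFirstError is named on this page; FullReport is an assumed counterpart.
    enum ErrorMode { StopAtFirstError, FullReport }

    // Hypothetical stand-in for a parser report: parsed items plus collected errors.
    static final class Report {
        final List<String> items = new ArrayList<>();
        final List<Exception> errors = new ArrayList<>();
    }

    // Toy per-item step: blank input stands in for a malformed item.
    static String parseItem(String raw) {
        if (raw.trim().isEmpty()) {
            throw new IllegalArgumentException("empty item");
        }
        return raw.trim();
    }

    static Report parseAll(List<String> rawItems, ErrorMode mode) {
        Report report = new Report();
        for (String raw : rawItems) {
            try {
                report.items.add(parseItem(raw));
            } catch (IllegalArgumentException e) {
                if (mode == ErrorMode.StopAtFirstError) {
                    throw e; // abort on the first failure
                }
                report.errors.add(e); // otherwise record it and keep going
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Report r = parseAll(List.of("a", " ", "b"), ErrorMode.FullReport);
        System.out.println(r.items.size() + " item(s), " + r.errors.size() + " error(s)");
    }
}
```

Under this reading, a caller who wants every recoverable item would choose the full report policy and inspect the collected errors afterwards, while StopAtFirstError suits strict validation.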

getMicrodata

public static MicrodataParserReport getMicrodata(Document document)
Returns all the Microdata items detected within the given document, working in full report mode.

Parameters:
document - document to be processed.
Returns:
report containing the detected itemscope items.

getMicrodataAsJSON

public static void getMicrodataAsJSON(Document document,
                                      PrintStream ps)
Writes a JSON representation of all extracted Microdata to the given print stream, as described in the Microdata JSON Specification.

Parameters:
document - document to be processed.
ps - print stream the JSON output is written to.

setErrorMode

public void setErrorMode(org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode)

getErrorMode

public org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode getErrorMode()

getErrors

public MicrodataParserException[] getErrors()

getPropertyValue

public ItemPropValue getPropertyValue(Node node)
                               throws MicrodataParserException
Reads the value of an itemprop node.

Parameters:
node - itemprop node.
Returns:
value detected within the given node.
Throws:
MicrodataParserException - if an error occurs while extracting a nested item scope.
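
The SRC_TAGS and HREF_TAGS fields suggest that the value of a property depends on the element that carries it. The sketch below shows that idea using the rules of the HTML Microdata specification. The tag sets are assumptions (this page does not list the contents of the two fields), the class PropertyValueSketch and its helpers are hypothetical, values are returned as plain strings rather than ItemPropValue, and nested itemscopes are not handled.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PropertyValueSketch {
    // Assumed contents, taken from the HTML Microdata specification, not from Any23.
    static final Set<String> SRC_TAGS =
            Set.of("audio", "embed", "iframe", "img", "source", "track", "video");
    static final Set<String> HREF_TAGS = Set.of("a", "area", "link");

    // Picks the attribute (or text content) that holds the value for this element type.
    static String propertyValue(Element e) {
        String tag = e.getTagName().toLowerCase(Locale.ROOT);
        if (SRC_TAGS.contains(tag)) return e.getAttribute("src");
        if (HREF_TAGS.contains(tag)) return e.getAttribute("href");
        if (tag.equals("meta")) return e.getAttribute("content");
        return e.getTextContent();
    }

    // Hypothetical convenience: value of the first itemprop element in document order, or null.
    static String firstPropertyValue(String xhtml) {
        Element prop = firstItemProp(parseRoot(xhtml));
        return prop == null ? null : propertyValue(prop);
    }

    private static Element firstItemProp(Node node) {
        if (node instanceof Element && ((Element) node).hasAttribute("itemprop")) {
            return (Element) node;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Element hit = firstItemProp(children.item(i));
            if (hit != null) return hit;
        }
        return null;
    }

    private static Element parseRoot(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPropertyValue("<div><img itemprop=\"image\" src=\"a.png\"/></div>"));
    }
}
```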

getItemProps

public List<ItemProp> getItemProps(Node node,
                                   boolean skipRoot)
                            throws MicrodataParserException
Returns all the itemprops for the given itemscope node.

Parameters:
node - node representing the itemscope
skipRoot - if true, the given root node will not be read as a property, even if it contains the itemprop attribute.
Returns:
the list of itemprops detected within the given itemscope.
Throws:
MicrodataParserException - if an error occurs while retrieving a property value.
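
The effect of skipRoot is easiest to see on an itemscope node that is itself a property of an enclosing item. The sketch below collects itemprop names under such a node with the JDK DOM API. The class ItemPropsSketch is hypothetical, names are returned as strings rather than ItemProp instances, and the choice to stop descending at nested itemscopes follows the HTML Microdata specification rather than anything stated on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemPropsSketch {
    // Collects the itemprop names that belong to the item rooted at the given element.
    static List<String> itemPropNames(Element scope, boolean skipRoot) {
        List<String> names = new ArrayList<>();
        walk(scope, true, skipRoot, names);
        return names;
    }

    private static void walk(Node node, boolean isRoot, boolean skipRoot, List<String> out) {
        if (!(node instanceof Element)) {
            return;
        }
        Element e = (Element) node;
        // With skipRoot set, an itemprop on the root is ignored.
        if (e.hasAttribute("itemprop") && !(isRoot && skipRoot)) {
            out.add(e.getAttribute("itemprop"));
        }
        // Assumption: a nested itemscope starts a new item, so do not descend into it.
        if (!isRoot && e.hasAttribute("itemscope")) {
            return;
        }
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), false, skipRoot, out);
        }
    }

    // Hypothetical convenience overload that parses well-formed XHTML first.
    static List<String> itemPropNames(String xhtml, boolean skipRoot) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            return itemPropNames(root, skipRoot);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String item = "<div itemscope=\"\" itemprop=\"author\"><span itemprop=\"name\">A</span></div>";
        System.out.println(itemPropNames(item, true));
        System.out.println(itemPropNames(item, false));
    }
}
```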

deferProperties

public ItemProp[] deferProperties(String... refs)
                           throws MicrodataParserException
Given a list of itemprop references, returns the corresponding itemprops from the document.

Parameters:
refs - list of references.
Returns:
list of retrieved itemprops.
Throws:
MicrodataParserException - if a loop is detected or a property name is missing.

getItemScope

public ItemScope getItemScope(Node node)
                       throws MicrodataParserException
Returns the ItemScope instance described within the specified node.

Parameters:
node - node describing an itemscope.
Returns:
instance of ItemScope object.
Throws:
MicrodataParserException - if an error occurs while dereferencing properties.


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.