org.apache.nutch.parse
Class ParseUtil

java.lang.Object
  extended by org.apache.nutch.parse.ParseUtil

public class ParseUtil
extends Object

A utility class containing methods that simplify common parsing tasks, such as iterating through a preferred list of Parsers to obtain Parse objects.

Author:
mattmann, Jérôme Charron, Sébastien Le Callonnec

Field Summary
static org.slf4j.Logger LOG
           
 
Constructor Summary
ParseUtil(Configuration conf)
           
 
Method Summary
 ParseResult parse(Content content)
          Performs a parse by iterating through a List of preferred Parsers until a successful parse is performed and a Parse object is returned.
 ParseResult parseByExtensionId(String extId, Content content)
          Parses a Content object using the Parser identified by the parameter extId, i.e., the Parser's extension ID.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

ParseUtil

public ParseUtil(Configuration conf)
Parameters:
conf - The Configuration used to initialize this ParseUtil.
Method Detail

parse

public ParseResult parse(Content content)
                  throws ParseException
Performs a parse by iterating through a List of preferred Parsers until one of them parses the content successfully and a Parse object is returned. If none of the Parsers succeeds, a message is logged at the WARNING level, and an empty parse is returned.

Parameters:
content - The content to parse.
Returns:
<key, Parse> pairs.
Throws:
ParseException - If no suitable parser is found to perform the parse.
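
The fallback behaviour described for parse can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Nutch implementation: SimpleParser, NoParserException, and parseWithFallback are hypothetical stand-ins for Parser, ParseException, and parse, and a Map<String, String> stands in for the <key, Parse> pairs of a ParseResult.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a Parser: returns null or an empty map on failure. */
    public interface SimpleParser {
        Map<String, String> tryParse(String content);
    }

    /** Hypothetical stand-in for ParseException. */
    public static class NoParserException extends Exception {
        public NoParserException(String message) {
            super(message);
        }
    }

    /**
     * Tries each preferred parser in order and returns the first non-empty result.
     * Throws if there is no parser to try; returns an empty map if every parser fails.
     */
    public static Map<String, String> parseWithFallback(List<SimpleParser> preferred,
                                                        String content)
            throws NoParserException {
        if (preferred == null || preferred.isEmpty()) {
            throw new NoParserException("No suitable parser found");
        }
        for (SimpleParser parser : preferred) {
            Map<String, String> result = null;
            try {
                result = parser.tryParse(content);
            } catch (RuntimeException e) {
                // A failing parser is logged and skipped; the next one is tried.
                System.err.println("WARN: parser failed: " + e.getMessage());
            }
            if (result != null && !result.isEmpty()) {
                return result; // first successful parse wins
            }
        }
        System.err.println("WARN: unable to parse content");
        return Collections.emptyMap(); // stand-in for the empty parse
    }

    public static void main(String[] args) throws NoParserException {
        SimpleParser failing = c -> null;
        SimpleParser working = c -> Map.of("http://example.com/", c.trim());
        // The failing parser is skipped and the working one supplies the result.
        System.out.println(parseWithFallback(List.of(failing, working), " hello "));
        // With only a failing parser, the empty stand-in result is returned.
        System.out.println(parseWithFallback(List.of(failing), " hello "));
    }
}
```

The sketch separates the two failure modes the description names: an empty parser list raises the exception, while parsers that run but fail lead to a logged warning and an empty result.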

parseByExtensionId

public ParseResult parseByExtensionId(String extId,
                                      Content content)
                               throws ParseException
Parses a Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID. If a suitable Parser is not found, a WARNING level message is logged and a ParseException is thrown. If the parse is unsuccessful for any other reason, a WARNING level message is logged and the result of ParseStatus.getEmptyParse() is returned.

Parameters:
extId - The extension implementation ID of the Parser to use to parse the specified content.
content - The content to parse.
Returns:
<key, Parse> pairs if the parse is successful, otherwise, a single <key, ParseStatus.getEmptyParse()> pair.
Throws:
ParseException - If there is no suitable Parser found to perform the parse.
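
The lookup-by-ID behaviour can be sketched in the same way. Again, this is an illustration under assumptions rather than the Nutch code: the registry map, SimpleParser, NoParserException, parseById, and the extension ID "example.UpperCaseParser" are all hypothetical, and an empty string stands in for the value returned by ParseStatus.getEmptyParse().

```java
import java.util.HashMap;
import java.util.Map;

public class ExtensionIdLookupSketch {

    /** Hypothetical stand-in for the empty parse. */
    public static final String EMPTY_PARSE = "";

    /** Hypothetical stand-in for a Parser: returns null when it cannot parse. */
    public interface SimpleParser {
        String parseText(String content);
    }

    /** Hypothetical stand-in for ParseException. */
    public static class NoParserException extends Exception {
        public NoParserException(String message) {
            super(message);
        }
    }

    /**
     * Looks up a parser by extension ID and applies it to the content.
     * Unknown ID: warn and throw. Failed parse: warn and return <key, EMPTY_PARSE>.
     */
    public static Map<String, String> parseById(Map<String, SimpleParser> registry,
                                                String extId,
                                                String key,
                                                String content)
            throws NoParserException {
        SimpleParser parser = registry.get(extId);
        if (parser == null) {
            System.err.println("WARN: no parser with extension ID " + extId);
            throw new NoParserException("No suitable parser for ID " + extId);
        }
        String text = null;
        try {
            text = parser.parseText(content);
        } catch (RuntimeException e) {
            System.err.println("WARN: parser " + extId + " failed: " + e.getMessage());
        }
        if (text == null) {
            System.err.println("WARN: unable to parse content for " + key);
            return Map.of(key, EMPTY_PARSE); // single pair holding the empty parse
        }
        return Map.of(key, text);
    }

    public static void main(String[] args) throws NoParserException {
        Map<String, SimpleParser> registry = new HashMap<>();
        registry.put("example.UpperCaseParser", String::toUpperCase);
        System.out.println(
            parseById(registry, "example.UpperCaseParser", "http://example.com/", "hello"));
    }
}
```

Unlike the fallback sketch for parse, only one parser is ever consulted here, so a failed parse always yields exactly one pair for the given key.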


Copyright © 2012 The Apache Software Foundation