org.apache.nutch.parse
Class ParseUtil
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.parse.ParseUtil
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable
public class ParseUtil
- extends org.apache.hadoop.conf.Configured
A utility class containing methods that simplify common parsing tasks, such
as iterating through a preferred list of Parsers to obtain Parse objects.
- Author:
- mattmann, Jérôme Charron, Sébastien Le Callonnec
Field Summary |
static org.slf4j.Logger |
LOG
|
Constructor Summary |
ParseUtil(org.apache.hadoop.conf.Configuration conf)
|
Method Summary |
org.apache.hadoop.conf.Configuration |
getConf()
|
Parse |
parse(String url,
WebPage page)
Performs a parse by iterating through a List of preferred Parsers
until a successful parse is performed and a Parse object is
returned. |
void |
process(String key,
WebPage page)
Parses the given web page and stores the parsed content within the page. |
void |
setConf(org.apache.hadoop.conf.Configuration conf)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
LOG
public static final org.slf4j.Logger LOG
ParseUtil
public ParseUtil(org.apache.hadoop.conf.Configuration conf)
- Parameters:
conf
- the Hadoop Configuration used to construct this instance
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf
in interface org.apache.hadoop.conf.Configurable
- Overrides:
getConf
in class org.apache.hadoop.conf.Configured
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
setConf
in interface org.apache.hadoop.conf.Configurable
- Overrides:
setConf
in class org.apache.hadoop.conf.Configured
parse
public Parse parse(String url,
WebPage page)
throws ParserNotFound,
ParseException
- Performs a parse by iterating through a List of preferred Parsers
until a successful parse is performed and a Parse object is returned.
If the parse is unsuccessful, a message is logged at the WARNING level
and an empty parse is returned.
- Throws:
ParserNotFound
- If there is no suitable parser found.
ParseException
- If there is an error parsing.
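The fallback behaviour described for parse can be illustrated in isolation. The following is a minimal, self-contained sketch and not the Nutch implementation: the ParserChainSketch class, the SimpleParser interface, the unchecked NoParserException, and the use of an empty String as the "empty parse" are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ParserChainSketch {

    /** Hypothetical stand-in for a parser plugin; not the Nutch Parser interface. */
    interface SimpleParser {
        /** Returns the extracted text, or Optional.empty() if this parser cannot handle the content. */
        Optional<String> tryParse(String url, String content);
    }

    /** Hypothetical analogue of ParserNotFound; unchecked here only to keep the sketch short. */
    static class NoParserException extends RuntimeException {
        NoParserException(String message) {
            super(message);
        }
    }

    /**
     * Tries each parser in preference order and returns the first successful result.
     * If every parser fails, a warning is written and an empty result is returned.
     */
    static String parseWithFallback(List<SimpleParser> parsers, String url, String content) {
        if (parsers.isEmpty()) {
            throw new NoParserException("No suitable parser found for " + url);
        }
        for (SimpleParser parser : parsers) {
            Optional<String> result = parser.tryParse(url, content);
            if (result.isPresent()) {
                return result.get(); // first success wins; later parsers are not consulted
            }
        }
        System.err.println("WARN: unable to successfully parse content " + url);
        return ""; // the "empty parse" of this sketch
    }

    public static void main(String[] args) {
        // Preferred parser: only accepts content that starts with an <html> tag.
        SimpleParser htmlOnly = (url, content) -> content.startsWith("<html>")
                ? Optional.of(content.replaceAll("<[^>]+>", "").trim())
                : Optional.empty();
        // Fallback parser: accepts anything and returns it trimmed.
        SimpleParser plainText = (url, content) -> Optional.of(content.trim());

        List<SimpleParser> preferred = Arrays.asList(htmlOnly, plainText);
        System.out.println(parseWithFallback(preferred, "http://example.com/a",
                "<html><body>Hello</body></html>")); // prints "Hello"
        System.out.println(parseWithFallback(preferred, "http://example.com/b",
                "  plain words ")); // prints "plain words"
    }
}
```

The order of the list encodes the preference, so the caller decides which parser is tried first, and a parser that declines the content does not stop the chain.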
process
public void process(String key,
WebPage page)
- Parses the given web page and stores the parsed content within the page.
If the page carries a meta-redirect, the redirect is added to the page's outlinks.
- Parameters:
key
-
page
-
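The idea of storing parsed content on the page and recording a meta-redirect as an outlink can be sketched as follows. This is a hedged illustration under stated assumptions, not the Nutch code: SimplePage and SimpleParse are hypothetical stand-ins for WebPage and Parse, and representing the redirect as an outlink with empty anchor text is an assumption this page does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProcessSketch {

    /** Hypothetical, simplified page record; not the Nutch WebPage class. */
    static class SimplePage {
        String title;
        String text;
        // Maps target URL to anchor text, in insertion order.
        final Map<String, String> outlinks = new LinkedHashMap<>();
    }

    /** Hypothetical parse result; redirectUrl is null when no meta-redirect was found. */
    static class SimpleParse {
        final String title;
        final String text;
        final String redirectUrl;

        SimpleParse(String title, String text, String redirectUrl) {
            this.title = title;
            this.text = text;
            this.redirectUrl = redirectUrl;
        }
    }

    /** Copies the parsed fields onto the page and records any meta-redirect target as an outlink. */
    static void store(SimplePage page, SimpleParse parse) {
        page.title = parse.title;
        page.text = parse.text;
        if (parse.redirectUrl != null) {
            // Assumption of this sketch: a redirect target has no anchor text.
            page.outlinks.put(parse.redirectUrl, "");
        }
    }

    public static void main(String[] args) {
        SimplePage page = new SimplePage();
        store(page, new SimpleParse("Moved", "This page has moved.", "http://example.com/new"));
        System.out.println(page.title + " -> " + page.outlinks.keySet());
    }
}
```

Keeping the redirect among the outlinks means that whatever later step follows outlinks will also discover the redirect target without a separate code path.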
Copyright © 2013 The Apache Software Foundation