org.apache.nutch.parse.feed
Class FeedParser
java.lang.Object
org.apache.nutch.parse.feed.FeedParser
- All Implemented Interfaces:
- Configurable, Parser, Pluggable
public class FeedParser
- extends Object
- implements Parser
- Since:
- NUTCH-444
A new RSS/ATOM FeedParser that rapidly parses all referenced links and content present in the feed.
- Author:
- dogacan, mattmann
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CHARSET_UTF8
public static final String CHARSET_UTF8
- See Also:
- Constant Field Values
TEXT_PLAIN_CONTENT_TYPE
public static final String TEXT_PLAIN_CONTENT_TYPE
- See Also:
- Constant Field Values
LOG
public static final org.slf4j.Logger LOG
FeedParser
public FeedParser()
getParse
public ParseResult getParse(Content content)
- Parses the given feed, then extracts and parses all linked items within
the feed, using the underlying ROME feed parsing library.
- Specified by:
getParse in interface Parser
- Parameters:
content - A Content object representing the feed that is being parsed by this Parser.
- Returns:
- A ParseResult containing all Parsed feeds that were present in the feed file that this Parser dealt with.
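The idea of pulling every linked item out of a feed can be illustrated with a minimal sketch. It does not use ROME or any Nutch class, and it is not the FeedParser implementation: it assumes an RSS 2.0 document and uses only the JDK's DOM parser. The class name FeedLinkSketch and the method itemLinks are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the "extract linked items" step; not Nutch or ROME code.
public class FeedLinkSketch {

    /** Returns the text of the first <link> inside each <item> of an RSS 2.0 document. */
    public static List<String> itemLinks(byte[] feedBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Feeds come from untrusted hosts, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(feedBytes));

        List<String> links = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList linkNodes = item.getElementsByTagName("link");
            if (linkNodes.getLength() > 0) {
                links.add(linkNodes.item(0).getTextContent().trim());
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
            + "<link>http://example.com/</link>"
            + "<item><title>One</title><link>http://example.com/1</link></item>"
            + "<item><title>Two</title><link>http://example.com/2</link></item>"
            + "</channel></rss>";
        // Prints one item link per line; the channel-level link is skipped.
        for (String link : itemLinks(rss.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(link);
        }
    }
}
```

A full feed library such as ROME also handles Atom, namespaces, and malformed input, which this sketch deliberately ignores.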
setConf
public void setConf(Configuration conf)
- Sets the Configuration object for this Parser. This Parser expects the following configuration properties to be set:
- URLNormalizers - properties in the configuration object to set up the default URL normalizers.
- URLFilters - properties in the configuration object to set up the default URL filters.
- Specified by:
setConf in interface Configurable
- Parameters:
conf - The Hadoop Configuration object to use to configure this Parser.
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
- Returns:
- The Configuration object used to configure this Parser.
main
public static void main(String[] args)
throws Exception
- Runs a command line version of this Parser.
- Parameters:
args - A single argument (expected at args[0]) representing a path on the local filesystem that points to a feed file.
- Throws:
Exception - If any error occurs.
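The single-argument contract described above can be sketched with the JDK alone. This is an assumption-laden illustration, not the source of FeedParser.main: the class FeedFileArgSketch and its methods feedPath and readFeed are hypothetical, and the hand-off to getParse(Content) is left out because it needs Nutch on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical driver skeleton; only the argument handling and file read are shown.
public class FeedFileArgSketch {

    /** Validates the argument list and returns the feed path taken from args[0]. */
    public static Path feedPath(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("Usage: FeedFileArgSketch <feed>");
        }
        return Paths.get(args[0]);
    }

    /** Reads the whole feed file into memory as raw bytes. */
    public static byte[] readFeed(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    public static void main(String[] args) throws Exception {
        Path path;
        try {
            path = feedPath(args);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(1);
            return;
        }
        byte[] bytes = readFeed(path);
        // A real driver would wrap these bytes in a Content object and pass it
        // to getParse(Content); that step is omitted here.
        System.out.println("Read " + bytes.length + " bytes from " + path.toUri());
    }
}
```

Reading the file as bytes rather than as a String leaves charset detection to the parser.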
Copyright © 2012 The Apache Software Foundation