org.apache.nutch.parse.feed
Class FeedParser

java.lang.Object
  extended by org.apache.nutch.parse.feed.FeedParser
All Implemented Interfaces:
Configurable, Parser, Pluggable

public class FeedParser
extends Object
implements Parser

Since:
NUTCH-444

A new RSS/ATOM FeedParser that rapidly parses all referenced links and content present in the feed.

Author:
dogacan, mattmann

Field Summary
static String CHARSET_UTF8
           
static org.slf4j.Logger LOG
           
static String TEXT_PLAIN_CONTENT_TYPE
           
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
FeedParser()
           
 
Method Summary
 Configuration getConf()
           
 ParseResult getParse(Content content)
          Parses the given feed, then extracts and parses all linked items within it, using the underlying ROME feed parsing library.
static void main(String[] args)
          Runs a command line version of this Parser.
 void setConf(Configuration conf)
          Sets the Configuration object for this Parser.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CHARSET_UTF8

public static final String CHARSET_UTF8
See Also:
Constant Field Values

TEXT_PLAIN_CONTENT_TYPE

public static final String TEXT_PLAIN_CONTENT_TYPE
See Also:
Constant Field Values

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

FeedParser

public FeedParser()
Method Detail

getParse

public ParseResult getParse(Content content)
Parses the given feed, then extracts and parses all linked items within it, using the underlying ROME feed parsing library.

Specified by:
getParse in interface Parser
Parameters:
content - A Content object representing the feed that is being parsed by this Parser.
Returns:
A ParseResult containing all parsed feeds that were present in the feed file handled by this Parser.
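
To make "extracts all linked items within the feed" concrete, the following is a minimal sketch of the idea only. It is not the FeedParser implementation and does not use ROME or any Nutch class: it swaps in the JDK's built-in DOM parser and handles plain RSS 2.0 only. The class name FeedLinkSketch and the method extractItemLinks are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative stand-in only: uses the JDK DOM parser, not ROME or Nutch.
final class FeedLinkSketch {

    /** Returns the text of the first <link> child of every RSS <item>, in document order. */
    static List<String> extractItemLinks(byte[] feedBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted feeds cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(feedBytes));

        List<String> links = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList linkNodes = item.getElementsByTagName("link");
            // Items without a <link> element are skipped rather than reported as errors.
            if (linkNodes.getLength() > 0) {
                links.add(linkNodes.item(0).getTextContent().trim());
            }
        }
        return links;
    }
}
```

In a crawler, each extracted link would typically become its own entry in the result, which is roughly the role the ParseResult plays here. Atom feeds carry links in an href attribute on entry elements, so this sketch would need a separate branch to cover them.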

setConf

public void setConf(Configuration conf)
Sets the Configuration object for this Parser. This Parser expects the following configuration properties to be set:

Specified by:
setConf in interface Configurable
Parameters:
conf - The Hadoop Configuration object to use to configure this Parser.

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable
Returns:
The Configuration object used to configure this Parser.

main

public static void main(String[] args)
                 throws Exception
Runs a command line version of this Parser.

Parameters:
args - A single argument (expected at args[0]) representing a path on the local filesystem that points to a feed file.
Throws:
Exception - If any error occurs.
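
The argument contract described above (exactly one argument, a local path to a feed file) can be sketched as follows. This is a hypothetical stand-in written against the JDK only, not the source of FeedParser.main; the class name FeedFileArgSketch, the helper readFeedFile, and the usage message are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for a command-line entry point; not the FeedParser source.
final class FeedFileArgSketch {

    /** Checks that exactly one argument was given and returns the bytes of that file. */
    static byte[] readFeedFile(String[] args) throws IOException {
        if (args.length != 1) {
            throw new IllegalArgumentException("Usage: FeedFileArgSketch <path-to-feed-file>");
        }
        Path feedPath = Paths.get(args[0]);
        // Any I/O failure (missing file, no read permission) propagates to the caller.
        return Files.readAllBytes(feedPath);
    }

    public static void main(String[] args) throws Exception {
        byte[] feedBytes = readFeedFile(args);
        System.out.println("Read " + feedBytes.length + " bytes from " + args[0]);
    }
}
```

Declaring main with throws Exception, as the real method does, keeps the entry point short: errors surface as a stack trace instead of being translated into exit codes.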


Copyright © 2012 The Apache Software Foundation