public class FeedIndexingFilter extends Object implements IndexingFilter
IndexingFilter implementation that pulls the relevant extracted Metadata fields out of the RSS feeds and into the index.

Modifier and Type | Field and Description |
---|---|
static String | dateFormatStr |
X_POINT_ID
Constructor and Description |
---|
FeedIndexingFilter() |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index. |
org.apache.hadoop.conf.Configuration | getConf() |
void | setConf(org.apache.hadoop.conf.Configuration conf) Sets the Configuration object used to configure this IndexingFilter. |
public static final String dateFormatStr
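This page does not show the value of dateFormatStr or say how it is used. The name suggests a date pattern applied to feed dates such as FEED_PUBLISHED and FEED_UPDATED. The sketch below illustrates that idea only: the pattern "yyyyMMddHHmm", the class name DateFormatSketch, the helper formatTimestamp, and the choice of UTC are all assumptions made for illustration, not values taken from Nutch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatSketch {
    // Assumed pattern; the actual value of dateFormatStr is not shown on this page.
    static final String ASSUMED_DATE_FORMAT = "yyyyMMddHHmm";

    // Formats an epoch-millisecond timestamp with the assumed pattern, in UTC.
    static String formatTimestamp(long epochMillis) {
        SimpleDateFormat sdf = new SimpleDateFormat(ASSUMED_DATE_FORMAT, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The epoch itself (0 ms) formats as 197001010000 under these assumptions.
        System.out.println(formatTimestamp(0L));
    }
}
```

A compact, fixed-width pattern like this would keep formatted dates sortable as plain strings, which may be why such a format is exposed as a constant.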
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
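Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Specified by:
filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
Throws:
IndexingException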
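The copy step that filter describes, moving feed metadata from the parse result into the document, can be sketched without any Nutch dependency. This is a simplified stand-in, not the Nutch implementation: plain maps replace NutchDocument and the parse metadata, and the class name FeedFieldCopySketch and the key strings are hypothetical, since the values of the FEED_* constants are not given on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedFieldCopySketch {
    // Hypothetical metadata keys; the real FEED_* constant values are not shown here.
    static final List<String> FEED_KEYS =
        List.of("author", "tag", "published", "updated", "feed");

    /**
     * Copies every value stored under a feed key in the parse metadata
     * into the document's field map, keeping fields that are already present.
     */
    static Map<String, List<String>> filter(Map<String, List<String>> doc,
                                            Map<String, List<String>> parseMeta) {
        for (String key : FEED_KEYS) {
            List<String> values = parseMeta.get(key);
            if (values == null || values.isEmpty()) {
                continue; // nothing was extracted for this field
            }
            doc.computeIfAbsent(key, k -> new ArrayList<>()).addAll(values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parseMeta = new LinkedHashMap<>();
        parseMeta.put("author", List.of("Jane Doe"));
        parseMeta.put("tag", List.of("search", "crawling"));
        parseMeta.put("content-type", List.of("application/rss+xml")); // not a feed key

        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("url", new ArrayList<>(List.of("http://example.com/feed")));

        System.out.println(filter(doc, parseMeta));
    }
}
```

Returning the same document instance mirrors the usual chaining style of indexing filters, where each filter adds fields and hands the document on to the next one.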
public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
Returns:
the Configuration object used to configure this IndexingFilter.
public void setConf(org.apache.hadoop.conf.Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
Parameters:
conf - The Configuration object used to configure this IndexingFilter.

Copyright © 2014 The Apache Software Foundation