public class FeedIndexingFilter extends Object implements IndexingFilter
IndexingFilter implementation that pulls the relevant extracted Metadata fields out of the RSS feeds and into the index.

Modifier and Type | Field and Description |
---|---|
static String | dateFormatStr |
X_POINT_ID
Constructor and Description |
---|
FeedIndexingFilter() |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) Extracts out the relevant fields: FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED, and sends them to the Indexer for indexing within the Nutch index. |
Configuration | getConf() |
void | setConf(Configuration conf) Sets the Configuration object used to configure this IndexingFilter. |
public static final String dateFormatStr
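This page does not show the value of dateFormatStr, so the following is only a minimal sketch of how a date pattern constant of this kind is typically applied. The pattern "yyyyMMddHHmm", the class name DateFormatSketch, and the constant ASSUMED_DATE_FORMAT are assumptions made for illustration, not values confirmed by this page.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatSketch {
    // ASSUMPTION: illustrative pattern only. The real value of
    // FeedIndexingFilter.dateFormatStr is not shown on this page.
    static final String ASSUMED_DATE_FORMAT = "yyyyMMddHHmm";

    // Formats epoch milliseconds with the assumed pattern.
    // UTC and Locale.ROOT keep the result independent of the host settings.
    static String format(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat(ASSUMED_DATE_FORMAT, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The Unix epoch, 1970-01-01 00:00 UTC.
        System.out.println(format(0L)); // prints "197001010000"
    }
}
```

A fixed-width, most-significant-first pattern like this one sorts lexicographically in chronological order, which may be why such a format would suit an index field.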
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
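Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
inlinks - page inlinks
Throws:
IndexingException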
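The copy step that filter describes can be sketched without the Nutch classes. The example below is not the Nutch implementation: plain maps stand in for NutchDocument and for the parse metadata, and the class name FeedFieldCopySketch and the key strings ("author", "tag", "published", "updated", "feed") are hypothetical placeholders for the FEED_* constants named above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FeedFieldCopySketch {
    // ASSUMPTION: hypothetical keys standing in for FEED_AUTHOR, FEED_TAGS,
    // FEED_PUBLISHED, FEED_UPDATED and FEED.
    static final List<String> FEED_FIELDS =
        Arrays.asList("author", "tag", "published", "updated", "feed");

    // Copies every feed field present in parseMeta into doc and returns doc.
    // Keys outside FEED_FIELDS are ignored.
    static Map<String, List<String>> filter(Map<String, List<String>> doc,
                                            Map<String, List<String>> parseMeta) {
        for (String field : FEED_FIELDS) {
            List<String> values = parseMeta.get(field);
            if (values == null) {
                continue; // field absent from this feed entry
            }
            doc.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new LinkedHashMap<>();
        meta.put("author", Arrays.asList("Jane Doe"));
        meta.put("tag", Arrays.asList("search", "crawler"));
        meta.put("unrelated", Arrays.asList("ignored"));

        Map<String, List<String>> doc = filter(new LinkedHashMap<>(), meta);
        System.out.println(doc); // prints "{author=[Jane Doe], tag=[search, crawler]}"
    }
}
```

Returning the same document instance that was passed in mirrors the signature above, where filter both accepts and returns a NutchDocument.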
public Configuration getConf()
Specified by: getConf in interface Configurable
Returns: the Configuration object used to configure this IndexingFilter.
public void setConf(Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
Specified by: setConf in interface Configurable
Parameters:
conf - The Configuration object used to configure this IndexingFilter.
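The getConf/setConf pair follows a simple store-and-return contract, which can be sketched as below. This is an illustration under stated assumptions rather than the Nutch or Hadoop code: java.util.Properties stands in for Configuration, and SimpleConfigurable, SketchFilter, and the property key "sketch.key" are hypothetical names.

```java
import java.util.Properties;

class ConfigurableSketch {
    // ASSUMPTION: hypothetical stand-in for the Configurable contract,
    // with Properties used in place of Configuration.
    interface SimpleConfigurable {
        void setConf(Properties conf);
        Properties getConf();
    }

    // A filter-like class that keeps the configuration it is given.
    static class SketchFilter implements SimpleConfigurable {
        private Properties conf;

        @Override
        public void setConf(Properties conf) {
            this.conf = conf; // remember the configuration for later use
        }

        @Override
        public Properties getConf() {
            return conf; // null until setConf has been called
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.key", "value"); // hypothetical key

        SketchFilter filter = new SketchFilter();
        filter.setConf(conf);
        System.out.println(filter.getConf().getProperty("sketch.key")); // prints "value"
    }
}
```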
Copyright © 2015 The Apache Software Foundation