public class SubcollectionIndexingFilter extends org.apache.hadoop.conf.Configured implements IndexingFilter
Modifier and Type | Field and Description |
---|---|
static String | fieldName: Doc field name |
static org.slf4j.Logger | LOG: Logger |
Fields inherited from interface IndexingFilter: X_POINT_ID
Constructor and Description |
---|
SubcollectionIndexingFilter() |
SubcollectionIndexingFilter(org.apache.hadoop.conf.Configuration conf) |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse. |
org.apache.hadoop.conf.Configuration | getConf() |
void | setConf(org.apache.hadoop.conf.Configuration conf) |
public static String fieldName
public static final org.slf4j.Logger LOG
public SubcollectionIndexingFilter()
public SubcollectionIndexingFilter(org.apache.hadoop.conf.Configuration conf)
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
Overrides: setConf in class org.apache.hadoop.conf.Configured
Parameters:
conf - Configuration
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
Overrides: getConf in class org.apache.hadoop.conf.Configured
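The setConf/getConf pair above follows the usual configurable-component pattern: the component stores the configuration it is given and hands the same instance back on request. The sketch below illustrates that pattern only. It does not use the Hadoop classes: `ConfiguredSketch` is a hypothetical name, and a plain `Map<String, String>` stands in for `org.apache.hadoop.conf.Configuration`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a configurable component; not the Hadoop Configured class. */
class ConfiguredSketch {
    // Assumption: a plain string map stands in for org.apache.hadoop.conf.Configuration.
    private Map<String, String> conf;

    /** No-argument constructor: starts with an empty configuration. */
    ConfiguredSketch() {
        this(new HashMap<>());
    }

    /** Constructor taking a configuration, mirroring the two constructors listed above. */
    ConfiguredSketch(Map<String, String> conf) {
        setConf(conf);
    }

    /** Stores the configuration; a real filter could also refresh cached settings here. */
    public void setConf(Map<String, String> conf) {
        this.conf = conf;
    }

    /** Returns the configuration instance most recently passed to setConf. */
    public Map<String, String> getConf() {
        return conf;
    }
}
```

With this shape, a caller can construct the component first and inject the configuration later, which is presumably why both a no-argument and a Configuration-taking constructor are listed.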
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page URL
datum - crawl datum for the page
inlinks - page inlinks
Throws:
IndexingException
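The filter contract described above (receive a document, add or modify fields, return the document) can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `DocSketch` stands in for NutchDocument, `UrlPrefixFilterSketch` is not the real class, the URL-prefix matching rule is an assumption made for illustration (this page does not describe how subcollections are matched), and the default field value "subcollection" is likewise assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for NutchDocument: field name mapped to a list of values. */
class DocSketch {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Appends a value to the named field, creating the field if needed. */
    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the values of the named field, or an empty list if it is absent. */
    List<String> get(String name) {
        return fields.getOrDefault(name, new ArrayList<>());
    }
}

/** Hypothetical filter following the documented contract: modify the document, then return it. */
class UrlPrefixFilterSketch {
    // Mirrors the static fieldName field listed above; the value is an assumption.
    static String fieldName = "subcollection";

    // Assumed rule set: URL prefix mapped to the label added to the document.
    private final Map<String, String> prefixToLabel;

    UrlPrefixFilterSketch(Map<String, String> prefixToLabel) {
        this.prefixToLabel = prefixToLabel;
    }

    /** Adds one field value per matching URL prefix and returns the same document instance. */
    DocSketch filter(DocSketch doc, String url) {
        for (Map.Entry<String, String> rule : prefixToLabel.entrySet()) {
            if (url.startsWith(rule.getKey())) {
                doc.add(fieldName, rule.getValue());
            }
        }
        return doc;
    }
}
```

Returning the document instance that was passed in lets several filters be chained, each adding its own fields before the document is indexed.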
Copyright © 2014 The Apache Software Foundation