public class BasicIndexingFilter extends Object implements IndexingFilter
The domain is added depending on indexer.add.domain in nutch-default.xml. The title is truncated as per indexer.max.title.length in nutch-default.xml (as per NUTCH-1004, a zero-length title is not added). The content is truncated as per indexer.max.content.length in nutch-default.xml.

Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |

Fields inherited from interface IndexingFilter: X_POINT_ID

Constructor and Description |
---|
BasicIndexingFilter() |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields. |
Configuration | getConf() Get the Configuration object |
void | setConf(Configuration conf) Set the Configuration object |

public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields. See indexer.add.domain, indexer.max.title.length, and indexer.max.content.length in nutch-default.xml.

Specified by: filter in interface IndexingFilter

Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text

Throws: IndexingException

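The truncation behaviour described for the title and content fields can be illustrated with a minimal sketch. This is not the Nutch implementation: the class TruncationSketch and its methods are hypothetical, and the sketch assumes that a negative limit means "no truncation", which this page does not state. It only shows the two rules that are stated: text is cut to the configured maximum length, and a zero-length title is not added.

```java
import java.util.Optional;

/** Hypothetical illustration only; not part of Nutch. */
final class TruncationSketch {

    /**
     * Returns the text cut to at most maxLength characters.
     * Assumption: a negative maxLength disables truncation.
     */
    static String truncate(String text, int maxLength) {
        if (maxLength > -1 && text.length() > maxLength) {
            return text.substring(0, maxLength);
        }
        return text;
    }

    /**
     * Returns the title that would be indexed, or an empty Optional
     * when the (possibly truncated) title has zero length.
     */
    static Optional<String> titleToIndex(String title, int maxTitleLength) {
        String truncated = truncate(title, maxTitleLength);
        return truncated.isEmpty() ? Optional.empty() : Optional.of(truncated);
    }

    public static void main(String[] args) {
        // Title limited to 6 characters.
        System.out.println(truncate("Apache Nutch crawler", 6));
        // An empty title yields no value to index.
        System.out.println(titleToIndex("", 100).isPresent());
        // A negative limit leaves the content untouched (assumption).
        System.out.println(truncate("full page content", -1));
    }
}
```

In a real filter the limits would come from indexer.max.title.length and indexer.max.content.length rather than from literal arguments.
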
public void setConf(Configuration conf)
Set the Configuration object

Specified by: setConf in interface Configurable

public Configuration getConf()
Get the Configuration object

Specified by: getConf in interface Configurable

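As a rough illustration of the setConf/getConf pair, the sketch below shows one way a configurable filter might cache its settings when a configuration is supplied. It is an assumption-laden stand-in, not the Nutch code: ConfigurableSketch is a hypothetical class, java.util.Properties takes the place of the Configuration type, and the default values (false, 100, -1) are assumed here for illustration only.

```java
import java.util.Properties;

/** Hypothetical stand-in for a configurable filter; not part of Nutch. */
final class ConfigurableSketch {
    private Properties conf;
    private boolean addDomain;
    private int maxTitleLength;
    private int maxContentLength;

    /** Stores the configuration and caches the three settings named on this page. */
    void setConf(Properties conf) {
        this.conf = conf;
        // Default values below are assumptions for this sketch.
        this.addDomain = Boolean.parseBoolean(
            conf.getProperty("indexer.add.domain", "false"));
        this.maxTitleLength = Integer.parseInt(
            conf.getProperty("indexer.max.title.length", "100"));
        this.maxContentLength = Integer.parseInt(
            conf.getProperty("indexer.max.content.length", "-1"));
    }

    /** Returns the configuration object passed to setConf. */
    Properties getConf() {
        return conf;
    }

    boolean addDomain() { return addDomain; }
    int maxTitleLength() { return maxTitleLength; }
    int maxContentLength() { return maxContentLength; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("indexer.add.domain", "true");
        props.setProperty("indexer.max.title.length", "20");

        ConfigurableSketch filter = new ConfigurableSketch();
        filter.setConf(props);

        System.out.println(filter.addDomain());
        System.out.println(filter.maxTitleLength());
        System.out.println(filter.maxContentLength());
    }
}
```

Reading the values once in setConf keeps the per-document filter call free of repeated configuration lookups.
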
Copyright © 2015 The Apache Software Foundation