public class URLMetaIndexingFilter extends Object implements IndexingFilter
Fields inherited from interface IndexingFilter: X_POINT_ID
Constructor and Description |
---|
URLMetaIndexingFilter() |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks): takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object. |
org.apache.hadoop.conf.Configuration | getConf(): boilerplate. |
void | setConf(org.apache.hadoop.conf.Configuration conf): handles conf assignment and reads the list of tags from the "urlmeta.tags" property. |
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
Throws:
IndexingException
See Also:
IndexingFilter.filter(org.apache.nutch.indexer.NutchDocument, org.apache.nutch.parse.Parse, org.apache.hadoop.io.Text, org.apache.nutch.crawl.CrawlDatum, org.apache.nutch.crawl.Inlinks)
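As a rough illustration of the lookup the filter description refers to, the following stand-alone sketch walks the configured tag names and copies each one it finds in a page's crawl metadata into the document. It is not the plugin's source: plain Java maps stand in for the CrawlDatum metadata and the NutchDocument fields, the class name UrlMetaFilterSketch is hypothetical, and adding each found value as a field named after its tag (while skipping tags that are absent) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-alone sketch of the lookup described for filter():
 * for every tag named in "urlmeta.tags", look it up in the page's crawl
 * metadata and, if present, copy it into the document as a field.
 * Plain maps stand in for CrawlDatum metadata and NutchDocument fields.
 */
public class UrlMetaFilterSketch {

    // Assumption: the tag names have already been parsed from "urlmeta.tags".
    private final String[] urlMetaTags;

    public UrlMetaFilterSketch(String[] urlMetaTags) {
        this.urlMetaTags = urlMetaTags;
    }

    /** Copies each configured tag found in datumMeta into docFields. */
    public Map<String, List<String>> filter(Map<String, List<String>> docFields,
                                            Map<String, String> datumMeta) {
        if (urlMetaTags == null) {
            return docFields; // nothing configured: document passes through unchanged
        }
        for (String tag : urlMetaTags) {
            String value = datumMeta.get(tag);
            if (value != null) {
                // Assumption of this sketch: the field name mirrors the tag name.
                docFields.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
            }
        }
        return docFields;
    }

    public static void main(String[] args) {
        Map<String, String> datumMeta = new HashMap<>();
        datumMeta.put("category", "news");
        datumMeta.put("lang", "en");

        UrlMetaFilterSketch f = new UrlMetaFilterSketch(new String[] {"category", "source"});
        Map<String, List<String>> doc = f.filter(new LinkedHashMap<>(), datumMeta);
        System.out.println(doc); // {category=[news]}
    }
}
```

Running main prints {category=[news]}: "lang" is present in the metadata but not configured, and "source" is configured but absent, so neither ends up in the document.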
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
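The setConf/getConf pair can be sketched in the same stand-alone way. This hypothetical UrlMetaConfSketch uses java.util.Properties in place of org.apache.hadoop.conf.Configuration, and it assumes "urlmeta.tags" holds a comma-separated list of tag names whose entries are trimmed and whose blank entries are dropped; the real plugin's parsing rules may differ. The getUrlMetaTags helper is also hypothetical and exists only so the parsed result can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch of the setConf()/getConf() pair: store the
 * configuration, then read "urlmeta.tags" once so a filter can reuse the
 * parsed tag list. java.util.Properties stands in for Hadoop's Configuration.
 */
public class UrlMetaConfSketch {

    static final String CONF_PROPERTY = "urlmeta.tags";

    private Properties conf;
    private String[] urlMetaTags = new String[0];

    public void setConf(Properties conf) {
        this.conf = conf; // conf assignment
        if (conf == null) {
            return; // nothing to read
        }
        String raw = conf.getProperty(CONF_PROPERTY, "");
        // Assumption: comma-separated list; entries trimmed, blanks dropped.
        List<String> tags = new ArrayList<>();
        for (String part : raw.split(",")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        urlMetaTags = tags.toArray(new String[0]);
    }

    public Properties getConf() {
        return conf; // boilerplate: hand back whatever setConf stored
    }

    /** Hypothetical helper: returns a copy of the parsed tag names. */
    public String[] getUrlMetaTags() {
        return urlMetaTags.clone();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(CONF_PROPERTY, "category, source,,lang");
        UrlMetaConfSketch s = new UrlMetaConfSketch();
        s.setConf(p);
        System.out.println(String.join("|", s.getUrlMetaTags())); // category|source|lang
        System.out.println(s.getConf() == p);                    // true
    }
}
```

Parsing the property once in setConf, rather than on every filter call, keeps the per-document work down to a handful of map lookups.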
Copyright © 2014 The Apache Software Foundation