org.apache.nutch.indexer.metadata
Class MetadataIndexer
java.lang.Object
org.apache.nutch.indexer.metadata.MetadataIndexer
- All Implemented Interfaces:
- Configurable, IndexingFilter, Pluggable
public class MetadataIndexer
- extends Object
- implements IndexingFilter
Indexer that can be configured to index metadata from the crawldb, from the parse metadata, or from the content metadata.
Set the properties "index.db", "index.parse" or "index.content"; the value of each is a
comma-delimited list of metadata keys, for example key1, key2, key3.
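The comma-delimited key list described above can be illustrated with a small, self-contained sketch. It is not the Nutch implementation: the class `MetadataKeySketch`, its methods `parseKeys` and `selectFields`, and the use of plain `Map` objects in place of Nutch's metadata and document types are all hypothetical names chosen for illustration. It only shows one plausible way to turn a value such as "key1, key2, key3" into keys and copy the matching metadata entries into index fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Nutch.
final class MetadataKeySketch {

  // Splits a comma-delimited property value (e.g. "key1, key2, key3")
  // into trimmed, non-empty keys. A null value yields an empty list.
  static List<String> parseKeys(String value) {
    List<String> keys = new ArrayList<>();
    if (value == null) {
      return keys;
    }
    for (String part : value.split(",")) {
      String key = part.trim();
      if (!key.isEmpty()) {
        keys.add(key);
      }
    }
    return keys;
  }

  // Copies only the configured keys from a metadata map into a new
  // field map; keys with no metadata value are skipped.
  static Map<String, String> selectFields(Map<String, String> metadata,
                                          List<String> keys) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String key : keys) {
      String value = metadata.get(key);
      if (value != null) {
        fields.put(key, value);
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // Assumed example value for a property such as "index.parse".
    List<String> keys = parseKeys("author, lang, missing");

    Map<String, String> parseMetadata = new LinkedHashMap<>();
    parseMetadata.put("author", "Jane");
    parseMetadata.put("lang", "en");
    parseMetadata.put("generator", "ignored");

    // Only "author" and "lang" are both configured and present.
    System.out.println(selectFields(parseMetadata, keys));
  }
}
```

Trimming each key lets the configured value be written with or without spaces after the commas.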
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
MetadataIndexer
public MetadataIndexer()
filter
public NutchDocument filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Description copied from interface:
IndexingFilter
- Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning a null value.
- Specified by:
filter
in interface IndexingFilter
- Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page URL
datum - crawl datum for the page
inlinks - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
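The return contract of filter (return the document to keep it, return null to discard it) can be sketched without any Nutch dependencies. Everything in this sketch is an assumption made for illustration: `FilterChainSketch`, the simplified `DocFilter` interface, and the use of a `Map` as the document are hypothetical and much smaller than the real IndexingFilter signature shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Nutch.
final class FilterChainSketch {

  // Simplified analogue of an indexing filter: returns the (possibly
  // modified) document, or null to signal that it should be discarded.
  interface DocFilter {
    Map<String, String> filter(Map<String, String> doc, String url);
  }

  // Applies filters in order and stops as soon as one returns null.
  static Map<String, String> runChain(List<DocFilter> filters,
                                      Map<String, String> doc,
                                      String url) {
    for (DocFilter f : filters) {
      doc = f.filter(doc, url);
      if (doc == null) {
        return null; // document was rejected; later filters never see it
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    List<DocFilter> chain = new ArrayList<>();
    // First filter adds a field to the document.
    chain.add((doc, url) -> {
      doc.put("url", url);
      return doc;
    });
    // Second filter discards documents whose URL ends with ".tmp".
    chain.add((doc, url) -> url.endsWith(".tmp") ? null : doc);

    Map<String, String> kept =
        runChain(chain, new LinkedHashMap<>(), "http://example.com/a.html");
    Map<String, String> dropped =
        runChain(chain, new LinkedHashMap<>(), "http://example.com/a.tmp");

    System.out.println("kept is null: " + (kept == null));
    System.out.println("dropped is null: " + (dropped == null));
  }
}
```

A caller that receives null simply skips the document, so a filter can veto indexing without throwing an exception.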
setConf
public void setConf(Configuration conf)
- Specified by:
setConf
in interface Configurable
getConf
public Configuration getConf()
- Specified by:
getConf
in interface Configurable
Copyright © 2012 The Apache Software Foundation