org.apache.nutch.crawl
Class CrawlDbFilter

java.lang.Object
  extended by org.apache.nutch.crawl.CrawlDbFilter
All Implemented Interfaces:
Closeable, JobConfigurable, Mapper<Text,CrawlDatum,Text,CrawlDatum>

public class CrawlDbFilter
extends Object
implements Mapper<Text,CrawlDatum,Text,CrawlDatum>

This class separates the URL normalization and filtering steps from the rest of the CrawlDb manipulation code.
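
The following is a minimal, dependency-free sketch of the normalize-then-filter step that a mapper of this kind might apply to each URL key. It does not use the Hadoop or Nutch APIs. The class name CrawlDbFilterSketch, the methods normalize, filter and process, and the rules they apply (lowercasing scheme and host, dropping the fragment, accepting only http and https) are assumptions made for illustration, not the behaviour of the real Nutch plugins.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Illustrative only: a plain-Java stand-in for a per-record
// normalize-then-filter step. All names and rules here are hypothetical.
final class CrawlDbFilterSketch {

    // Hypothetical normalizer: lowercases scheme and host and drops the
    // fragment. Returns null when the URL cannot be parsed.
    static String normalize(String url) {
        try {
            URI uri = new URI(url);
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            URI cleaned = new URI(
                    uri.getScheme().toLowerCase(Locale.ROOT),
                    uri.getUserInfo(),
                    uri.getHost().toLowerCase(Locale.ROOT),
                    uri.getPort(),
                    uri.getPath(),
                    uri.getQuery(),
                    null); // fragment removed
            return cleaned.toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Hypothetical filter: keeps http and https URLs, rejects everything
    // else by returning null.
    static String filter(String url) {
        boolean accepted = url.startsWith("http://") || url.startsWith("https://");
        return accepted ? url : null;
    }

    // Applies the enabled steps in order. A null result means the record
    // would be dropped; a non-null result would become the new key.
    static String process(String url, boolean normalizing, boolean filtering) {
        if (normalizing) {
            url = normalize(url);
        }
        if (url != null && filtering) {
            url = filter(url);
        }
        return url;
    }

    public static void main(String[] args) {
        String[] samples = {
            "HTTP://Example.COM/a#frag",
            "ftp://example.com/file",
            "not a url"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + process(sample, true, true));
        }
    }
}
```

In this sketch, normalization runs before filtering, so the filter sees the canonical form of each URL. With normalization disabled, the case-sensitive filter above rejects "HTTP://Example.COM/a", which shows why the order of the two steps matters.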

Author:
Andrzej Bialecki

Field Summary
static org.apache.commons.logging.Log LOG
           
static String URL_FILTERING
           
static String URL_NORMALIZING
           
static String URL_NORMALIZING_SCOPE
           
 
Constructor Summary
CrawlDbFilter()
           
 
Method Summary
 void close()
           
 void configure(JobConf job)
           
 void map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

URL_FILTERING

public static final String URL_FILTERING
See Also:
Constant Field Values

URL_NORMALIZING

public static final String URL_NORMALIZING
See Also:
Constant Field Values

URL_NORMALIZING_SCOPE

public static final String URL_NORMALIZING_SCOPE
See Also:
Constant Field Values

LOG

public static final org.apache.commons.logging.Log LOG

Constructor Detail

CrawlDbFilter

public CrawlDbFilter()

Method Detail

configure

public void configure(JobConf job)
Specified by:
configure in interface JobConfigurable

close

public void close()
Specified by:
close in interface Closeable

map

public void map(Text key,
                CrawlDatum value,
                OutputCollector<Text,CrawlDatum> output,
                Reporter reporter)
         throws IOException
Specified by:
map in interface Mapper<Text,CrawlDatum,Text,CrawlDatum>
Throws:
IOException


Copyright © 2011 The Apache Software Foundation