org.apache.nutch.crawl
Class LinkDbFilter

java.lang.Object
  extended by org.apache.nutch.crawl.LinkDbFilter
All Implemented Interfaces:
Closeable, JobConfigurable, Mapper<Text,Inlinks,Text,Inlinks>

public class LinkDbFilter
extends Object
implements Mapper<Text,Inlinks,Text,Inlinks>

A Mapper that applies URL normalization and filtering to LinkDb entries, keeping these steps separate from the rest of the LinkDb manipulation code.
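
The general normalize-then-filter pattern can be illustrated with a small, dependency-free sketch. It is not the Nutch implementation: the class LinkFilterSketch, its Entry type, and the toy normalizer and filter are all hypothetical stand-ins for the real URL normalizer and filter plugins, and dropping entries that end up with no inlinks is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names exist in Nutch.
public class LinkFilterSketch {

    /** A cleaned target URL together with the inlink source URLs that survived. */
    public static final class Entry {
        public final String url;
        public final List<String> inlinks;

        public Entry(String url, List<String> inlinks) {
            this.url = url;
            this.inlinks = inlinks;
        }
    }

    private final Function<String, String> normalizer; // stand-in for URL normalization
    private final Predicate<String> filter;            // stand-in for URL filtering

    public LinkFilterSketch(Function<String, String> normalizer, Predicate<String> filter) {
        this.normalizer = normalizer;
        this.filter = filter;
    }

    /** Normalizes a URL, then filters it; returns null when the URL is rejected. */
    String clean(String url) {
        String normalized = normalizer.apply(url);
        if (normalized == null || !filter.test(normalized)) {
            return null;
        }
        return normalized;
    }

    /**
     * Cleans the target URL and each inlink source URL.
     * Returns null when the target is rejected or no inlinks survive
     * (an assumption of this sketch).
     */
    public Entry process(String url, List<String> inlinks) {
        String target = clean(url);
        if (target == null) {
            return null;
        }
        List<String> kept = new ArrayList<>();
        for (String from : inlinks) {
            String cleaned = clean(from);
            if (cleaned != null) {
                kept.add(cleaned);
            }
        }
        return kept.isEmpty() ? null : new Entry(target, kept);
    }

    public static void main(String[] args) {
        // Toy rules: trim and lower-case the URL, then accept only http(s) URLs.
        LinkFilterSketch sketch = new LinkFilterSketch(
            u -> u.trim().toLowerCase(Locale.ROOT),
            u -> u.startsWith("http://") || u.startsWith("https://"));

        Entry e = sketch.process(" HTTP://Example.org/ ",
            Arrays.asList("http://a.example/", "ftp://b.example/"));
        System.out.println(e.url + " <- " + e.inlinks);
    }
}
```

Keeping normalization and filtering behind two small function objects mirrors the separation described above: the surrounding code only decides what to do with accepted or rejected URLs.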

Author:
Andrzej Bialecki

Field Summary
static org.slf4j.Logger LOG
static String URL_FILTERING
static String URL_NORMALIZING
static String URL_NORMALIZING_SCOPE

Constructor Summary
LinkDbFilter()

Method Summary
void close()
void configure(JobConf job)
void map(Text key, Inlinks value, OutputCollector<Text,Inlinks> output, Reporter reporter)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

URL_FILTERING

public static final String URL_FILTERING
See Also:
Constant Field Values

URL_NORMALIZING

public static final String URL_NORMALIZING
See Also:
Constant Field Values

URL_NORMALIZING_SCOPE

public static final String URL_NORMALIZING_SCOPE
See Also:
Constant Field Values

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

LinkDbFilter

public LinkDbFilter()

Method Detail

configure

public void configure(JobConf job)
Specified by:
configure in interface JobConfigurable

close

public void close()
Specified by:
close in interface Closeable

map

public void map(Text key,
                Inlinks value,
                OutputCollector<Text,Inlinks> output,
                Reporter reporter)
         throws IOException
Specified by:
map in interface Mapper<Text,Inlinks,Text,Inlinks>
Throws:
IOException


Copyright © 2011 The Apache Software Foundation