org.apache.nutch.crawl
Class URLPartitioner

java.lang.Object
  extended by org.apache.nutch.crawl.URLPartitioner
All Implemented Interfaces:
JobConfigurable, Partitioner<Text,Writable>

public class URLPartitioner
extends Object
implements Partitioner<Text,Writable>

Partitions URLs by host, domain name, or IP address, depending on the value of the configuration parameter 'partition.url.mode', which can be 'byHost', 'byDomain', or 'byIP'.
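
The three modes differ only in which part of the URL serves as the grouping key. The following is a minimal, dependency-free sketch of that idea and is not the Nutch implementation. The class PartitionKeySketch, its methods, the fallback behaviour, and the "last two labels" domain rule are all assumptions made for illustration; only the mode strings come from the description above.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch: derives a grouping key from a URL.
public class PartitionKeySketch {

    // Mode values as quoted in the class description.
    static final String MODE_HOST = "byHost";
    static final String MODE_DOMAIN = "byDomain";
    static final String MODE_IP = "byIP";

    // Returns the string that URLs would be grouped by under the given mode.
    static String partitionKey(String urlString, String mode) {
        String host;
        try {
            host = new URI(urlString).getHost();
        } catch (URISyntaxException e) {
            host = null;
        }
        if (host == null) {
            // Assumption: fall back to the whole URL when no host can be parsed.
            return urlString;
        }
        if (MODE_DOMAIN.equals(mode)) {
            return naiveDomain(host);
        }
        if (MODE_IP.equals(mode)) {
            try {
                // Resolves the host name; an IP literal is returned without a lookup.
                return InetAddress.getByName(host).getHostAddress();
            } catch (UnknownHostException e) {
                // Assumption: fall back to the host name when resolution fails.
                return host;
            }
        }
        // Default: group by host.
        return host;
    }

    // Naive domain rule (assumption): keep the last two dot-separated labels.
    // This ignores multi-part public suffixes such as "co.uk".
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }
}
```

Under this sketch, http://www.example.com/a and http://blog.example.com/b get different keys in 'byHost' mode but share the key example.com in 'byDomain' mode.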


Field Summary
static String PARTITION_MODE_DOMAIN
           
static String PARTITION_MODE_HOST
           
static String PARTITION_MODE_IP
           
static String PARTITION_MODE_KEY
           
 
Constructor Summary
URLPartitioner()
           
 
Method Summary
 void close()
           
 void configure(JobConf job)
           
 int getPartition(Text key, Writable value, int numReduceTasks)
          Hash by host, domain name, or IP, depending on the configured partition mode.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PARTITION_MODE_KEY

public static final String PARTITION_MODE_KEY
See Also:
Constant Field Values

PARTITION_MODE_HOST

public static final String PARTITION_MODE_HOST
See Also:
Constant Field Values

PARTITION_MODE_DOMAIN

public static final String PARTITION_MODE_DOMAIN
See Also:
Constant Field Values

PARTITION_MODE_IP

public static final String PARTITION_MODE_IP
See Also:
Constant Field Values
Constructor Detail

URLPartitioner

public URLPartitioner()
Method Detail

configure

public void configure(JobConf job)
Specified by:
configure in interface JobConfigurable

close

public void close()

getPartition

public int getPartition(Text key,
                        Writable value,
                        int numReduceTasks)
Hash by host, domain name, or IP, depending on the configured partition mode.

Specified by:
getPartition in interface Partitioner<Text,Writable>
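
A partitioner of this kind typically maps the hash of a key to a reduce task index in the range 0 to numReduceTasks - 1. The sketch below shows one common way to do that in plain Java. It is an illustration under stated assumptions, not the code of this method: the class HashPartitionSketch and the method partitionFor are hypothetical, and the key is assumed to be an already extracted host, domain, or IP string.

```java
// Hypothetical helper, not part of Nutch: maps a grouping key to a partition index.
public class HashPartitionSketch {

    // Returns an index in [0, numReduceTasks) for the given key.
    static int partitionFor(String key, int numReduceTasks) {
        // Clearing the sign bit keeps the result non-negative even when
        // hashCode() is negative, so the modulo never yields a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Because the index depends only on the key, every URL that shares a key (for example, the same host) lands in the same partition, which is what allows per-host or per-domain processing in a single reduce task.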


Copyright © 2011 The Apache Software Foundation