public class LinkDbMerger extends org.apache.hadoop.conf.Configured implements org.apache.hadoop.util.Tool, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,Inlinks,org.apache.hadoop.io.Text,Inlinks>
It's possible to use this tool just for filtering: in that case, only one LinkDb should be specified in the arguments.
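The sketch below shows how a filter-only run differs from a merge run at the argument level. It assumes the usage string `LinkDbMerger <output_linkdb> <linkdb1> [<linkdb2> ...] [-normalize] [-filter]`. The class `MergeArgsSketch` and its `isFilterOnly()` helper are hypothetical and only illustrate the idea. They are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical argument parser, assuming the usage
 *   LinkDbMerger <output_linkdb> <linkdb1> [<linkdb2> ...] [-normalize] [-filter]
 * Illustrative only; this is not the Nutch implementation.
 */
final class MergeArgsSketch {
    final String output;
    final List<String> inputs = new ArrayList<>();
    boolean normalize = false;
    boolean filter = false;

    MergeArgsSketch(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "Usage: LinkDbMerger <output_linkdb> <linkdb1> [<linkdb2> ...] [-normalize] [-filter]");
        }
        output = args[0];
        for (int i = 1; i < args.length; i++) {
            switch (args[i]) {
                case "-normalize": normalize = true; break;
                case "-filter":    filter = true;    break;
                default:           inputs.add(args[i]); // anything else is an input LinkDb path
            }
        }
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one input LinkDb is required");
        }
    }

    /** Filtering-only mode: exactly one input LinkDb, with URL filtering switched on. */
    boolean isFilterOnly() {
        return inputs.size() == 1 && filter;
    }

    public static void main(String[] argv) {
        MergeArgsSketch a = new MergeArgsSketch(
            new String[] {"crawl/linkdb_filtered", "crawl/linkdb", "-filter"});
        System.out.println(a.output + " " + a.inputs + " filterOnly=" + a.isFilterOnly());
    }
}
```

With a single input and `-filter`, the "merge" simply rewrites one LinkDb with prohibited URLs removed.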
If more than one LinkDb contains information about the same URL,
all inlinks are accumulated, but at most db.max.inlinks
inlinks are kept for that URL.
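A minimal sketch of the capped accumulation follows. It models each LinkDb's inlinks for one target URL as a `List<String>` of source URLs and iterates over them the way a reducer iterates over its values. The class `InlinkCapSketch`, the `accumulate` method, and the use of a set to collapse duplicate source URLs are all assumptions made for illustration. The real `Inlinks` type is not used here.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class InlinkCapSketch {
    /**
     * Accumulates the inlinks recorded for one target URL across several LinkDbs,
     * stopping once maxInlinks distinct inlinks have been kept.
     * Each element produced by perDb stands in for one LinkDb's inlinks to that URL.
     */
    static Set<String> accumulate(Iterator<List<String>> perDb, int maxInlinks) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order, collapses duplicates
        while (perDb.hasNext()) {
            for (String fromUrl : perDb.next()) {
                if (result.size() >= maxInlinks) {
                    return result; // cap reached: all remaining inlinks are dropped
                }
                result.add(fromUrl);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> dbs = List.of(
            List.of("http://a.example/", "http://b.example/"),
            List.of("http://b.example/", "http://c.example/", "http://d.example/"));
        // prints [http://a.example/, http://b.example/, http://c.example/]
        System.out.println(accumulate(dbs.iterator(), 3));
    }
}
```

The inlink from `d.example` is never added because three distinct inlinks have already been kept when it is reached.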
If activated, URLFilters are applied both to the target URLs and to every incoming link URL. If a target URL is prohibited, the target URL and all of its inlinks are removed. If only some of the incoming links are prohibited, just those links are removed, and they do not count toward the db.max.inlinks limit described above.
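The sketch below illustrates the filtering rules for a single LinkDb entry. It assumes a URL filter can be modelled as a `Predicate<String>` that returns `true` for allowed URLs. Nutch's actual `URLFilters` API works differently. The class `InlinkFilterSketch` and the method `filterEntry` are hypothetical. A prohibited target drops the whole entry, and prohibited inlinks are skipped without using up any of the `maxInlinks` budget.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

final class InlinkFilterSketch {
    /**
     * Applies a URL filter to a target URL and its incoming links.
     * Returns empty if the target itself is prohibited (the whole entry is dropped);
     * otherwise keeps only allowed inlinks, up to maxInlinks of them.
     */
    static Optional<Set<String>> filterEntry(String target, List<String> inlinks,
                                             Predicate<String> allowed, int maxInlinks) {
        if (!allowed.test(target)) {
            return Optional.empty(); // prohibited target: target and all its inlinks removed
        }
        Set<String> kept = new LinkedHashSet<>();
        for (String fromUrl : inlinks) {
            if (kept.size() >= maxInlinks) {
                break;
            }
            // Prohibited inlinks are never added, so they do not count toward maxInlinks.
            if (allowed.test(fromUrl)) {
                kept.add(fromUrl);
            }
        }
        return Optional.of(kept);
    }

    public static void main(String[] args) {
        Predicate<String> noSpam = url -> !url.contains("spam");
        List<String> inlinks = List.of(
            "http://spam.example/", "http://a.example/", "http://b.example/");
        // prints Optional[[http://a.example/, http://b.example/]]
        System.out.println(filterEntry("http://target.example/", inlinks, noSpam, 2));
        // prints Optional.empty
        System.out.println(filterEntry("http://spam.example/page", inlinks, noSpam, 2));
    }
}
```

Because the filter check runs before an inlink is added, the rejected `spam.example` link leaves room for both allowed inlinks under a cap of 2.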
Constructor and Description |
---|
LinkDbMerger() |
LinkDbMerger(org.apache.hadoop.conf.Configuration conf) |
Modifier and Type | Method and Description |
---|---|
void | close() |
void | configure(org.apache.hadoop.mapred.JobConf job) |
static org.apache.hadoop.mapred.JobConf | createMergeJob(org.apache.hadoop.conf.Configuration config, org.apache.hadoop.fs.Path linkDb, boolean normalize, boolean filter) |
static void | main(String[] args) |
void | merge(org.apache.hadoop.fs.Path output, org.apache.hadoop.fs.Path[] dbs, boolean normalize, boolean filter) |
void | reduce(org.apache.hadoop.io.Text key, Iterator<Inlinks> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output, org.apache.hadoop.mapred.Reporter reporter) |
int | run(String[] args) |
public LinkDbMerger()
public LinkDbMerger(org.apache.hadoop.conf.Configuration conf)
public void reduce(org.apache.hadoop.io.Text key, Iterator<Inlinks> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Inlinks> output, org.apache.hadoop.mapred.Reporter reporter) throws IOException
Specified by: reduce in interface org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,Inlinks,org.apache.hadoop.io.Text,Inlinks>
Throws: IOException
public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by: configure in interface org.apache.hadoop.mapred.JobConfigurable
public void close() throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Throws: IOException
public void merge(org.apache.hadoop.fs.Path output, org.apache.hadoop.fs.Path[] dbs, boolean normalize, boolean filter) throws Exception
Throws: Exception
public static org.apache.hadoop.mapred.JobConf createMergeJob(org.apache.hadoop.conf.Configuration config, org.apache.hadoop.fs.Path linkDb, boolean normalize, boolean filter)
Copyright © 2014 The Apache Software Foundation