java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.tools.compat.ReprUrlFixer
public class ReprUrlFixer extends Configured implements Tool, Reducer<Text,CrawlDatum,Text,CrawlDatum>
The representative URL logic used for redirects has changed significantly. This tool fixes the representative URLs stored in current segments and crawl databases. Any new fetches will use the new representative URL logic.
All crawl datums are assumed to be temporary URL redirects. This may cause some URLs to be removed incorrectly, but the tool is only a temporary measure to be used until fetches can be rerun. The same reduce logic applies to the fetch and parse directories of segments and to existing crawl databases.
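The per-record fix described above can be sketched without Hadoop or Nutch on the classpath. Everything in this block is an assumption made for illustration: the `Datum` class, the `chooseRepr` and `fix` helpers, and the host and path-length rule are hypothetical stand-ins, not the actual Nutch implementation, whose rules for choosing a representative URL are more detailed.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-alone illustration; none of these names are Nutch or Hadoop APIs. */
final class ReprUrlFixSketch {

    /** Hypothetical stand-in for a crawl datum, holding only its stored representative URL. */
    static final class Datum {
        String reprUrl; // null means no representative URL is stored

        Datum(String reprUrl) {
            this.reprUrl = reprUrl;
        }
    }

    /**
     * Assumed, simplified rule for a temporary redirect from src to dst:
     * different hosts keep dst; the same host keeps the URL with the shorter
     * path, preferring src on a tie.
     */
    static String chooseRepr(String src, String dst) {
        URI s;
        URI d;
        try {
            s = URI.create(src);
            d = URI.create(dst);
        } catch (IllegalArgumentException e) {
            return dst; // unparseable input: fall back to the destination
        }
        String srcHost = s.getHost();
        String dstHost = d.getHost();
        if (srcHost == null || dstHost == null || !srcHost.equalsIgnoreCase(dstHost)) {
            return dst;
        }
        return pathLength(s) <= pathLength(d) ? src : dst;
    }

    private static int pathLength(URI uri) {
        String path = uri.getPath();
        return path == null ? 0 : path.length();
    }

    /** Re-evaluates one record, treating its stored representative URL as a temporary redirect target. */
    static void fix(String url, Datum datum) {
        if (datum.reprUrl == null) {
            return; // nothing stored, nothing to fix
        }
        String chosen = chooseRepr(url, datum.reprUrl);
        // A representative URL equal to the record's own URL adds nothing, so it is dropped.
        datum.reprUrl = chosen.equals(url) ? null : chosen;
    }

    public static void main(String[] args) {
        Map<String, Datum> records = new LinkedHashMap<>();
        records.put("http://a.com/", new Datum("http://a.com/x/index.html"));
        records.put("http://c.com/", new Datum("http://d.com/"));
        for (Map.Entry<String, Datum> e : records.entrySet()) {
            fix(e.getKey(), e.getValue());
            System.out.println(e.getKey() + " -> repr " + e.getValue().reprUrl);
        }
    }
}
```

In this sketch the first record loses its stored representative URL because the chosen URL equals the record's own key. That is one plausible reading of how treating every datum as a temporary redirect can drop a URL that a permanent redirect would have kept.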
Field Summary | |
---|---|
static org.apache.commons.logging.Log | LOG |
Constructor Summary |
---|
ReprUrlFixer() |
Method Summary | |
---|---|
void | close() |
void | configure(JobConf conf) |
static void | main(String[] args): Runs the ReprUrlFixer. |
void | reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter): Runs the new ReprUrl logic on all crawl datums. |
int | run(String[] args): Parses command line options and executes the main update logic. |
void | update(Path crawlDb, Path[] segments): Runs the fixer on any crawl database and segments specified. |
Methods inherited from class org.apache.hadoop.conf.Configured |
---|
getConf, setConf |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.hadoop.conf.Configurable |
---|
getConf, setConf |
Field Detail |
---|
public static final org.apache.commons.logging.Log LOG
Constructor Detail |
---|
public ReprUrlFixer()
Method Detail |
---|
public void configure(JobConf conf)
Specified by: configure in interface JobConfigurable
public void reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) throws IOException
Specified by: reduce in interface Reducer<Text,CrawlDatum,Text,CrawlDatum>
Throws: IOException
public void close()
Specified by: close in interface Closeable
public void update(Path crawlDb, Path[] segments) throws IOException
Throws: IOException
public static void main(String[] args) throws Exception
Throws: Exception
public int run(String[] args) throws Exception
Specified by: run in interface Tool
Throws: Exception