org.apache.nutch.scoring.webgraph
Class NodeDumper

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.scoring.webgraph.NodeDumper
All Implemented Interfaces:
Configurable, Tool

public class NodeDumper
extends Configured
implements Tool

A tool that dumps the top urls, ranked by number of inlinks, number of outlinks, or score, to a text file. One of its major uses is checking the top-scoring urls produced by a link analysis program such as LinkRank. Dumping by number of inlinks or number of outlinks requires that the WebGraph program has been run. Dumping by link analysis score requires that a program such as LinkRank, which updates the NodeDb of the WebGraph, has been run.
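
The ranking described above can be pictured with a minimal, self-contained sketch. It does not use Nutch or Hadoop and is not the NodeDumper implementation: NodeStats, RankBy and topN are hypothetical names introduced only for illustration, and the in-memory sort stands in for whatever the real job does over the NodeDb.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopNodesSketch {

    // Hypothetical stand-in for a WebGraph node; NOT the Nutch Node class.
    public static final class NodeStats {
        public final String url;
        public final int inlinks;
        public final int outlinks;
        public final double score;

        public NodeStats(String url, int inlinks, int outlinks, double score) {
            this.url = url;
            this.inlinks = inlinks;
            this.outlinks = outlinks;
            this.score = score;
        }
    }

    // Hypothetical ranking criteria mirroring the three named in the class description.
    public enum RankBy { INLINKS, OUTLINKS, SCORE }

    // Picks the comparator that matches the requested criterion (ascending order).
    static Comparator<NodeStats> comparatorFor(RankBy rankBy) {
        switch (rankBy) {
            case INLINKS:
                return Comparator.comparingInt((NodeStats n) -> n.inlinks);
            case OUTLINKS:
                return Comparator.comparingInt((NodeStats n) -> n.outlinks);
            case SCORE:
                return Comparator.comparingDouble((NodeStats n) -> n.score);
            default:
                throw new IllegalArgumentException("Unknown criterion: " + rankBy);
        }
    }

    // Returns at most topN nodes, highest value first.
    public static List<NodeStats> topN(List<NodeStats> nodes, RankBy rankBy, long topN) {
        return nodes.stream()
                .sorted(comparatorFor(rankBy).reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<NodeStats> nodes = Arrays.asList(
                new NodeStats("http://a.example/", 5, 1, 0.2),
                new NodeStats("http://b.example/", 2, 9, 0.7),
                new NodeStats("http://c.example/", 8, 3, 0.1));
        // Print the two urls with the most inlinks, one per line.
        for (NodeStats n : topN(nodes, RankBy.INLINKS, 2)) {
            System.out.println(n.url + "\t" + n.inlinks);
        }
    }
}
```

Sorting the whole list is the simplest way to show the idea; a bounded priority queue would avoid holding every node in memory, which matters at WebGraph scale.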


Nested Class Summary
static class NodeDumper.Dumper
          Outputs the hosts or domains with an associated value.
static class NodeDumper.Sorter
          Outputs the top urls sorted in descending order.
 
Field Summary
static org.slf4j.Logger LOG
           
 
Constructor Summary
NodeDumper()
           
 
Method Summary
 void dumpNodes(Path webGraphDb, org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type, long topN, Path output, boolean asEff, org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType, org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType, boolean asSequenceFile)
          Runs the process to dump the top urls out to a text file.
static void main(String[] args)
           
 int run(String[] args)
          Runs the node dumper tool.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

NodeDumper

public NodeDumper()
Method Detail

dumpNodes

public void dumpNodes(Path webGraphDb,
                      org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type,
                      long topN,
                      Path output,
                      boolean asEff,
                      org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType,
                      org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType,
                      boolean asSequenceFile)
               throws Exception
Runs the process to dump the top urls out to a text file.

Parameters:
webGraphDb - The WebGraph from which to pull values.
topN - The number of top urls to dump.
output - The path to which the dump is written.
Throws:
IOException - If an error occurs while dumping the top values.
Exception
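
This page does not document the nameType and aggrType parameters. Going only by the NodeDumper.Dumper summary ("Outputs the hosts or domains with an associated value"), one plausible reading is that values are grouped by host or domain and combined per group. The sketch below illustrates that reading in plain Java, grouping by host only. It is an assumption, not the Nutch implementation: Aggregate, aggregateByHost and the SUM/MAX choices are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class HostAggregationSketch {

    // Hypothetical aggregation modes; assumed, not taken from NodeDumper.AggrType.
    public enum Aggregate { SUM, MAX }

    // Groups per-url values by host and combines them with the chosen mode.
    public static Map<String, Double> aggregateByHost(Map<String, Double> valueByUrl,
                                                      Aggregate how) {
        Map<String, Double> byHost = new TreeMap<>();
        for (Map.Entry<String, Double> entry : valueByUrl.entrySet()) {
            String host = URI.create(entry.getKey()).getHost();
            if (host == null) {
                continue; // skip urls without a host component
            }
            if (how == Aggregate.SUM) {
                byHost.merge(host, entry.getValue(), Double::sum);
            } else {
                byHost.merge(host, entry.getValue(), Double::max);
            }
        }
        return byHost;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("http://a.example/x", 1.0);
        scores.put("http://a.example/y", 3.0);
        scores.put("http://b.example/", 2.0);
        // Print each host with its aggregated value under both modes.
        System.out.println(aggregateByHost(scores, Aggregate.SUM));
        System.out.println(aggregateByHost(scores, Aggregate.MAX));
    }
}
```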

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Runs the node dumper tool.

Specified by:
run in interface Tool
Throws:
Exception


Copyright © 2012 The Apache Software Foundation