org.apache.nutch.scoring.webgraph
Class Loops

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.scoring.webgraph.Loops
All Implemented Interfaces:
Configurable, Tool

public class Loops
extends Configured
implements Tool

The Loops job identifies cycles (loops) of links inside the web graph. The LinkRank program then uses this output to remove those links from consideration during link analysis. The job identifies both reciprocal links and cycles of 2+ links, up to a set depth. The Loops job is expensive in both computational and space terms: because it checks outlinks of outlinks of outlinks for cycles, its intermediate output can be extremely large even if the final output is rather small. For this reason the Loops job is optional; if its output does not exist, it is not factored into the LinkRank program.
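
The idea of following outlinks of outlinks up to a set depth can be illustrated with a small in-memory sketch. This is not the Nutch implementation, which runs as Hadoop jobs over the web graph database; the class name LoopSketch, the method findLoops(Map, int), and the maxDepth parameter are hypothetical and exist only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Nutch.
public class LoopSketch {

  /**
   * For each start URL, collects the URLs that lie on a route leading
   * back to that start URL within maxDepth hops.
   */
  static Map<String, Set<String>> findLoops(
      Map<String, List<String>> outlinks, int maxDepth) {
    Map<String, Set<String>> loops = new TreeMap<>();
    for (String start : outlinks.keySet()) {
      Set<String> onCycle = new TreeSet<>();
      follow(start, start, outlinks, maxDepth, new ArrayDeque<>(), onCycle);
      if (!onCycle.isEmpty()) {
        loops.put(start, onCycle);
      }
    }
    return loops;
  }

  /** Follows one route, one hop at a time, looking for the start URL. */
  private static void follow(String start, String current,
      Map<String, List<String>> outlinks, int hopsLeft,
      Deque<String> path, Set<String> onCycle) {
    if (hopsLeft == 0) {
      return; // depth limit reached; give up on this route
    }
    for (String next : outlinks.getOrDefault(current, List.of())) {
      if (next.equals(start)) {
        // The route returned to its start: every URL on it is in a loop.
        onCycle.addAll(path);
      } else if (!path.contains(next)) {
        path.addLast(next);
        follow(start, next, outlinks, hopsLeft - 1, path, onCycle);
        path.removeLast();
      }
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> graph = Map.of(
        "a", List.of("b"),
        "b", List.of("a", "c"),
        "c", List.of("d"),
        "d", List.of("b"),
        "e", List.of("a"));
    System.out.println(findLoops(graph, 3));
  }
}
```

In this sketch a reciprocal link needs a depth of 2 and a three-link cycle needs a depth of 3. The number of routes explored grows roughly with the average number of outlinks raised to the depth, which is consistent with the warning above that intermediate output can be far larger than the final result.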


Nested Class Summary
static class Loops.Finalizer
          Finishes the Loops job by aggregating and collecting any found routes.
static class Loops.Initializer
          Initializes the Loop routes.
static class Loops.Looper
          Follows a route path looking for the start URL of the route.
static class Loops.LoopSet
          A set of loops.
static class Loops.Route
          A link path or route looking to identify a link cycle.
 
Field Summary
static org.slf4j.Logger LOG
           
static String LOOPS_DIR
           
static String ROUTES_DIR
           
 
Constructor Summary
Loops()
           
 
Method Summary
 void findLoops(Path webGraphDb)
          Runs the various loop jobs.
static void main(String[] args)
           
 int run(String[] args)
          Runs the Loops tool.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.slf4j.Logger LOG

LOOPS_DIR

public static final String LOOPS_DIR
See Also:
Constant Field Values

ROUTES_DIR

public static final String ROUTES_DIR
See Also:
Constant Field Values
Constructor Detail

Loops

public Loops()
Method Detail

findLoops

public void findLoops(Path webGraphDb)
               throws IOException
Runs the various loop jobs.

Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Runs the Loops tool.

Specified by:
run in interface Tool
Throws:
Exception


Copyright © 2011 The Apache Software Foundation