org.apache.nutch.scoring.webgraph
Class Loops
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.scoring.webgraph.Loops
- All Implemented Interfaces:
- Configurable, Tool
public class Loops
- extends Configured
- implements Tool
The Loops job identifies link cycles (loops) in the web graph. The LinkRank
program then uses this information to remove those links from consideration
during link analysis.
This job identifies both reciprocal links and cycles of 2+ links, up to a
configured depth. The Loops job is expensive in both computational and
space terms: because it checks outlinks of outlinks of outlinks for cycles,
its intermediate output can be extremely large even if the final output is
rather small. For this reason the Loops job is optional; if its output does
not exist, it is not factored into the LinkRank program.
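The depth-bounded cycle search described above can be illustrated with a small in-memory sketch. This is not the Nutch implementation, which runs as a series of Hadoop jobs over the web graph database. The class name LoopSketch, the findLoops signature, and the sample URLs are assumptions made for illustration only. For each start URL, the sketch follows outlinks one level at a time and records every URL whose link back to the start closes a cycle within the depth limit.

```java
import java.util.*;

public class LoopSketch {

    /**
     * Hypothetical helper, not part of Nutch. For every URL in the graph,
     * collects the URLs whose link back to it closes a cycle of at most
     * maxDepth links.
     */
    public static Map<String, Set<String>> findLoops(
            Map<String, List<String>> outlinks, int maxDepth) {
        Map<String, Set<String>> loops = new TreeMap<>();
        for (String start : outlinks.keySet()) {
            // URLs reachable from start in exactly (depth - 1) links.
            Set<String> frontier = new TreeSet<>();
            frontier.add(start);
            for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
                Set<String> next = new TreeSet<>();
                for (String url : frontier) {
                    for (String out : outlinks.getOrDefault(url, List.of())) {
                        if (out.equals(start)) {
                            // The link url -> start closes a cycle of `depth` links.
                            loops.computeIfAbsent(start, k -> new TreeSet<>()).add(url);
                        } else {
                            next.add(out);
                        }
                    }
                }
                // The frontier can grow quickly with depth, which mirrors the
                // large intermediate output mentioned in the class description.
                frontier = next;
            }
        }
        return loops;
    }

    public static void main(String[] args) {
        // Sample graph (assumed data): a <-> b is reciprocal, b -> c -> d -> b
        // is a cycle of three links, and e only links into the graph.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b"),
                "b", List.of("a", "c"),
                "c", List.of("d"),
                "d", List.of("b"),
                "e", List.of("a"));
        System.out.println(findLoops(graph, 2));
        System.out.println(findLoops(graph, 3));
    }
}
```

With a depth of 2 only the reciprocal link between a and b is found. Raising the depth to 3 also finds the three-link cycle through b, c and d, at the cost of a larger frontier at each level.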
Nested Class Summary |
static class |
Loops.Finalizer
Finishes the Loops job by aggregating and collecting any found routes. |
static class |
Loops.Initializer
Initializes the Loop routes. |
static class |
Loops.Looper
Follows a route path looking for the start URL of the route. |
static class |
Loops.LoopSet
A set of loops. |
static class |
Loops.Route
A link path or route looking to identify a link cycle. |
Constructor Summary |
Loops() |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
LOG
public static final org.slf4j.Logger LOG
LOOPS_DIR
public static final String LOOPS_DIR
- See Also:
- Constant Field Values
ROUTES_DIR
public static final String ROUTES_DIR
- See Also:
- Constant Field Values
Loops
public Loops()
findLoops
public void findLoops(Path webGraphDb)
throws IOException
- Runs the various loop jobs.
- Throws:
IOException
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Runs the Loops tool.
- Specified by:
run
in interface Tool
- Throws:
Exception
Copyright © 2011 The Apache Software Foundation