org.apache.nutch.urlfilter.automaton
Class AutomatonURLFilter

java.lang.Object
  extended by org.apache.nutch.urlfilter.api.RegexURLFilterBase
      extended by org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
All Implemented Interfaces:
Configurable, URLFilter, Pluggable

public class AutomatonURLFilter
extends RegexURLFilterBase

RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™ library.

Author:
Jérôme Charron
See Also:
dk.brics.automaton

Field Summary
 
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
 
Constructor Summary
AutomatonURLFilter()
           
AutomatonURLFilter(String filename)
           
 
Method Summary
protected  RegexRule createRule(boolean sign, String regex)
          Creates a new RegexRule.
protected  String getRulesFile(Configuration conf)
          Returns the name of the file of rules to use for a particular implementation.
static void main(String[] args)
           
 
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AutomatonURLFilter

public AutomatonURLFilter()

AutomatonURLFilter

public AutomatonURLFilter(String filename)
                   throws IOException,
                          PatternSyntaxException
Throws:
IOException
PatternSyntaxException
Method Detail

getRulesFile

protected String getRulesFile(Configuration conf)
Description copied from class: RegexURLFilterBase
Returns the name of the file of rules to use for a particular implementation.

Specified by:
getRulesFile in class RegexURLFilterBase
Parameters:
conf - the current configuration.
Returns:
the name of the file of rules to use.

createRule

protected RegexRule createRule(boolean sign,
                               String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.

Specified by:
createRule in class RegexURLFilterBase
Parameters:
sign - the sign of the regular expression. If true, any URL matching this rule is included; if false, any URL matching this rule is excluded.
regex - the regular expression associated with this rule.
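
Example:
The sign semantics described above can be illustrated with a small, self-contained sketch. It does not use the Nutch or dk.brics.automaton classes: SignedRuleSketch, Rule, addRule and the sample URLs are hypothetical names chosen for illustration, and java.util.regex.Pattern stands in for the automaton. The sketch also assumes that the first matching rule decides the outcome, that a URL matching no rule is rejected, and that a rule must match the whole URL; verify these against RegexURLFilterBase before relying on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a sign-based URL filter; not part of Nutch.
final class SignedRuleSketch {

    /** One rule: a sign plus a compiled expression. */
    static final class Rule {
        final boolean sign;     // true = include matching URLs, false = exclude them
        final Pattern pattern;  // java.util.regex used here instead of dk.brics.automaton

        Rule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        boolean matches(String url) {
            // Assumption: the expression must match the entire URL.
            return pattern.matcher(url).matches();
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void addRule(boolean sign, String regex) {
        rules.add(new Rule(sign, regex));
    }

    /**
     * Returns the URL if the first matching rule has sign == true,
     * or null if that rule has sign == false or no rule matches
     * (first-match-wins and default-reject are assumptions of this sketch).
     */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SignedRuleSketch filter = new SignedRuleSketch();
        filter.addRule(false, ".*\\.(gif|jpg|png)");        // exclude common image suffixes
        filter.addRule(true, "https?://example\\.org/.*");  // include everything else on one host

        System.out.println(filter.filter("http://example.org/index.html")); // included
        System.out.println(filter.filter("http://example.org/logo.png"));   // excluded
        System.out.println(filter.filter("ftp://example.org/file"));        // no rule matches
    }
}
```

Rule order matters in this sketch: placing the exclusion before the broader inclusion is what keeps the image URL out.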

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException


Copyright © 2006 The Apache Software Foundation