org.apache.nutch.urlfilter.automaton
Class AutomatonURLFilter
java.lang.Object
org.apache.nutch.urlfilter.api.RegexURLFilterBase
org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
- All Implemented Interfaces:
- Configurable, URLFilter, Pluggable
public class AutomatonURLFilter
- extends RegexURLFilterBase
RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.
- Author:
- Jérôme Charron
- See Also:
- dk.brics.automaton
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AutomatonURLFilter
public AutomatonURLFilter()
AutomatonURLFilter
public AutomatonURLFilter(String filename)
throws IOException,
PatternSyntaxException
- Throws:
IOException
PatternSyntaxException
getRulesFile
protected String getRulesFile(Configuration conf)
- Description copied from class:
RegexURLFilterBase
- Returns the name of the file of rules to use for
a particular implementation.
- Specified by:
getRulesFile
in class RegexURLFilterBase
- Parameters:
conf
- is the current configuration.
- Returns:
- the name of the file of rules to use.
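The lookup described above can be sketched as a template method: the base class asks each implementation which rules file to load. This is a minimal sketch, not the Nutch source. java.util.Properties stands in for Configuration, the class names are hypothetical, and the property key and file name are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical stand-in for the base class: it asks each subclass
// which rules file to load for the current configuration.
abstract class RulesFileLookupSketch {
    // Each implementation names its own rules file.
    protected abstract String getRulesFile(Properties conf);

    // Small wrapper so the lookup can be exercised from outside the class.
    String resolve(Properties conf) {
        return getRulesFile(conf);
    }
}

// Hypothetical automaton-flavoured implementation.
class AutomatonLookupSketch extends RulesFileLookupSketch {
    // Assumed property key, used here for illustration only.
    static final String RULES_FILE_KEY = "urlfilter.automaton.file";

    @Override
    protected String getRulesFile(Properties conf) {
        // Returns null when the key is not set in the configuration.
        return conf.getProperty(RULES_FILE_KEY);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Assumed file name, for illustration only.
        conf.setProperty(RULES_FILE_KEY, "automaton-urlfilter.txt");
        System.out.println(new AutomatonLookupSketch().resolve(conf));
    }
}
```

Keeping the lookup in the subclass lets several filter implementations share one loading path while reading different rules files.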
createRule
protected RegexRule createRule(boolean sign,
String regex)
- Description copied from class:
RegexURLFilterBase
- Creates a new RegexRule.
- Specified by:
createRule
in class RegexURLFilterBase
- Parameters:
sign
- is the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex
- is the regular expression associated with this rule.
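The meaning of sign can be illustrated with a small stand-alone sketch. It is not the Nutch RegexRule class: java.util.regex.Pattern is swapped in for dk.brics.automaton, the class name SignedRuleSketch is hypothetical, and the policy that the first matching rule decides (with unmatched URLs rejected) is an assumption rather than something this page states.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical signed rule: a regular expression plus an include/exclude flag.
class SignedRuleSketch {
    final boolean sign;      // true = include matching URLs, false = exclude them
    final Pattern pattern;   // java.util.regex stands in for dk.brics.automaton

    SignedRuleSketch(boolean sign, String regex) {
        this.sign = sign;
        this.pattern = Pattern.compile(regex);
    }

    // Whole-string match against the rule's expression.
    boolean matches(String url) {
        return pattern.matcher(url).matches();
    }

    // Assumed policy: the first matching rule decides. Returns the URL when
    // it is included, or null when it is excluded or no rule matches.
    static String filter(List<SignedRuleSketch> rules, String url) {
        for (SignedRuleSketch rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SignedRuleSketch> rules = Arrays.asList(
            new SignedRuleSketch(false, ".*\\.gif"),   // exclude GIF images
            new SignedRuleSketch(true, "http://.*"));  // include other HTTP URLs
        System.out.println(filter(rules, "http://example.org/a.gif"));
        System.out.println(filter(rules, "http://example.org/"));
    }
}
```

Because the first match wins in this sketch, exclusion rules are listed before the broader inclusion rule.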
main
public static void main(String[] args)
throws IOException
- Throws:
IOException
Copyright © 2006 The Apache Software Foundation