org.apache.nutch.urlfilter.regex
Class RegexURLFilter
java.lang.Object
org.apache.nutch.urlfilter.api.RegexURLFilterBase
org.apache.nutch.urlfilter.regex.RegexURLFilter
- All Implemented Interfaces:
- Configurable, URLFilter, Pluggable
public class RegexURLFilter
- extends RegexURLFilterBase
Filters URLs based on a file of regular expressions using the Java Regex implementation.
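As a rough illustration of filtering URLs against signed regular expressions with `java.util.regex`, the sketch below does not use the Nutch classes. It assumes a rules format in which each line starts with `+` (include) or `-` (exclude) followed by a regex, `#` starts a comment line, the first matching rule decides, and a URL that matches no rule is rejected. The class name `SignedRegexSketch` and its `accepts` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the Nutch class: evaluates signed regex rules in order.
class SignedRegexSketch {
    private final List<Boolean> signs = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Assumed line format: '+' or '-' followed by a regex; '#' lines and blank lines are skipped.
    SignedRegexSketch(String rulesText) {
        for (String line : rulesText.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char first = line.charAt(0);
            if (first != '+' && first != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            signs.add(first == '+');
            patterns.add(Pattern.compile(line.substring(1)));
        }
    }

    // The first matching rule decides; a URL matching no rule is rejected (an assumption of this sketch).
    boolean accepts(String url) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(url).find()) {
                return signs.get(i);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String rules = "# skip images\n-\\.(gif|jpg|png)$\n+^https?://\n";
        SignedRegexSketch filter = new SignedRegexSketch(rules);
        System.out.println(filter.accepts("http://example.org/index.html"));
        System.out.println(filter.accepts("http://example.org/logo.png"));
        System.out.println(filter.accepts("ftp://example.org/file"));
    }
}
```

Run as written, the sketch prints `true`, `false`, `false`: the image rule comes first and rejects the `.png` URL before the `http` rule is consulted, and the `ftp` URL matches nothing.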
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RegexURLFilter
public RegexURLFilter()
RegexURLFilter
public RegexURLFilter(String filename)
throws IOException,
PatternSyntaxException
- Throws:
IOException
PatternSyntaxException
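`PatternSyntaxException` in `java.util.regex` is an unchecked exception raised when a pattern cannot be compiled. The sketch below, independent of Nutch, shows how a malformed rule surfaces at compile time; the class `PatternSyntaxSketch` and its `describe` helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of Nutch: reports whether a regex compiles.
class PatternSyntaxSketch {
    // Returns "ok" if the regex compiles, otherwise "invalid: " plus the offending pattern.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "ok";
        } catch (PatternSyntaxException e) {
            // getPattern() returns the erroneous pattern text.
            return "invalid: " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("^https?://"));
        System.out.println(describe("[a-z")); // unclosed character class
    }
}
```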
getRulesFile
protected String getRulesFile(Configuration conf)
- Description copied from class:
RegexURLFilterBase
- Returns the name of the file of rules to use for
a particular implementation.
- Specified by:
getRulesFile
in class RegexURLFilterBase
- Parameters:
conf
- is the current configuration.
- Returns:
- the name of the file of rules to use.
createRule
protected RegexRule createRule(boolean sign,
String regex)
- Description copied from class:
RegexURLFilterBase
- Creates a new RegexRule.
- Specified by:
createRule
in class RegexURLFilterBase
- Parameters:
sign
- of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex
- is the regular expression associated with this rule.
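The pairing of a sign with a regular expression can be sketched with `java.util.regex` alone. The class `SketchRule` and its `accept` and `match` methods are hypothetical stand-ins, not the `RegexRule` API, and the use of `find()` (match anywhere in the URL) rather than `matches()` is an assumption of this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a signed rule: a compiled pattern plus an include/exclude flag.
class SketchRule {
    private final boolean sign;
    private final Pattern pattern;

    SketchRule(boolean sign, String regex) {
        this.sign = sign;
        this.pattern = Pattern.compile(regex);
    }

    // true: matching URLs are included; false: matching URLs are excluded.
    boolean accept() {
        return sign;
    }

    // Assumption: the regex may match anywhere in the URL.
    boolean match(String url) {
        return pattern.matcher(url).find();
    }

    public static void main(String[] args) {
        SketchRule excludeQueries = new SketchRule(false, "[?&=]");
        String url = "http://example.org/search?q=nutch";
        if (excludeQueries.match(url)) {
            System.out.println(excludeQueries.accept() ? "include" : "exclude");
        }
    }
}
```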
main
public static void main(String[] args)
throws IOException
- Throws:
IOException
Copyright © 2006 The Apache Software Foundation