org.apache.nutch.urlfilter.regex
Class RegexURLFilter

java.lang.Object
  extended by org.apache.nutch.urlfilter.api.RegexURLFilterBase
      extended by org.apache.nutch.urlfilter.regex.RegexURLFilter
All Implemented Interfaces:
Configurable, URLFilter, Pluggable

public class RegexURLFilter
extends RegexURLFilterBase

Filters URLs based on a file of regular expressions, using the Java regular expression implementation (java.util.regex).
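As a rough illustration of filtering against a list of regular expressions, the following self-contained sketch uses only java.util.regex and does not call any Nutch class. The rule syntax it assumes (a leading "+" to include or "-" to exclude, "#" for comments, the first matching rule decides, and null for an unmatched URL) is not specified on this page and should be treated as an assumption. The names RegexFilterSketch, Rule, parse and filter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexFilterSketch {

    // Hypothetical rule holder: include (true) or exclude (false) plus a compiled pattern.
    static class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    // Parses lines of the assumed form "+regex" or "-regex".
    // Blank lines and lines starting with '#' are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.charAt(0) == '#') {
                continue;
            }
            char first = line.charAt(0);
            if (first != '+' && first != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            // Pattern.compile throws PatternSyntaxException on an invalid expression.
            rules.add(new Rule(first == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // Returns the URL if the first matching rule includes it; otherwise returns null.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null;
            }
        }
        return null; // no rule matched (assumed behaviour)
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of(
                "# skip image files",
                "-\\.(gif|jpg|png)$",
                "+^https?://example\\.org/"));
        System.out.println(filter(rules, "http://example.org/index.html"));
        System.out.println(filter(rules, "http://example.org/logo.png"));
    }
}
```

Rule order matters in this sketch: the image exclusion is listed first, so it takes effect before the broader include rule is consulted.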


Field Summary
 
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
 
Constructor Summary
RegexURLFilter()
           
RegexURLFilter(String filename)
           
 
Method Summary
protected  RegexRule createRule(boolean sign, String regex)
          Creates a new RegexRule.
protected  String getRulesFile(Configuration conf)
          Returns the name of the file of rules to use for a particular implementation.
static void main(String[] args)
           
 
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegexURLFilter

public RegexURLFilter()

RegexURLFilter

public RegexURLFilter(String filename)
               throws IOException,
                      PatternSyntaxException
Throws:
IOException
PatternSyntaxException
Method Detail

getRulesFile

protected String getRulesFile(Configuration conf)
Description copied from class: RegexURLFilterBase
Returns the name of the file of rules to use for a particular implementation.

Specified by:
getRulesFile in class RegexURLFilterBase
Parameters:
conf - the current configuration.
Returns:
the name of the file of rules to use.

createRule

protected RegexRule createRule(boolean sign,
                               String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.

Specified by:
createRule in class RegexURLFilterBase
Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
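The relationship between an abstract createRule in a base class and a java.util.regex-backed override can be sketched as follows. This is a minimal stand-alone illustration, not the Nutch source: SketchRule, SketchFilterBase, JavaRegexFilter, accept and match are hypothetical names, and using find() rather than matches() is an assumption.

```java
import java.util.regex.Pattern;

public class CreateRuleSketch {

    // Hypothetical stand-in for a rule: a sign plus a way to test a URL.
    abstract static class SketchRule {
        private final boolean sign;

        SketchRule(boolean sign) {
            this.sign = sign;
        }

        // true: matching URLs are included; false: matching URLs are excluded.
        boolean accept() {
            return sign;
        }

        abstract boolean match(String url);
    }

    // Hypothetical stand-in for the base class: rule creation is left to subclasses.
    abstract static class SketchFilterBase {
        abstract SketchRule createRule(boolean sign, String regex);
    }

    // Subclass that backs each rule with a java.util.regex.Pattern.
    static class JavaRegexFilter extends SketchFilterBase {
        @Override
        SketchRule createRule(boolean sign, String regex) {
            // Pattern.compile throws PatternSyntaxException on an invalid expression.
            final Pattern pattern = Pattern.compile(regex);
            return new SketchRule(sign) {
                @Override
                boolean match(String url) {
                    return pattern.matcher(url).find();
                }
            };
        }
    }

    public static void main(String[] args) {
        SketchRule exclude = new JavaRegexFilter().createRule(false, "\\.pdf$");
        System.out.println(exclude.match("http://example.org/paper.pdf"));
        System.out.println(exclude.accept());
    }
}
```

Keeping the sign separate from the match test lets a different regular expression engine be substituted by overriding only the factory method.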

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException


Copyright © 2006 The Apache Software Foundation