java.lang.Object
  org.apache.nutch.urlfilter.api.RegexURLFilterBase
public abstract class RegexURLFilterBase
Generic URL filter based on regular expressions.
The regular expression rules are read from a file. Each implementation supplies the name of its rules file through the getRulesFile(Configuration) method.
The file contains one rule per line, in the format:
[+-]<regex>
where a plus (+) means a matching URL should be accepted and indexed, and a minus (-) means it should not.
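The rule format above can be illustrated with a minimal parsing sketch. This is not the Nutch implementation: `RuleLineParser` and `ParsedRule` are hypothetical names, and skipping blank lines and rejecting a line whose first character is neither `+` nor `-` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: splits "[+-]<regex>" lines
// into a sign and the regular expression text.
final class RuleLineParser {

    // One parsed line of the rules file.
    static final class ParsedRule {
        final boolean include; // true for '+', false for '-'
        final String regex;    // everything after the sign character

        ParsedRule(boolean include, String regex) {
            this.include = include;
            this.regex = regex;
        }
    }

    static List<ParsedRule> parse(List<String> lines) {
        List<ParsedRule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                // assumption: any other first character is an error
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new ParsedRule(sign == '+', line.substring(1)));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "-\\.(gif|jpg|png)$",
            "+^https?://example\\.org/");
        for (ParsedRule rule : parse(lines)) {
            System.out.println((rule.include ? "include " : "exclude ") + rule.regex);
        }
    }
}
```

Keeping the sign separate from the pattern text mirrors the two arguments of createRule(boolean sign, String regex).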
Field Summary |
---|
Fields inherited from interface org.apache.nutch.net.URLFilter |
---|
X_POINT_ID |
Constructor Summary | |
---|---|
| RegexURLFilterBase() Constructs a new empty RegexURLFilterBase. |
protected | RegexURLFilterBase(Reader reader) Constructs a new RegexURLFilterBase and initializes it with a Reader of rules. |
| RegexURLFilterBase(String filename) Constructs a new RegexURLFilterBase and initializes it with a file of rules. |
Method Summary | |
---|---|
protected abstract RegexRule | createRule(boolean sign, String regex) Creates a new RegexRule. |
String | filter(String url) |
Configuration | getConf() |
protected abstract String | getRulesFile(Configuration conf) Returns the name of the rules file to use for a particular implementation. |
static void | main(RegexURLFilterBase filter, String[] args) Filters the standard input using a RegexURLFilterBase. |
void | setConf(Configuration conf) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
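public RegexURLFilterBase()
Constructs a new empty RegexURLFilterBase.
public RegexURLFilterBase(String filename) throws IOException, IllegalArgumentException
Constructs a new RegexURLFilterBase and initializes it with a file of rules.
Parameters:
filename - the name of the rules file.
Throws:
IOException
IllegalArgumentException
protected RegexURLFilterBase(Reader reader) throws IOException, IllegalArgumentException
Constructs a new RegexURLFilterBase and initializes it with a Reader of rules.
Parameters:
reader - a reader of rules.
Throws:
IOException
IllegalArgumentException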
Method Detail |
---|
protected abstract RegexRule createRule(boolean sign, String regex)
Creates a new RegexRule.
Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
protected abstract String getRulesFile(Configuration conf)
Returns the name of the rules file to use for a particular implementation.
Parameters:
conf - the current configuration.
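The sign semantics described for createRule can be sketched in standalone Java. The `Rule` interface and `SignedRuleSketch` class below are hypothetical stand-ins, not the Nutch classes; backing the rule with `java.util.regex` is one possible choice. That the first matching rule decides and that a URL matching no rule is rejected (returned as null) are assumptions of this sketch, not statements taken from this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-ins for illustration; these are not the Nutch classes.
final class SignedRuleSketch {

    // A rule pairs a sign with a way to test a URL.
    interface Rule {
        boolean accept();          // the sign: true = include, false = exclude
        boolean match(String url); // whether the regular expression matches the URL
    }

    // Plays the role of createRule(boolean sign, String regex),
    // here backed by java.util.regex.Pattern.
    static Rule createRule(final boolean sign, String regex) {
        final Pattern pattern = Pattern.compile(regex);
        return new Rule() {
            @Override public boolean accept() { return sign; }
            @Override public boolean match(String url) { return pattern.matcher(url).find(); }
        };
    }

    // Assumption: the first matching rule decides, and a URL that
    // matches no rule is rejected by returning null.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.match(url)) {
                return rule.accept() ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            createRule(false, "\\.(gif|jpg|png)$"),
            createRule(true, "^https?://example\\.org/"));
        System.out.println(filter(rules, "http://example.org/index.html"));
        System.out.println(filter(rules, "http://example.org/logo.png"));
    }
}
```

Under these assumptions rule order matters: placing the exclusion for image suffixes first keeps it from being shadowed by the broader inclusion rule.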
public String filter(String url)
Specified by:
filter in interface URLFilter
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
public Configuration getConf()
Specified by:
getConf in interface Configurable
public static void main(RegexURLFilterBase filter, String[] args) throws IOException, IllegalArgumentException
Filters the standard input using a RegexURLFilterBase.
Parameters:
filter - the RegexURLFilterBase to use for filtering the standard input.
args - optional parameters (not used).
Throws:
IOException
IllegalArgumentException
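A line-by-line filtering loop of the kind main describes might look like the sketch below. `StdinFilterSketch` is a hypothetical class, the filter is modelled as a plain `UnaryOperator<String>` that returns null for a rejected URL, and the `+`/`-` output prefixes are an assumption chosen to echo the rule syntax; none of this is taken from the Nutch source.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not the Nutch implementation.
final class StdinFilterSketch {

    // Applies the filter to each input line and reports the outcome.
    static void run(UnaryOperator<String> filter, BufferedReader in, PrintWriter out) {
        in.lines().forEach(line -> {
            String result = filter.apply(line);
            // Assumption: '+' marks an accepted URL, '-' a rejected one.
            out.println(result != null ? "+" + result : "-" + line);
        });
        out.flush();
    }

    public static void main(String[] args) {
        // Hypothetical filter: accept only http and https URLs.
        UnaryOperator<String> filter = url -> url.matches("^https?://.*") ? url : null;
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        run(filter, in, new PrintWriter(System.out, true));
    }
}
```

Passing the reader and writer as parameters keeps the loop testable without touching the real standard streams.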