public abstract class RegexURLFilterBase extends Object implements URLFilter
URL filter based on regular expressions.
The regular-expression rules are expressed in a file. Each implementation determines which rules file to use through its getRulesReader(Configuration conf) method.
The file consists of rules, one per line, each in the form:
[+-]<regex>
where a plus (+) means go ahead and index a matching URL, and a minus (-) means do not.
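To illustrate the [+-]<regex> format, the sketch below splits rule lines into a sign and a regular expression. It is not Nutch's own parser. ParsedRule and RuleLineParser are hypothetical names, and skipping blank lines and lines starting with # is an assumption the description above does not state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one parsed rule: a sign plus a regular expression.
final class ParsedRule {
    final boolean sign;   // true for '+' (index it), false for '-' (do not)
    final String regex;   // the text after the sign character

    ParsedRule(boolean sign, String regex) {
        this.sign = sign;
        this.regex = regex;
    }
}

// Hypothetical parser for the "[+-]<regex>" format, one rule per line.
final class RuleLineParser {
    static List<ParsedRule> parse(Reader reader) throws IOException {
        List<ParsedRule> rules = new ArrayList<>();
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            // Assumption: blank lines and '#' comment lines are ignored.
            if (line.trim().isEmpty() || line.charAt(0) == '#') {
                continue;
            }
            char first = line.charAt(0);
            if (first == '+' || first == '-') {
                rules.add(new ParsedRule(first == '+', line.substring(1)));
            } else {
                // This sketch rejects any other leading character.
                throw new IllegalArgumentException("Invalid rule line: " + line);
            }
        }
        return rules;
    }
}
```

Rules are returned in file order, which matters if the filter evaluates them sequentially.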
Fields inherited from interface URLFilter: X_POINT_ID
Modifier | Constructor and Description |
---|---|
public | RegexURLFilterBase(): Constructs a new, empty RegexURLFilterBase. |
public | RegexURLFilterBase(File filename): Constructs a new RegexURLFilterBase and initializes it with a file of rules. |
protected | RegexURLFilterBase(Reader reader): Constructs a new RegexURLFilterBase and initializes it with a Reader of rules. |
public | RegexURLFilterBase(String rules): Constructs a new RegexURLFilterBase and initializes it with a list of rules. |
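One plausible way to organize the String and File constructors is as conveniences that wrap their input in a Reader and hand it to the Reader constructor. The sketch below shows that delegation pattern in a hypothetical class, RulesHolder. It is an assumption about structure, not the actual RegexURLFilterBase source, and it only stores the non-blank rule lines it reads.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class illustrating constructor delegation: every source of
// rules becomes a Reader handed to a single Reader-based constructor.
class RulesHolder {
    private final List<String> ruleLines = new ArrayList<>();

    // Empty holder, analogous to the no-argument constructor.
    RulesHolder() {
    }

    // Rules given inline, one per line, wrapped in a StringReader.
    RulesHolder(String rules) throws IOException {
        this(new StringReader(rules));
    }

    // Rules read from a file on disk.
    RulesHolder(File file) throws IOException {
        this(new FileReader(file));
    }

    // The one constructor that actually reads rules; it closes the reader.
    RulesHolder(Reader reader) throws IOException {
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    ruleLines.add(line);
                }
            }
        }
    }

    List<String> ruleLines() {
        return Collections.unmodifiableList(ruleLines);
    }
}
```

Keeping the parsing in one constructor means the String and File variants cannot drift apart in how they interpret rules.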
Modifier and Type | Method and Description |
---|---|
protected abstract RegexRule | createRule(boolean sign, String regex): Creates a new RegexRule. |
String | filter(String url) |
org.apache.hadoop.conf.Configuration | getConf() |
protected abstract Reader | getRulesReader(org.apache.hadoop.conf.Configuration conf): Returns a Reader over the rules to use for a particular implementation. |
static void | main(RegexURLFilterBase filter, String[] args): Filters the standard input using a RegexURLFilterBase. |
void | setConf(org.apache.hadoop.conf.Configuration conf) |
public RegexURLFilterBase()
public RegexURLFilterBase(File filename) throws IOException, IllegalArgumentException
Parameters: filename - the rules file.
Throws: IOException, IllegalArgumentException
public RegexURLFilterBase(String rules) throws IOException, IllegalArgumentException
Parameters: rules - a string with a list of rules, one rule per line.
Throws: IOException, IllegalArgumentException
protected RegexURLFilterBase(Reader reader) throws IOException, IllegalArgumentException
Parameters: reader - a Reader of rules.
Throws: IOException, IllegalArgumentException
protected abstract RegexRule createRule(boolean sign, String regex)
Creates a new RegexRule.
Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
protected abstract Reader getRulesReader(org.apache.hadoop.conf.Configuration conf) throws IOException
Returns a Reader over the rules to use for a particular implementation.
Parameters: conf - the current configuration.
Throws: IOException
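Because createRule and getRulesReader are abstract, each concrete filter decides how a regular expression is compiled and where the rules come from. The self-contained sketch below mimics that template-method split using hypothetical classes (SimpleRule, SimpleRegexFilterBase, JavaRegexFilter) and a plain Map standing in for the Hadoop Configuration; the property name "urlfilter.rules" is invented. It also assumes first-match semantics (the first rule whose pattern is found in the URL decides, and a URL matching no rule is rejected by returning null), which the description above does not state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical rule type: a sign plus a way to test a URL.
abstract class SimpleRule {
    private final boolean sign;

    SimpleRule(boolean sign) {
        this.sign = sign;
    }

    boolean accept() {
        return sign;
    }

    abstract boolean match(String url);
}

// Hypothetical base class: loads rules, delegating rule creation and
// rule location to subclasses (the template-method pattern).
abstract class SimpleRegexFilterBase {
    private final List<SimpleRule> rules = new ArrayList<>();

    protected abstract SimpleRule createRule(boolean sign, String regex);

    protected abstract Reader getRulesReader(Map<String, String> conf) throws IOException;

    void setConf(Map<String, String> conf) throws IOException {
        rules.clear();
        try (BufferedReader in = new BufferedReader(getRulesReader(conf))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Lines not starting with '+' or '-' are skipped in this sketch.
                if (line.startsWith("+") || line.startsWith("-")) {
                    rules.add(createRule(line.charAt(0) == '+', line.substring(1)));
                }
            }
        }
    }

    // Assumed semantics: the first matching rule decides; no match rejects.
    String filter(String url) {
        for (SimpleRule rule : rules) {
            if (rule.match(url)) {
                return rule.accept() ? url : null;
            }
        }
        return null;
    }
}

// Hypothetical concrete filter backed by java.util.regex.
class JavaRegexFilter extends SimpleRegexFilterBase {
    @Override
    protected SimpleRule createRule(boolean sign, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return new SimpleRule(sign) {
            @Override
            boolean match(String url) {
                // find() succeeds if the pattern occurs anywhere in the URL.
                return pattern.matcher(url).find();
            }
        };
    }

    @Override
    protected Reader getRulesReader(Map<String, String> conf) {
        // "urlfilter.rules" is an invented property name for this sketch.
        return new StringReader(conf.getOrDefault("urlfilter.rules", ""));
    }
}
```

Under these assumptions, rule order is significant: placing an exclusion such as -\.gif$ before a broad +^https?:// rule keeps GIF URLs out even though they also match the inclusion rule.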
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
public static void main(RegexURLFilterBase filter, String[] args) throws IOException, IllegalArgumentException
Parameters:
filter - the RegexURLFilterBase to use for filtering the standard input.
args - some optional parameters (not used).
Throws: IOException, IllegalArgumentException
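Filtering standard input through a filter can be sketched as a loop over input lines. In this sketch the filter is modeled as a UnaryOperator<String> that returns null to reject, rather than an actual RegexURLFilterBase. The output convention (prefixing accepted URLs with + and rejected ones with -) is an assumption for illustration, and StdinFilterSketch is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.function.UnaryOperator;

// Hypothetical helper mirroring what a "filter standard input" main might do.
final class StdinFilterSketch {
    // Reads one URL per line and applies the filter. Writes "+result" when the
    // filter returns non-null, or "-line" (the original input) when it returns null.
    static void filterLines(Reader input, Writer output, UnaryOperator<String> filter)
            throws IOException {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        String line;
        while ((line = in.readLine()) != null) {
            String result = filter.apply(line);
            if (result != null) {
                out.println("+" + result);
            } else {
                out.println("-" + line);
            }
        }
        out.flush();
    }

    // Example entry point: filters System.in, keeping only URLs starting with "http".
    static void runOnStdin() throws IOException {
        filterLines(new InputStreamReader(System.in), new OutputStreamWriter(System.out),
                url -> url.startsWith("http") ? url : null);
    }
}
```

Taking a Reader and Writer instead of System.in and System.out directly keeps the loop testable without a console.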
Copyright © 2014 The Apache Software Foundation