java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.collection.Subcollection
public class Subcollection extends Configured implements URLFilter
A Subcollection represents a subset of the index. You define URL patterns that indicate whether a particular page (URL) is part of the Subcollection.
Field Summary | |
---|---|
static String | TAG_BLACKLIST |
static String | TAG_COLLECTION |
static String | TAG_COLLECTIONS |
static String | TAG_ID |
static String | TAG_KEY |
static String | TAG_NAME |
static String | TAG_WHITELIST |
Fields inherited from interface org.apache.nutch.net.URLFilter |
---|
X_POINT_ID |
Constructor Summary | |
---|---|
Subcollection(Configuration conf) | |
Subcollection(String id, String name, Configuration conf) | Public constructor |
Subcollection(String id, String name, String key, Configuration conf) | Public constructor |
Method Summary | |
---|---|
String | filter(String urlString): Simple "indexOf" filter for matching patterns. |
String | getBlackListString(): Returns the blacklist as a String. |
String | getId() |
String | getKey() |
String | getName() |
ArrayList | getWhiteList(): Returns the whitelist. |
String | getWhiteListString(): Returns the whitelist as a String. |
void | initialize(Element collection): Initializes the Subcollection from a DOM element. |
protected void | parseList(ArrayList list, String text): Creates a list of patterns from a chunk of text; patterns are separated by newlines. |
void | setBlackList(String list): Sets the contents of the blacklist from a String. |
void | setWhiteList(ArrayList whiteList) |
void | setWhiteList(String list): Sets the contents of the whitelist from a String. |
Methods inherited from class org.apache.hadoop.conf.Configured |
---|
getConf, setConf |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.hadoop.conf.Configurable |
---|
getConf, setConf |
Field Detail |
---|
public static final String TAG_COLLECTIONS
public static final String TAG_COLLECTION
public static final String TAG_WHITELIST
public static final String TAG_BLACKLIST
public static final String TAG_NAME
public static final String TAG_KEY
public static final String TAG_ID
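The TAG_ constants above presumably name the XML elements that initialize(Element collection) reads. The following is a minimal sketch of that idea using only the JDK's DOM parser. It is not Nutch's implementation: the class name SubcollectionXmlSketch, the helper methods, and in particular the tag values are assumptions, since this page lists the constants but not their contents.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in, not part of Nutch.
final class SubcollectionXmlSketch {
    // Assumed values: the page names these constants but does not show their contents.
    static final String TAG_COLLECTION = "subcollection";
    static final String TAG_NAME = "name";
    static final String TAG_ID = "id";
    static final String TAG_WHITELIST = "whitelist";
    static final String TAG_BLACKLIST = "blacklist";

    /** Returns the trimmed text of the first descendant element named tag, or "" if absent. */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Parses the XML string and returns its first subcollection element. */
    static Element parseFirstCollection(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName(TAG_COLLECTION).item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<subcollections><subcollection>"
                + "<name>Example</name><id>example</id>"
                + "<whitelist>http://example.org/</whitelist>"
                + "<blacklist>/private/</blacklist>"
                + "</subcollection></subcollections>";
        Element collection = parseFirstCollection(xml);
        System.out.println("id        = " + childText(collection, TAG_ID));
        System.out.println("name      = " + childText(collection, TAG_NAME));
        System.out.println("whitelist = " + childText(collection, TAG_WHITELIST));
        System.out.println("blacklist = " + childText(collection, TAG_BLACKLIST));
    }
}
```

In the real class, the extracted whitelist and blacklist text would presumably be handed to setWhiteList(String) and setBlackList(String).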
Constructor Detail |
---|
public Subcollection(String id, String name, Configuration conf)
Parameters:
id - id of SubCollection
name - name of SubCollection
public Subcollection(String id, String name, String key, Configuration conf)
Parameters:
id - id of SubCollection
name - name of SubCollection
public Subcollection(Configuration conf)
Method Detail |
---|
public String getName()
public String getKey()
public String getId()
public ArrayList getWhiteList()
public String getWhiteListString()
public String getBlackListString()
public void setWhiteList(ArrayList whiteList)
Parameters:
whiteList - The whiteList to set.
public String filter(String urlString)
The rules for evaluation are as follows:
1. If a pattern in the blacklist matches, the URL is rejected.
2. If a pattern in the whitelist matches, the URL is allowed.
3. Otherwise, the URL is rejected.
Specified by: filter in interface URLFilter
See Also: URLFilter.filter(java.lang.String)
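The evaluation order for filter can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the Nutch source: the class SubcollectionFilterSketch is hypothetical, matching is done with String.indexOf as the method summary suggests, and rejection is signalled by returning null, which this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the filtering logic; not part of Nutch.
final class SubcollectionFilterSketch {
    private final List<String> whiteList = new ArrayList<>();
    private final List<String> blackList = new ArrayList<>();

    SubcollectionFilterSketch(List<String> white, List<String> black) {
        whiteList.addAll(white);
        blackList.addAll(black);
    }

    /** Returns urlString if it is accepted, or null if it is rejected (assumed convention). */
    String filter(String urlString) {
        // Rule 1: any blacklist match rejects the URL.
        for (String pattern : blackList) {
            if (urlString.indexOf(pattern) != -1) {
                return null;
            }
        }
        // Rule 2: any whitelist match accepts the URL.
        for (String pattern : whiteList) {
            if (urlString.indexOf(pattern) != -1) {
                return urlString;
            }
        }
        // Rule 3: everything else is rejected.
        return null;
    }

    public static void main(String[] args) {
        SubcollectionFilterSketch f = new SubcollectionFilterSketch(
                Arrays.asList("http://example.org/"),
                Arrays.asList("/private/"));
        System.out.println(f.filter("http://example.org/docs/index.html")); // whitelisted: prints the URL
        System.out.println(f.filter("http://example.org/private/a.html"));  // blacklisted: prints null
        System.out.println(f.filter("http://other.example.com/"));          // no match: prints null
    }
}
```

Because the blacklist is checked first, a URL that matches both lists is rejected.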
public void initialize(Element collection)
Parameters:
collection
protected void parseList(ArrayList list, String text)
Parameters:
list
text
public void setBlackList(String list)
Parameters:
list - the blacklist contents
public void setWhiteList(String list)
Parameters:
list - the whitelist contents
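As a rough illustration of what parseList(ArrayList list, String text) is described as doing, the sketch below splits a chunk of text into one pattern per line. The class name PatternListSketch is hypothetical, and clearing the list first, trimming whitespace, and skipping blank lines are assumptions rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; mirrors the described behaviour of parseList, not its actual source.
final class PatternListSketch {
    /** Replaces the contents of list with the newline-separated patterns found in text. */
    static void parseList(List<String> list, String text) {
        list.clear(); // assumption: earlier patterns are discarded
        if (text == null) {
            return;
        }
        for (String line : text.split("\\r?\\n")) {
            String pattern = line.trim(); // assumption: surrounding whitespace is ignored
            if (!pattern.isEmpty()) {     // assumption: blank lines are skipped
                list.add(pattern);
            }
        }
    }

    public static void main(String[] args) {
        List<String> patterns = new ArrayList<>();
        parseList(patterns, "http://example.org/\n\n  /docs/  \n");
        System.out.println(patterns); // prints [http://example.org/, /docs/]
    }
}
```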