Package | Description |
---|---|
org.apache.nutch.collection | Subcollection is a subset of an index. |
org.apache.nutch.urlfilter.api | |
org.apache.nutch.urlfilter.automaton | A URL filter plugin based on dk.brics.automaton Finite-State Automata for Java™. |
org.apache.nutch.urlfilter.domain | A URL filter plugin that filters by domain. |
org.apache.nutch.urlfilter.domainblacklist | |
org.apache.nutch.urlfilter.prefix | A URL filter plugin that filters by URL prefix. |
org.apache.nutch.urlfilter.regex | A URL filter plugin that filters using regular expressions. |
org.apache.nutch.urlfilter.suffix | |
org.apache.nutch.urlfilter.validator | A URL filter plugin that validates given URLs. |

Classes in org.apache.nutch.collection:

Modifier and Type | Class and Description |
---|---|
class | Subcollection: a subset of an index. You can define URL patterns that indicate that a particular page (URL) belongs to the subcollection. |
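
A minimal sketch of the membership test, assuming (hypothetically) that a subcollection holds a whitelist and a blacklist of URL patterns matched as plain substrings, with the blacklist taking precedence. `SubcollectionSketch` is invented for illustration and is not the Nutch class; its `filter` method follows the URL-filter convention of returning the URL to keep it and `null` to reject it.

```java
import java.util.List;

// Hypothetical sketch: not the Nutch Subcollection class.
class SubcollectionSketch {
    private final List<String> whiteList; // patterns that put a URL in the subcollection
    private final List<String> blackList; // patterns that exclude a URL, checked first

    SubcollectionSketch(List<String> whiteList, List<String> blackList) {
        this.whiteList = whiteList;
        this.blackList = blackList;
    }

    /** Returns the URL if it belongs to this subcollection, otherwise null. */
    String filter(String url) {
        for (String pattern : blackList) {
            if (url.contains(pattern)) {
                return null; // blacklisted patterns take precedence
            }
        }
        for (String pattern : whiteList) {
            if (url.contains(pattern)) {
                return url;
            }
        }
        return null; // matched no whitelist pattern
    }

    public static void main(String[] args) {
        SubcollectionSketch docs = new SubcollectionSketch(
            List.of("example.com/docs/"), List.of("/docs/old/"));
        System.out.println(docs.filter("http://example.com/docs/index.html")); // kept
        System.out.println(docs.filter("http://example.com/docs/old/a.html")); // null
        System.out.println(docs.filter("http://example.org/"));               // null
    }
}
```

Checking the blacklist first lets a broad whitelist entry be narrowed by a more specific exclusion.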

Classes in org.apache.nutch.urlfilter.api:

Modifier and Type | Class and Description |
---|---|
class | RegexURLFilterBase: generic URL filter based on regular expressions. |
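
RegexURLFilterBase is described only as a generic regular-expression filter. The sketch below shows one common shape for such a filter, assuming a rule list in the style of Nutch's `regex-urlfilter.txt`: each rule line starts with `+` (accept) or `-` (reject), and the first matching rule decides. `RegexRulesSketch` is hypothetical and uses `java.util.regex`; it is not the Nutch class, and using `find()` (match anywhere unless anchored) is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: not Nutch's RegexURLFilterBase.
class RegexRulesSketch {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses lines of the form "+regex" or "-regex"; blank and '#' lines are skipped. */
    RegexRulesSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first rule whose pattern is found in the URL decides; no match rejects. */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegexRulesSketch f = new RegexRulesSketch(List.of(
            "# skip common image files",
            "-\\.(gif|jpg|png)$",
            "+^https?://([a-z0-9-]+\\.)*example\\.com/"));
        System.out.println(f.filter("http://www.example.com/a.html"));   // kept
        System.out.println(f.filter("http://www.example.com/logo.png")); // null
        System.out.println(f.filter("http://other.org/"));               // null
    }
}
```

Because the first match wins, narrower reject rules (such as the image-suffix rule) must come before broader accept rules.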

Classes in org.apache.nutch.urlfilter.automaton:

Modifier and Type | Class and Description |
---|---|
class | AutomatonURLFilter: RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™. |

Classes in org.apache.nutch.urlfilter.domain:

Modifier and Type | Class and Description |
---|---|
class | DomainURLFilter: filters URLs based on a file containing domain suffixes, domain names, and hostnames. |

Classes in org.apache.nutch.urlfilter.domainblacklist:

Modifier and Type | Class and Description |
---|---|
class | DomainBlacklistURLFilter: filters URLs based on a file containing domain suffixes, domain names, and hostnames. |

Classes in org.apache.nutch.urlfilter.prefix:

Modifier and Type | Class and Description |
---|---|
class | PrefixURLFilter: filters URLs based on a file of URL prefixes. |
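
Prefix filtering can be served by a character trie built from the configured prefixes: a URL is kept as soon as the walk reaches the end of any prefix. This is a hypothetical sketch (`PrefixFilterSketch`) that takes the prefixes as an in-memory list; it does not claim to mirror PrefixURLFilter's internals.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Nutch's PrefixURLFilter.
class PrefixFilterSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a configured prefix ends at this node
    }

    private final Node root = new Node();

    PrefixFilterSketch(List<String> prefixes) {
        for (String prefix : prefixes) {
            Node node = root;
            for (char c : prefix.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Returns the URL if some configured prefix is a prefix of it, otherwise null. */
    String filter(String url) {
        Node node = root;
        for (int i = 0; i < url.length(); i++) {
            if (node.terminal) {
                return url; // a prefix of length i already matched
            }
            node = node.children.get(url.charAt(i));
            if (node == null) {
                return null; // no configured prefix continues this way
            }
        }
        return node.terminal ? url : null; // a prefix equal to the whole URL
    }

    public static void main(String[] args) {
        PrefixFilterSketch f = new PrefixFilterSketch(List.of("http://example.com/", "https://"));
        System.out.println(f.filter("http://example.com/a")); // kept
        System.out.println(f.filter("https://x.org/"));       // kept
        System.out.println(f.filter("http://example.org/"));  // null
    }
}
```

Each lookup walks at most as many characters as the URL has, regardless of how many prefixes are configured.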

Classes in org.apache.nutch.urlfilter.regex:

Modifier and Type | Class and Description |
---|---|
class | RegexURLFilter: filters URLs based on a file of regular expressions using the Java Regex implementation. |

Classes in org.apache.nutch.urlfilter.suffix:

Modifier and Type | Class and Description |
---|---|
class | SuffixURLFilter: filters URLs based on a file of URL suffixes. |
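
A minimal sketch of suffix filtering, assuming an in-memory list of lower-case suffixes such as `.jpg`. `SuffixFilterSketch` is hypothetical: supporting both a reject mode (drop matching URLs) and an accept mode (keep only matching URLs), comparing case-insensitively, and ignoring the query string are design choices of this sketch, not a description of SuffixURLFilter's configuration format.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: not Nutch's SuffixURLFilter.
class SuffixFilterSketch {
    private final List<String> suffixes; // lower-case suffixes such as ".jpg"
    private final boolean acceptMode;    // true: keep only matches; false: drop matches

    SuffixFilterSketch(List<String> suffixes, boolean acceptMode) {
        this.suffixes = suffixes;
        this.acceptMode = acceptMode;
    }

    /** Compares only the URL path, so "?size=2" does not hide a ".jpg" suffix. */
    String filter(String url) {
        String path;
        try {
            path = new URI(url).getPath();
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        // Opaque URIs (e.g. mailto:) have no path; compare case-insensitively.
        String lowerPath = path == null ? "" : path.toLowerCase(Locale.ROOT);
        boolean matches = suffixes.stream().anyMatch(lowerPath::endsWith);
        return matches == acceptMode ? url : null;
    }

    public static void main(String[] args) {
        SuffixFilterSketch reject = new SuffixFilterSketch(List.of(".jpg", ".zip"), false);
        System.out.println(reject.filter("http://example.com/photo.JPG?size=2")); // null
        System.out.println(reject.filter("http://example.com/index.html"));       // kept
    }
}
```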

Classes in org.apache.nutch.urlfilter.validator:

Modifier and Type | Class and Description |
---|---|
class | UrlValidator: validates URLs. |
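
The description does not say which checks UrlValidator applies, so the following is only a simplified, hypothetical illustration (`UrlValidatorSketch`): it keeps a URL if `java.net.URI` parses it, its scheme is in an assumed allow-list (`http`, `https`, `ftp`), it has a server-based host, and its port, if any, is at most 65535.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: not Nutch's UrlValidator, whose rules are more detailed.
class UrlValidatorSketch {
    // Assumed allow-list of schemes for this illustration.
    private static final Set<String> SCHEMES = Set.of("http", "https", "ftp");

    /** Returns the URL if it passes the simplified checks, otherwise null. */
    String filter(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return null; // syntactically invalid, e.g. contains a space
        }
        String scheme = uri.getScheme();
        if (scheme == null || !SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            return null; // relative URL or disallowed scheme
        }
        if (uri.getHost() == null) {
            return null; // empty or non-hostname authority
        }
        if (uri.getPort() > 65535) {
            return null; // getPort() is -1 when no port is given
        }
        return url;
    }

    public static void main(String[] args) {
        UrlValidatorSketch v = new UrlValidatorSketch();
        System.out.println(v.filter("https://example.com/a?b=c")); // kept
        System.out.println(v.filter("javascript:void(0)"));        // null
        System.out.println(v.filter("http://example.com:99999/")); // null
    }
}
```
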

Copyright © 2014 The Apache Software Foundation