Package | Description |
---|---|
org.apache.nutch.protocol | |
org.apache.nutch.protocol.file | Protocol plugin which supports retrieving local file resources. |
org.apache.nutch.protocol.ftp | Protocol plugin which supports retrieving documents via the FTP protocol. |
org.apache.nutch.protocol.http | Protocol plugin which supports retrieving documents via the HTTP protocol. |
org.apache.nutch.protocol.http.api | Common API used by HTTP plugins (http, httpclient). |
Modifier and Type | Method and Description |
---|---|
Protocol | ProtocolFactory.getProtocol(String urlString): returns the appropriate Protocol implementation for a URL. |
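
`getProtocol` selects an implementation based on the scheme of the URL string. The sketch below shows that dispatch idea using a plain map. It is not Nutch's mechanism, since Nutch discovers protocol implementations through its plugin system. The names `Fetcher`, `SchemeRegistry`, `register` and `forUrl` are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a protocol implementation; NOT Nutch's Protocol interface.
interface Fetcher {
    String name();
}

// Hypothetical registry mapping a URL scheme ("http", "ftp", "file") to an implementation.
class SchemeRegistry {
    private final Map<String, Fetcher> byScheme = new HashMap<>();

    void register(String scheme, Fetcher fetcher) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    // Same idea as getProtocol(String urlString): parse the URL, take its scheme, look it up.
    Fetcher forUrl(String urlString) {
        String scheme = URI.create(urlString).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URL has no scheme: " + urlString);
        }
        Fetcher f = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (f == null) {
            throw new IllegalArgumentException("No implementation registered for scheme: " + scheme);
        }
        return f;
    }
}
```

Lower-casing the scheme on both registration and lookup keeps `HTTP://...` and `http://...` on the same implementation.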
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | RobotRulesParser.getRobotRulesSet(Protocol protocol, org.apache.hadoop.io.Text url) |
abstract crawlercommons.robots.BaseRobotRules | RobotRulesParser.getRobotRulesSet(Protocol protocol, URL url) |
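
The two `getRobotRulesSet` overloads differ only in how the URL is passed. One takes a Hadoop `Text`. The other takes a `URL` and is abstract, so each protocol-specific parser (such as `FtpRobotRulesParser` and `HttpRobotRulesParser`) supplies it. A plausible arrangement is for the concrete overload to convert its argument and delegate to the abstract one. The sketch below shows that pattern and makes several substitutions so it compiles without Hadoop or crawler-commons: `CharSequence` stands in for `Text`, and `String` stands in for `BaseRobotRules`. The class names and the fallback for malformed URLs are assumptions.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical simplification: String stands in for BaseRobotRules, CharSequence for Text.
abstract class SketchRobotRulesParser {
    // Assumed fallback when the URL cannot be parsed; the real parser may behave differently.
    static final String EMPTY_RULES = "allow-all";

    // Concrete overload: convert the textual URL, then delegate to the abstract overload.
    String getRobotRulesSet(Object protocol, CharSequence url) {
        try {
            return getRobotRulesSet(protocol, new URI(url.toString()).toURL());
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return EMPTY_RULES;
        }
    }

    // Abstract overload: each protocol-specific parser decides how to fetch and parse rules.
    abstract String getRobotRulesSet(Object protocol, URL url);
}

// Hypothetical subclass that reports which host it was asked about.
class EchoRulesParser extends SketchRobotRulesParser {
    @Override
    String getRobotRulesSet(Object protocol, URL url) {
        return "rules-for-" + url.getHost();
    }
}
```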
Modifier and Type | Class and Description |
---|---|
class | File: This class is a protocol plugin used for the file: scheme. |
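
To give a rough idea of what serving the `file:` scheme involves, the sketch below maps a `file:` URL to a local path with the JDK's `Paths.get(URI)` and reads its bytes. It is not the `File` plugin's implementation, and the class name `LocalFileReader` is hypothetical. A real plugin would also need to handle concerns this omits, such as directories and size limits.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: read the bytes behind a file: URL from the local file system.
class LocalFileReader {
    static byte[] read(String fileUrl) throws IOException {
        URI uri = URI.create(fileUrl);
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a file: URL: " + fileUrl);
        }
        // Paths.get(URI) converts an absolute, hierarchical file: URI into a local Path.
        Path path = Paths.get(uri);
        return Files.readAllBytes(path);
    }
}
```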
Modifier and Type | Class and Description |
---|---|
class | Ftp: This class is a protocol plugin used for the ftp: scheme. |
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | FtpRobotRulesParser.getRobotRulesSet(Protocol ftp, URL url): for a host whose robots rules have not yet been cached, sends an FTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later. |
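
The caching behaviour described for `FtpRobotRulesParser.getRobotRulesSet` fetches and parses only on the first request for a host, then reuses the cached rules object. It can be sketched with a `ConcurrentHashMap` and `computeIfAbsent`. The FTP transfer and the crawler-commons parsing fall outside a self-contained example, so the fetch-and-parse step is passed in as a function. The following are all assumptions: `RulesCache`, the choice of the lower-cased host as the cache key, and `String` as the rules type.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-host cache; String stands in for BaseRobotRules.
class RulesCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetchAndParse; // host -> parsed rules

    RulesCache(Function<String, String> fetchAndParse) {
        this.fetchAndParse = fetchAndParse;
    }

    String rulesFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // computeIfAbsent invokes fetchAndParse only when no entry exists for this host yet.
        return cache.computeIfAbsent(host.toLowerCase(Locale.ROOT), fetchAndParse);
    }
}
```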
Modifier and Type | Class and Description |
---|---|
class | Http |
Modifier and Type | Class and Description |
---|---|
class | HttpBase |
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | HttpRobotRulesParser.getRobotRulesSet(Protocol http, URL url): for a host whose robots rules have not yet been cached, sends an HTTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later. |
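
Over HTTP, the fetch step first has to locate the robots file. By convention it lives at `/robots.txt` on the same scheme, host and port as the page URL. The sketch below derives that location and a cache key from an http or https page URL using only `java.net.URI`. `RobotsLocator` is a hypothetical name. Keying on scheme, host and effective port is an assumed granularity, not a statement of what `HttpRobotRulesParser` does.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper; assumes an absolute http or https page URL.
class RobotsLocator {
    // URI.getPort() returns -1 when no port is given; fall back to the scheme's default port.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) return uri.getPort();
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // robots.txt sits at the root of the same scheme, host and port as the page.
    static String robotsUrl(String pageUrl) {
        URI uri = URI.create(pageUrl);
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
        return scheme + "://" + host + port + "/robots.txt";
    }

    // Assumed cache key: scheme, host and effective port, so http and https are cached separately.
    static String cacheKey(String pageUrl) {
        URI uri = URI.create(pageUrl);
        return uri.getScheme().toLowerCase(Locale.ROOT) + ":"
                + uri.getHost().toLowerCase(Locale.ROOT) + ":" + effectivePort(uri);
    }
}
```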
Copyright © 2014 The Apache Software Foundation