Interface Protocol

All Superinterfaces:
org.apache.hadoop.conf.Configurable, FieldPluggable, Pluggable
All Known Implementing Classes:
File, Ftp, Http, Http, HttpBase, Sftp

public interface Protocol
extends FieldPluggable, org.apache.hadoop.conf.Configurable

A retriever of URL content, implemented by protocol extensions.
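Implementations such as File, Ftp, Http and Sftp are plugged in as extensions of the extension point named by X_POINT_ID, and a caller must pick the one that matches a URL's scheme. The sketch below shows that dispatch idea with a hypothetical, simplified registry. SimpleProtocol, ProtocolRegistry, register and forUrl are illustrative names and are not Nutch code. The real interface returns ProtocolOutput and also extends FieldPluggable and Configurable.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the Protocol interface; the real one returns
// Nutch types and also extends FieldPluggable and Configurable.
interface SimpleProtocol {
    String fetch(String url);
}

// Hypothetical registry mapping URL schemes to protocol implementations.
final class ProtocolRegistry {
    private final Map<String, SimpleProtocol> byScheme = new HashMap<>();

    void register(String scheme, SimpleProtocol protocol) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), protocol);
    }

    // Returns the implementation registered for the URL's scheme
    // (matched case-insensitively), or throws if no extension handles it.
    SimpleProtocol forUrl(String url) {
        String scheme = URI.create(url).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in URL: " + url);
        }
        SimpleProtocol protocol = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (protocol == null) {
            throw new IllegalArgumentException("No protocol for scheme: " + scheme);
        }
        return protocol;
    }
}
```

A map keyed by lower-cased scheme keeps lookup constant-time and tolerates URLs such as "HTTP://example.com/".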

Field Summary
static String CHECK_BLOCKING
          Property name.
static String CHECK_ROBOTS
          Property name.
static String X_POINT_ID
          The name of the extension point.
Method Summary
 ProtocolOutput getProtocolOutput(String url, WebPage page)
 crawlercommons.robots.BaseRobotRules getRobotRules(String url, WebPage page)
          Retrieve robot rules applicable to this url.
Methods inherited from interface org.apache.nutch.plugin.FieldPluggable
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail


static final String X_POINT_ID
The name of the extension point.


static final String CHECK_BLOCKING
Property name. If this property is set to true in the current configuration, protocol implementations should enforce "politeness" limits internally. If it is set to false, these limits are assumed to be enforced elsewhere, and protocol implementations should not enforce them internally.

See Also:
Constant Field Values
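CHECK_BLOCKING only states where politeness is enforced, not how. The sketch below assumes one simple policy, a fixed minimum delay between successive fetches to the same host, to illustrate what enforcing politeness limits internally could look like. PolitenessGate, waitMs and the fixed-delay policy are hypothetical. In a real plugin the flag would presumably be read from the Hadoop Configuration returned by getConf(), for example conf.getBoolean(Protocol.CHECK_BLOCKING, true). The default of true in that call is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an internal politeness limit, as a protocol
// implementation might apply it when CHECK_BLOCKING is true.
final class PolitenessGate {
    private final boolean checkBlocking; // value of the CHECK_BLOCKING property
    private final long delayMs;          // assumed fixed delay between fetches to one host
    private final Map<String, Long> lastFetch = new HashMap<>();

    PolitenessGate(boolean checkBlocking, long delayMs) {
        this.checkBlocking = checkBlocking;
        this.delayMs = delayMs;
    }

    // Milliseconds the caller should wait before fetching from host at nowMs.
    // Returns 0 when politeness is enforced elsewhere (checkBlocking == false).
    synchronized long waitMs(String host, long nowMs) {
        if (!checkBlocking) {
            return 0L;
        }
        Long last = lastFetch.get(host);
        long earliest = (last == null) ? nowMs : last + delayMs;
        long wait = Math.max(0L, earliest - nowMs);
        lastFetch.put(host, nowMs + wait); // reserve the slot for this fetch
        return wait;
    }
}
```

With the flag off the gate always returns 0, matching the property's contract that limits are then enforced outside the protocol implementation.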


static final String CHECK_ROBOTS
Property name. If this property is set to true in the current configuration, protocol implementations should enforce robot exclusion rules internally. If it is set to false, these rules are assumed to be enforced elsewhere, and protocol implementations should not enforce them internally.

See Also:
Constant Field Values
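The effect of CHECK_ROBOTS can be sketched the same way, as a gate that a protocol implementation consults before fetching. PrefixRobotRules is a hypothetical stand-in for crawlercommons.robots.BaseRobotRules that only understands disallowed path prefixes. Real robots.txt handling (Allow lines, wildcards, user-agent groups) is considerably richer. RobotsGate and mayFetch are likewise illustrative names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for crawlercommons.robots.BaseRobotRules that only
// understands "Disallow"-style path prefixes for a single host.
final class PrefixRobotRules {
    private final List<String> disallowedPrefixes;

    PrefixRobotRules(List<String> disallowedPrefixes) {
        this.disallowedPrefixes = new ArrayList<>(disallowedPrefixes);
    }

    boolean isAllowed(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // treat "http://host" as the root path
        }
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical gate a protocol implementation might consult before fetching.
final class RobotsGate {
    private final boolean checkRobots; // value of the CHECK_ROBOTS property

    RobotsGate(boolean checkRobots) {
        this.checkRobots = checkRobots;
    }

    // With checkRobots == false, exclusion is assumed to be enforced elsewhere,
    // so this gate lets every URL through.
    boolean mayFetch(String url, PrefixRobotRules rules) {
        return !checkRobots || rules.isAllowed(url);
    }
}
```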
Method Detail


ProtocolOutput getProtocolOutput(String url,
                                 WebPage page)


crawlercommons.robots.BaseRobotRules getRobotRules(String url,
                                                   WebPage page)
Retrieve robot rules applicable to this url.

Parameters:
url - url to check
page -
Returns:
robot rules (specific for this url or default), never null
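getRobotRules promises rules that are either specific to the URL or a default, and never null. One way to honor that contract is to cache rules per scheme, host and port, and to fall back to a shared allow-all object when no rules could be obtained. This approach is assumed here rather than taken from any Nutch implementation. HostRules, RobotRulesCache and the loader function are hypothetical, allow-all as the default is an assumption, and the sketch ignores the WebPage argument.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, minimal rules type standing in for
// crawlercommons.robots.BaseRobotRules: a set of exactly disallowed paths.
final class HostRules {
    // Shared default used when no host-specific rules are available
    // (allow-all is an assumption of this sketch).
    static final HostRules ALLOW_ALL = new HostRules(Set.of());

    private final Set<String> disallowedPaths;

    HostRules(Set<String> disallowedPaths) {
        this.disallowedPaths = Set.copyOf(disallowedPaths);
    }

    boolean allowsPath(String path) {
        return !disallowedPaths.contains(path);
    }
}

// Hypothetical cache honoring the "specific or default, never null" contract.
final class RobotRulesCache {
    private final Map<String, HostRules> cache = new ConcurrentHashMap<>();
    // Hypothetical loader: given "scheme://host:port", returns parsed rules,
    // or null if robots.txt could not be obtained.
    private final Function<String, HostRules> loader;

    RobotRulesCache(Function<String, HostRules> loader) {
        this.loader = loader;
    }

    HostRules getRobotRules(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not a URL with a host: " + url);
        }
        // Rules are keyed per scheme, host and port; getPort() is -1 when absent.
        String key = uri.getScheme().toLowerCase(Locale.ROOT) + "://"
                + uri.getHost().toLowerCase(Locale.ROOT) + ":" + uri.getPort();
        // computeIfAbsent stores the result, so the loader runs once per key,
        // and a null from the loader is replaced by the non-null default.
        return cache.computeIfAbsent(key, k -> {
            HostRules loaded = loader.apply(k);
            return loaded != null ? loaded : HostRules.ALLOW_ALL;
        });
    }
}
```

Returning a shared default object instead of null spares every caller a null check, which is the point of the "never null" guarantee.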

Copyright © 2013 The Apache Software Foundation