public class HttpRobotRulesParser extends RobotRulesParser

Extends the generic RobotRulesParser class and contains the HTTP-protocol-specific implementation for obtaining the robots.txt file.

Field Summary

Modifier and Type | Field and Description |
---|---|
protected boolean | allowForbidden |
static org.slf4j.Logger | LOG |
Fields inherited from class RobotRulesParser: agentNames, CACHE, EMPTY_RULES, FORBID_ALL_RULES
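To make the class description concrete, the sketch below illustrates what "parsing the robots file" involves: collecting the Disallow prefixes for the matching User-agent record and checking paths against them. This is a hypothetical, stdlib-only stand-in for illustration; Nutch itself delegates the real parsing to crawler-commons and returns a BaseRobotRules object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal robots.txt parser, not part of Nutch.
// Shown only to clarify what "parses the rules" means.
class TinyRobotRules {
    private final List<String> disallowed = new ArrayList<>();

    // Collect Disallow prefixes from records matching the given agent (or "*").
    static TinyRobotRules parse(String robotsTxt, String agent) {
        TinyRobotRules rules = new TinyRobotRules();
        boolean inRecord = false;
        for (String raw : robotsTxt.split("\n")) {
            String line = raw.trim();
            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                String name = line.substring("user-agent:".length()).trim();
                inRecord = name.equals("*") || name.equalsIgnoreCase(agent);
            } else if (inRecord && lower.startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) rules.disallowed.add(path);
            }
        }
        return rules;
    }

    // A path is allowed unless it starts with a disallowed prefix.
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private\n";
        TinyRobotRules r = TinyRobotRules.parse(robots, "nutch");
        System.out.println(r.isAllowed("/index.html")); // true
        System.out.println(r.isAllowed("/private/a"));  // false
    }
}
```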
Constructor Summary

Constructor and Description |
---|
HttpRobotRulesParser(org.apache.hadoop.conf.Configuration conf) |
Method Summary

Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | getRobotRulesSet(Protocol http, URL url): For hosts whose robots rules have not yet been cached, sends an HTTP request to the host of the given URL, retrieves the robots.txt file, parses the rules, and caches the resulting rules object to avoid repeated work. |
Methods inherited from class RobotRulesParser: getConf, getRobotRulesSet, main, parseRules, setConf
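The caching behaviour described for getRobotRulesSet can be sketched with a simplified model: rules for a host are fetched and parsed at most once, keyed by protocol, host, and port, and reused for every later URL on that host. The class and the fetcher function below are hypothetical stand-ins, not Nutch code.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simplified, hypothetical model of a per-host robots-rules cache.
class RobotsCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // host -> robots.txt body
    int fetchCount = 0; // exposed so the usage example can show cache hits

    RobotsCacheSketch(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Cache key built from protocol, host, and port of the URL.
    static String cacheKey(URL url) {
        int port = url.getPort() == -1 ? url.getDefaultPort() : url.getPort();
        return url.getProtocol() + ":" + url.getHost() + ":" + port;
    }

    // Return cached rules if present; otherwise fetch, parse, and cache.
    String getRules(URL url) {
        return cache.computeIfAbsent(cacheKey(url), key -> {
            fetchCount++;
            return fetcher.apply(url.getHost());
        });
    }

    // Convenience overload so callers need not handle MalformedURLException.
    String getRules(String urlString) {
        try {
            return getRules(new URL(urlString));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        RobotsCacheSketch cache =
            new RobotsCacheSketch(host -> "User-agent: *\nDisallow: /private\n");
        cache.getRules("http://example.com/a");
        cache.getRules("http://example.com/b"); // same host: served from cache
        System.out.println(cache.fetchCount);   // 1
    }
}
```

Two URLs on the same host trigger only one fetch; this is the "re-work" that the method summary says the cache avoids.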
Field Detail

public static final org.slf4j.Logger LOG

protected boolean allowForbidden
Constructor Detail

public HttpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)
Method Detail

public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol http, URL url)

For hosts whose robots rules have not yet been cached, sends an HTTP request to the host of the given URL, retrieves the robots.txt file, parses the rules, and caches the resulting rules object to avoid repeated work.

Specified by:
getRobotRulesSet in class RobotRulesParser

Parameters:
http - the Protocol object
url - the URL to check

Returns:
BaseRobotRules object for the rules

Copyright © 2014 The Apache Software Foundation