org.apache.nutch.protocol.ftp
Class FtpRobotRulesParser
java.lang.Object
org.apache.nutch.protocol.RobotRulesParser
org.apache.nutch.protocol.ftp.FtpRobotRulesParser
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable
public class FtpRobotRulesParser
- extends RobotRulesParser
Parses robots rules for URLs that use the FTP protocol.
It extends the generic RobotRulesParser class and contains the
FTP-specific implementation for obtaining the robots file.
Field Summary
static org.slf4j.Logger | LOG
Method Summary
crawlercommons.robots.BaseRobotRules | getRobotRulesSet(Protocol ftp, URL url)
For a host whose robots rules are not yet cached, sends an FTP request
to the host of the given URL, retrieves the robots file, parses the rules,
and caches the resulting rules object so the work is not repeated.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
FtpRobotRulesParser
public FtpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)
getRobotRulesSet
public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol ftp, URL url)
- For a host whose robots rules are not yet cached, sends an FTP request
to the host of the given URL, retrieves the robots file, parses the rules,
and caches the resulting rules object so the work is not repeated.
- Specified by:
getRobotRulesSet in class RobotRulesParser
- Parameters:
ftp - The Protocol object
url - The URL for which robots rules are requested
- Returns:
- A BaseRobotRules object holding the rules
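The fetch-once, cache-per-host behaviour described for getRobotRulesSet can be illustrated with a minimal, self-contained sketch. This is not the Nutch implementation: the class RobotsCacheSketch, its methods, the "scheme:host" cache key, the use of java.net.URI instead of URL, and the String stand-in for a rules object are all hypothetical, and the FTP request plus parsing step is replaced by a caller-supplied function.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of per-host robots rules caching; not Nutch code. */
public class RobotsCacheSketch {

  // Cache keyed by "scheme:host"; this key format is an assumption.
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  // Stand-in for "send FTP request, get robots file, parse rules".
  // It receives the robots file location and returns the parsed rules.
  private final Function<URI, String> fetchAndParse;

  public RobotsCacheSketch(Function<URI, String> fetchAndParse) {
    this.fetchAndParse = fetchAndParse;
  }

  /** Builds the cache key from the lower-cased scheme and host. */
  static String cacheKey(URI uri) {
    return uri.getScheme().toLowerCase(Locale.ROOT) + ":"
        + uri.getHost().toLowerCase(Locale.ROOT);
  }

  /** Resolves "/robots.txt" against the host of the given URI. */
  static URI robotsUri(URI uri) {
    return uri.resolve("/robots.txt");
  }

  /** Returns cached rules for the host, fetching and caching them on a miss. */
  public String getRules(URI uri) {
    String key = cacheKey(uri);
    String rules = cache.get(key);
    if (rules == null) {
      // Cache miss: fetch and parse once, then remember the result.
      rules = fetchAndParse.apply(robotsUri(uri));
      cache.put(key, rules);
    }
    return rules;
  }
}
```

With this sketch, two calls to getRules for different paths on the same host invoke the supplied function only once; a call for a different host triggers a second fetch.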
Copyright © 2013 The Apache Software Foundation