java.lang.Object
  org.apache.nutch.protocol.ftp.Ftp
public class Ftp
This class handles the ftp: URL scheme. Configurable parameters are defined in the "FTP properties" section of ./conf/nutch-default.xml or a similar configuration file.
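As a rough illustration of where those parameters come from, the sketch below pulls every `ftp.`-prefixed property out of a configuration document. It assumes the Hadoop-style `<configuration>`/`<property>` layout used by Nutch configuration files. The class name `FtpPropertiesSketch`, the sample XML, and the property names and values in it (`ftp.timeout`, `ftp.content.limit`) are illustrative assumptions; check the "FTP properties" section of your own nutch-default.xml for the names your version actually defines.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FtpPropertiesSketch {

    // Hypothetical override file content; names and values are assumptions for illustration.
    static final String SAMPLE_XML =
        "<configuration>"
        + "<property><name>ftp.timeout</name><value>30000</value></property>"
        + "<property><name>ftp.content.limit</name><value>131072</value></property>"
        + "<property><name>http.timeout</name><value>10000</value></property>"
        + "</configuration>";

    /** Collects every property whose name starts with "ftp." in document order. */
    static Map<String, String> ftpProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                if (name.startsWith("ftp.")) {
                    result.put(name, value);
                }
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need not handle checked exceptions.
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        ftpProperties(SAMPLE_XML).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real crawl these values reach the class through `setConf(Configuration conf)`; the sketch only shows the shape of the configuration data and does not use the Nutch API.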
Field Summary | |
---|---|
static org.slf4j.Logger | LOG |
Fields inherited from interface org.apache.nutch.protocol.Protocol |
---|
CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID |
Constructor Summary | |
---|---|
Ftp() | |
Method Summary | |
---|---|
protected void | finalize() |
Configuration | getConf() |
ProtocolOutput | getProtocolOutput(Text url, CrawlDatum datum): Returns the Content for a fetchlist entry. |
RobotRules | getRobotRules(Text url, CrawlDatum datum): Retrieve robot rules applicable for this url. |
static void | main(String[] args): For debugging. |
void | setConf(Configuration conf) |
void | setFollowTalk(boolean followTalk): Set followTalk. |
void | setKeepConnection(boolean keepConnection): Set keepConnection. |
void | setMaxContentLength(int length): Set the point at which content is truncated. |
void | setTimeout(int to): Set the timeout. |
Methods inherited from class java.lang.Object |
---|
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final org.slf4j.Logger LOG
Constructor Detail |
---|
public Ftp()
Method Detail |
---|
public void setTimeout(int to)
public void setMaxContentLength(int length)
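The summary describes `setMaxContentLength` as setting the point at which content is truncated. The following is a minimal sketch of what truncation at a byte limit can look like. It is not the Nutch implementation: the class `TruncationSketch`, the helper `readTruncated`, and the convention that a negative limit means "no limit" are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class TruncationSketch {

    /**
     * Reads at most maxContentLength bytes from the stream.
     * A negative limit is treated as "no limit" (an assumption of this sketch).
     */
    static byte[] readTruncated(InputStream in, int maxContentLength) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (maxContentLength < 0 || out.size() < maxContentLength) {
                int want = buffer.length;
                if (maxContentLength >= 0) {
                    // Never request more bytes than the remaining allowance.
                    want = Math.min(want, maxContentLength - out.size());
                }
                int n = in.read(buffer, 0, want);
                if (n == -1) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        byte[] kept = readTruncated(new ByteArrayInputStream(data), 4_000);
        System.out.println(kept.length);
    }
}
```

Capping each read at the remaining allowance keeps the result from overshooting the limit, even when the limit is not a multiple of the buffer size.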
public void setFollowTalk(boolean followTalk)
public void setKeepConnection(boolean keepConnection)
public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum)
Description copied from interface: Protocol
Returns the Content for a fetchlist entry.
Specified by: getProtocolOutput in interface Protocol
protected void finalize()
Overrides: finalize in class Object
public static void main(String[] args) throws Exception
For debugging.
Throws: Exception
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
public Configuration getConf()
Specified by: getConf in interface Configurable
public RobotRules getRobotRules(Text url, CrawlDatum datum)
Description copied from interface: Protocol
Retrieve robot rules applicable for this url.
Specified by: getRobotRules in interface Protocol
Parameters:
url - url to check
datum - page datum