public class Ftp extends Object implements Protocol

Creates an FtpResponse object and gets the content of the url from it. Configurable parameters are ftp.username, ftp.password, ftp.content.limit, ftp.timeout, ftp.server.timeout, ftp.keep.connection and ftp.follow.talk. For details see the "FTP properties" section in nutch-default.xml.

Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |

Fields inherited from interface Protocol: CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID

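
The ftp.* parameters named in the class description are plain key/value properties. As a rough illustration of how such keys might be read with typed fallbacks, the sketch below uses java.util.Properties as a stand-in for the Hadoop Configuration object. The class name FtpSettingsSketch, the fallback values, and the assumed int/boolean types are hypothetical choices made for this sketch; they are not taken from nutch-default.xml or from the Ftp source.

```java
import java.util.Properties;

/** Hypothetical holder for a few ftp.* settings; not part of Nutch. */
public class FtpSettingsSketch {
    public final String username;
    public final int contentLimit;
    public final int timeout;
    public final boolean keepConnection;
    public final boolean followTalk;

    public FtpSettingsSketch(Properties props) {
        // The fallback values below are placeholders for the sketch,
        // not the defaults documented in nutch-default.xml.
        username = props.getProperty("ftp.username", "anonymous");
        contentLimit = Integer.parseInt(props.getProperty("ftp.content.limit", "1024"));
        timeout = Integer.parseInt(props.getProperty("ftp.timeout", "1000"));
        keepConnection = Boolean.parseBoolean(props.getProperty("ftp.keep.connection", "false"));
        followTalk = Boolean.parseBoolean(props.getProperty("ftp.follow.talk", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("ftp.content.limit", "2048");
        props.setProperty("ftp.keep.connection", "true");

        FtpSettingsSketch settings = new FtpSettingsSketch(props);
        // Keys that were set override the fallbacks; unset keys use them.
        System.out.println(settings.contentLimit + " " + settings.keepConnection + " " + settings.timeout);
    }
}
```

Keeping the key names in one place like this makes it easy to compare them against the property names listed in the class description.
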
Constructor and Description |
---|
Ftp() |

Modifier and Type | Method and Description |
---|---|
protected void | finalize() |
int | getBufferSize() |
org.apache.hadoop.conf.Configuration | getConf(): Get the Configuration object. |
ProtocolOutput | getProtocolOutput(org.apache.hadoop.io.Text url, CrawlDatum datum): Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received. |
crawlercommons.robots.BaseRobotRules | getRobotRules(org.apache.hadoop.io.Text url, CrawlDatum datum): Get the robots rules for a given url. |
static void | main(String[] args): For debugging. |
void | setConf(org.apache.hadoop.conf.Configuration conf): Set the Configuration object. |
void | setFollowTalk(boolean followTalk): Set followTalk. |
void | setKeepConnection(boolean keepConnection): Set keepConnection. |
void | setMaxContentLength(int length): Set the point at which content is truncated. |
void | setTimeout(int to): Set the timeout. |

public void setTimeout(int to)
public void setMaxContentLength(int length)
public void setFollowTalk(boolean followTalk)
public void setKeepConnection(boolean keepConnection)
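
setMaxContentLength is described only as setting the point at which content is truncated. One plausible reading, sketched below under that assumption, is that at most `length` bytes are read from the incoming stream and the remainder is ignored. The class TruncatingReaderSketch and its readUpTo method are hypothetical names for this illustration; they are not Nutch APIs and do not claim to mirror how Ftp or FtpResponse apply the limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of truncating content at a byte limit; not part of Nutch. */
public class TruncatingReaderSketch {

    /** Reads at most maxLength bytes from in; any bytes beyond the limit are left unread. */
    public static byte[] readUpTo(InputStream in, int maxLength) throws IOException {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxLength;
        while (remaining > 0) {
            // Never request more than the bytes still allowed by the limit.
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] head = readUpTo(new ByteArrayInputStream(data), 4);
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // prints "0123"
    }
}
```

A stream shorter than the limit is returned whole, so the limit acts only as an upper bound on the bytes kept.
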

public ProtocolOutput getProtocolOutput(org.apache.hadoop.io.Text url, CrawlDatum datum)
Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received.
Specified by: getProtocolOutput in interface Protocol
Parameters:
url - Text containing the ftp url
datum - The CrawlDatum object corresponding to the url
Returns: ProtocolOutput object for the url

public void setConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration object.
Specified by: setConf in interface org.apache.hadoop.conf.Configurable

public org.apache.hadoop.conf.Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface org.apache.hadoop.conf.Configurable

public crawlercommons.robots.BaseRobotRules getRobotRules(org.apache.hadoop.io.Text url, CrawlDatum datum)
Get the robots rules for a given url.
Specified by: getRobotRules in interface Protocol
Parameters:
url - url to check
datum - page datum

public int getBufferSize()

Copyright © 2014 The Apache Software Foundation