java.lang.Object org.apache.nutch.protocol.ftp.Ftp
public class Ftp
This class is a protocol plugin used for the ftp: scheme. It creates an FtpResponse object and gets the content of the URL from it. Configurable parameters are ftp.username, ftp.password, ftp.content.limit, ftp.timeout, ftp.server.timeout, ftp.keep.connection and ftp.follow.talk. For details see the "FTP properties" section in nutch-default.xml.
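The parameters listed above are plain string-keyed settings. As a minimal sketch of how such keys can be read with typed fallbacks, the example below uses `java.util.Properties` as a stand-in for Hadoop's `Configuration` so that it runs without Nutch on the classpath. The `FtpSettings` class and every fallback value in it are hypothetical and chosen only for illustration; consult nutch-default.xml for the real defaults.

```java
import java.util.Properties;

/** Hypothetical holder for the ftp.* settings; not part of Nutch. */
final class FtpSettings {
    final String username;
    final String password;
    final int contentLimit;
    final int timeout;
    final int serverTimeout;
    final boolean keepConnection;
    final boolean followTalk;

    private FtpSettings(Properties p) {
        // Fallback values are placeholders for this sketch,
        // not necessarily the defaults shipped in nutch-default.xml.
        username = p.getProperty("ftp.username", "anonymous");
        password = p.getProperty("ftp.password", "anonymous@example.com");
        contentLimit = Integer.parseInt(p.getProperty("ftp.content.limit", "65536"));
        timeout = Integer.parseInt(p.getProperty("ftp.timeout", "60000"));
        serverTimeout = Integer.parseInt(p.getProperty("ftp.server.timeout", "100000"));
        keepConnection = Boolean.parseBoolean(p.getProperty("ftp.keep.connection", "false"));
        followTalk = Boolean.parseBoolean(p.getProperty("ftp.follow.talk", "false"));
    }

    /** Builds the settings from a key/value source, applying the fallbacks above. */
    static FtpSettings from(Properties p) {
        return new FtpSettings(p);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("ftp.timeout", "5000");
        p.setProperty("ftp.keep.connection", "true");
        FtpSettings s = FtpSettings.from(p);
        System.out.println(s.timeout + " " + s.keepConnection + " " + s.username);
    }
}
```

In a real Nutch deployment the values would instead arrive through setConf(org.apache.hadoop.conf.Configuration conf), and the setters documented below can override individual values afterwards.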
Field Summary | |
---|---|
static org.slf4j.Logger | LOG |
Fields inherited from interface org.apache.nutch.protocol.Protocol |
---|
CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID |
Constructor Summary | |
---|---|
Ftp() | |
Method Summary | |
---|---|
protected void | finalize() |
org.apache.hadoop.conf.Configuration | getConf() - Get the Configuration object. |
Collection<WebPage.Field> | getFields() |
ProtocolOutput | getProtocolOutput(String url, WebPage page) - Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received. |
crawlercommons.robots.BaseRobotRules | getRobotRules(String url, WebPage page) - Get the robots rules for a given url. |
static void | main(String[] args) - For debugging. |
void | setConf(org.apache.hadoop.conf.Configuration conf) - Set the Configuration object. |
void | setFollowTalk(boolean followTalk) - Set followTalk. |
void | setKeepConnection(boolean keepConnection) - Set keepConnection. |
void | setMaxContentLength(int length) - Set the point at which content is truncated. |
void | setTimeout(int to) - Set the timeout. |
Methods inherited from class java.lang.Object |
---|
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final org.slf4j.Logger LOG
Constructor Detail |
---|
public Ftp()
Method Detail |
---|
public void setTimeout(int to)
Set the timeout.
public void setMaxContentLength(int length)
Set the point at which content is truncated.
public void setFollowTalk(boolean followTalk)
Set followTalk.
public void setKeepConnection(boolean keepConnection)
Set keepConnection.
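To make "the point at which content is truncated" concrete, here is a minimal sketch of applying such a limit to a byte array. The `ContentLimit` helper is hypothetical and is not Nutch code. Treating a negative limit as "no truncation" is an assumption made for this sketch, not something this page states.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper illustrating a content limit; not Nutch code. */
final class ContentLimit {
    /**
     * Returns at most maxLength bytes of content.
     * Assumption for this sketch: a negative limit means "do not truncate".
     */
    static byte[] truncate(byte[] content, int maxLength) {
        if (maxLength < 0 || content.length <= maxLength) {
            return content; // nothing to cut
        }
        return Arrays.copyOf(content, maxLength); // keep only the first maxLength bytes
    }

    public static void main(String[] args) {
        byte[] body = "0123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(truncate(body, 4).length);
        System.out.println(truncate(body, -1).length);
    }
}
```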
public ProtocolOutput getProtocolOutput(String url, WebPage page)
Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received.
Specified by: getProtocolOutput in interface Protocol
Parameters:
url - Text containing the ftp url
page - The WebPage object corresponding to the url
Returns: ProtocolOutput object for the url
protected void finalize()
Overrides: finalize in class Object
public void setConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration object.
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
public org.apache.hadoop.conf.Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
public static void main(String[] args) throws Exception
For debugging.
Throws: Exception
public Collection<WebPage.Field> getFields()
Specified by: getFields in interface FieldPluggable
public crawlercommons.robots.BaseRobotRules getRobotRules(String url, WebPage page)
Get the robots rules for a given url.
Specified by: getRobotRules in interface Protocol
Parameters:
url - url to check
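Both getProtocolOutput and getRobotRules take the ftp: URL as a plain String. As a rough illustration of what such a string contains, the sketch below splits it into host, port and path with the standard `java.net.URI` class. The `FtpUrlParts` class is hypothetical, and this is not necessarily how the plugin itself parses URLs. The fallback to port 21 reflects the standard FTP control port.

```java
import java.net.URI;

/** Hypothetical value class for the parts of an ftp: URL; not Nutch code. */
final class FtpUrlParts {
    static final int DEFAULT_FTP_PORT = 21;

    final String host;
    final int port;
    final String path;

    private FtpUrlParts(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    /** Parses an ftp: URL; throws IllegalArgumentException for other schemes or malformed input. */
    static FtpUrlParts parse(String url) {
        URI uri = URI.create(url);
        if (!"ftp".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an ftp: URL: " + url);
        }
        // getPort() returns -1 when the URL does not name a port explicitly.
        int port = uri.getPort() == -1 ? DEFAULT_FTP_PORT : uri.getPort();
        String rawPath = uri.getPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return new FtpUrlParts(uri.getHost(), port, path);
    }

    public static void main(String[] args) {
        FtpUrlParts parts = FtpUrlParts.parse("ftp://ftp.example.org/pub/README");
        System.out.println(parts.host + " " + parts.port + " " + parts.path);
    }
}
```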