org.apache.nutch.protocol.file
Class File
java.lang.Object
org.apache.nutch.protocol.file.File
- All Implemented Interfaces:
- Configurable, Pluggable, Protocol
public class File
- extends Object
- implements Protocol
File.java handles URLs that use the file: scheme.
Configurable parameters are defined in the "FILE properties" section
of ./conf/nutch-default.xml or a similar configuration file.
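File.java is the protocol plugin for file: URLs, so any fetch starts by turning such a URL into a local path. The following is a minimal sketch of that mapping using only the JDK. The class FileUrlSketch and its method are hypothetical and not part of Nutch, and the real plugin may resolve paths differently.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper (not part of Nutch): maps a file: URL to a local path.
final class FileUrlSketch {

    private FileUrlSketch() {}

    /** Returns the local File named by a file: URL; other schemes are rejected. */
    static File toLocalFile(String url) {
        // URI.create throws IllegalArgumentException for malformed URLs.
        URI uri = URI.create(url);
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a file: URL: " + url);
        }
        // File(URI) decodes percent-escapes and rejects URIs that carry an
        // authority, query, or fragment, again with IllegalArgumentException.
        return new File(uri);
    }
}
```

For example, toLocalFile("file:/tmp/notes.txt") yields a File whose name is notes.txt, while an http: URL is rejected before any I/O happens.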
- Author:
- John Xing
Field Summary
static org.apache.commons.logging.Log LOG
Constructor Summary
File()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
File
public File()
setMaxContentLength
public void setMaxContentLength(int length)
- Set the point at which content is truncated.
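The truncation rule can be sketched as follows. ContentLimitSketch is a hypothetical stand-in for the relevant state in File, and the convention that a negative limit disables truncation is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch (not Nutch code): applies a maximum content length.
final class ContentLimitSketch {
    private int maxContentLength;

    // Plays the role of File.setMaxContentLength(int). A negative value is
    // assumed here to mean "do not truncate".
    void setMaxContentLength(int length) {
        this.maxContentLength = length;
    }

    /** Returns the content unchanged, or its first maxContentLength bytes. */
    byte[] apply(byte[] content) {
        if (maxContentLength < 0 || content.length <= maxContentLength) {
            return content;
        }
        return Arrays.copyOf(content, maxContentLength);
    }
}
```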
getProtocolOutput
public ProtocolOutput getProtocolOutput(Text url,
CrawlDatum datum)
- Description copied from interface: Protocol
- Returns the Content for a fetchlist entry.
- Specified by: getProtocolOutput in interface Protocol
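getProtocolOutput is where a fetchlist entry becomes bytes. As a rough sketch, assuming the entry refers to a regular local file, the read can stop as soon as the content limit is reached instead of loading the whole file. The class and method names are hypothetical, and directory handling and the construction of ProtocolOutput and Content are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch (not Nutch code): reads the bytes a fetch of a local
// file might return, stopping once the content limit is reached.
final class FileFetchSketch {

    private FileFetchSketch() {}

    /** Reads up to maxLength bytes (all bytes if maxLength < 0) from a regular file. */
    static byte[] readContent(Path path, int maxLength) throws IOException {
        if (!Files.isRegularFile(path)) {
            // Directories, missing files, etc. are out of scope for this sketch.
            throw new IOException("Not a regular file: " + path);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(path)) {
            int remaining = maxLength < 0 ? Integer.MAX_VALUE : maxLength;
            int n;
            // Never request more bytes than the limit still allows.
            while (remaining > 0
                    && (n = in.read(buffer, 0, Math.min(buffer.length, remaining))) != -1) {
                out.write(buffer, 0, n);
                remaining -= n;
            }
        }
        return out.toByteArray();
    }
}
```

Reading in bounded chunks keeps memory proportional to the limit rather than to the file size, which matters when a crawl encounters large local files.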
main
public static void main(String[] args)
throws Exception
- For debugging.
- Throws: Exception
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
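setConf and getConf follow the usual Configurable pattern: store the configuration and derive settings from it. The sketch below uses java.util.Properties in place of Hadoop's Configuration so it runs without Nutch on the classpath. The key file.content.limit and its default of 65536 bytes are assumptions to verify against the FILE properties section of nutch-default.xml.

```java
import java.util.Properties;

// Hypothetical sketch (not Nutch code): java.util.Properties stands in for
// Hadoop's Configuration. The key and default below are assumptions.
final class ConfigurableSketch {
    static final String CONTENT_LIMIT_KEY = "file.content.limit"; // assumed key
    static final int DEFAULT_CONTENT_LIMIT = 65536;               // assumed default, in bytes

    private Properties conf;
    private int maxContentLength = DEFAULT_CONTENT_LIMIT;

    /** Stores the configuration and reads the content limit from it. */
    void setConf(Properties conf) {
        this.conf = conf;
        String value = conf.getProperty(CONTENT_LIMIT_KEY);
        // Throws NumberFormatException if the configured value is not an integer.
        this.maxContentLength = value == null
                ? DEFAULT_CONTENT_LIMIT
                : Integer.parseInt(value.trim());
    }

    /** Returns the configuration last passed to setConf, or null before that. */
    Properties getConf() {
        return conf;
    }

    int getMaxContentLength() {
        return maxContentLength;
    }
}
```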
getRobotRules
public RobotRules getRobotRules(Text url,
CrawlDatum datum)
- Description copied from interface: Protocol
- Retrieve robot rules applicable for this url.
- Specified by: getRobotRules in interface Protocol
- Parameters:
- url - url to check
- datum - page datum
- Returns:
- robot rules (specific for this url or default), never null
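The "never null" guarantee is commonly met with a permissive default rule set (the Null Object pattern), which suits file: URLs, where there is usually no robots.txt to consult. The sketch below illustrates only that contract. RobotRulesSketch, RobotRulesLookup, and the string key used for lookup are hypothetical and do not reflect Nutch's RobotRules interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Nutch's RobotRules API).
interface RobotRulesSketch {
    /** Returns true if the given URL may be fetched. */
    boolean isAllowed(String url);
}

final class RobotRulesLookup {
    /** Default rules used when nothing more specific is known; allows every URL. */
    static final RobotRulesSketch ALLOW_ALL = url -> true;

    private final Map<String, RobotRulesSketch> rulesByKey;

    RobotRulesLookup(Map<String, RobotRulesSketch> rulesByKey) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.rulesByKey = new HashMap<>(rulesByKey);
    }

    /** Returns the specific rules for key if present, else ALLOW_ALL; never null. */
    RobotRulesSketch getRobotRules(String key) {
        RobotRulesSketch rules = rulesByKey.get(key);
        return rules != null ? rules : ALLOW_ALL;
    }
}
```

Callers can then invoke isAllowed on the result without a null check, which is the point of the contract.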
Copyright © 2006 The Apache Software Foundation