org.apache.nutch.protocol.file
Class File
java.lang.Object
org.apache.nutch.protocol.file.File
- All Implemented Interfaces:
- Configurable, FieldPluggable, Pluggable, Protocol
public class File
- extends Object
- implements Protocol
Protocol plugin that handles URLs with the file: scheme.
Configurable parameters are defined in the "FILE properties" section of
./conf/nutch-default.xml or similar.
- Author:
- John Xing
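As an illustration of what handling the file: scheme involves, the sketch below uses only java.net.URI from the JDK to recognise a file: URL and extract its path component. It is not taken from File.java: the class FileSchemeSketch and its methods isFileScheme and localPath are hypothetical names introduced for this example.

```java
import java.net.URI;

// Hypothetical helper, not part of Nutch: recognises file: URLs
// and extracts their decoded path component.
final class FileSchemeSketch {

  /** Returns true when the URL string uses the file: scheme. */
  static boolean isFileScheme(String url) {
    // getScheme() returns null for scheme-less input; the comparison then yields false.
    return "file".equalsIgnoreCase(URI.create(url).getScheme());
  }

  /** Returns the percent-decoded path of a file: URL; rejects any other scheme. */
  static String localPath(String url) {
    if (!isFileScheme(url)) {
      throw new IllegalArgumentException("Not a file: URL: " + url);
    }
    return URI.create(url).getPath();
  }

  public static void main(String[] args) {
    System.out.println(isFileScheme("file:///tmp/example.txt"));
    System.out.println(isFileScheme("http://example.org/"));
    System.out.println(localPath("file:///tmp/example.txt"));
  }
}
```

The sketch rejects non-file: URLs outright. How the real plugin reacts to other schemes is not stated on this page.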
Field Summary
static org.slf4j.Logger LOG
Constructor Summary
File()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
File
public File()
setMaxContentLength
public void setMaxContentLength(int length)
- Set the point at which content is truncated.
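To make the idea of a truncation point concrete, here is a minimal sketch of cutting content at a maximum length. It does not reproduce the Nutch implementation: the class TruncationSketch and the method truncate are hypothetical, and treating a negative limit as "no truncation" is an assumption of this sketch rather than something this page documents.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not the Nutch implementation.
final class TruncationSketch {

  /**
   * Returns at most maxContentLength leading bytes of content.
   * Assumption of this sketch: a negative limit means "do not truncate".
   */
  static byte[] truncate(byte[] content, int maxContentLength) {
    if (maxContentLength < 0 || content.length <= maxContentLength) {
      return content; // nothing to cut
    }
    return Arrays.copyOf(content, maxContentLength);
  }

  public static void main(String[] args) {
    byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
    byte[] cut = truncate(data, 4);
    System.out.println(new String(cut, StandardCharsets.US_ASCII)); // prints 0123
  }
}
```

A caller of the real class would pass the limit to setMaxContentLength(int) before fetching, and the plugin would apply its own truncation logic internally.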
getProtocolOutput
public ProtocolOutput getProtocolOutput(String url, WebPage page)
- Description copied from interface: Protocol
- Returns the Content for a fetchlist entry.
- Specified by:
- getProtocolOutput in interface Protocol
getRobotRules
public RobotRules getRobotRules(String url, WebPage page)
- Description copied from interface: Protocol
- Retrieve robot rules applicable for this url.
- Specified by:
- getRobotRules in interface Protocol
- Parameters:
- url - url to check
- Returns:
- robot rules (specific for this url or default), never null
getFields
public Collection<WebPage.Field> getFields()
- Specified by:
- getFields in interface FieldPluggable
main
public static void main(String[] args) throws Exception
- For debugging.
- Throws:
- Exception
setConf
public void setConf(Configuration conf)
- Specified by:
- setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by:
- getConf in interface Configurable
Copyright © 2012 The Apache Software Foundation