org.apache.nutch.protocol.httpclient
Class Http
java.lang.Object
org.apache.nutch.protocol.http.api.HttpBase
org.apache.nutch.protocol.httpclient.Http
- All Implemented Interfaces:
- Configurable, Pluggable, Protocol
public class Http
- extends HttpBase
This class is a protocol plugin that configures an HTTP client to use the
Basic, Digest and NTLM authentication schemes with web servers as well as
proxy servers. It also handles the HTTPS protocol and keeps cookies for the
duration of a single fetch session.
- Author:
- Susam Pal
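As a rough illustration of the simplest of the authentication schemes named above, the sketch below builds the header value used by HTTP Basic authentication. It is not taken from the plugin: `BasicAuthSketch` and `headerValue` are hypothetical names, the credentials are placeholders, and only JDK classes are used. The same value format applies to both the `Authorization` header (web server) and the `Proxy-Authorization` header (proxy server).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper, not part of Nutch: builds an HTTP Basic credential value. */
class BasicAuthSketch {

    /** Returns the value for an Authorization or Proxy-Authorization header. */
    static String headerValue(String user, String password) {
        // The Basic scheme sends the base64 encoding of "user:password".
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        System.out.println(headerValue("user", "pass"));
    }
}
```

Running `main` prints `Basic dXNlcjpwYXNz`. Digest and NTLM involve challenge-response exchanges with the server and are not covered by this sketch.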
Field Summary
static org.slf4j.Logger LOG
Constructor Summary
Http()
- Constructs this plugin.
Method Summary
protected Response getResponse(URL url, CrawlDatum datum, boolean redirect)
- Fetches the url with a configured HTTP client and gets the response.
static void main(String[] args)
- Main method.
void setConf(Configuration conf)
- Reads the configuration from the Nutch configuration files and sets the configuration.
Methods inherited from class org.apache.nutch.protocol.http.api.HttpBase
getAcceptLanguage, getConf, getMaxContent, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getUseHttp11, getUserAgent, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
Http
public Http()
- Constructs this plugin.
setConf
public void setConf(Configuration conf)
- Reads the configuration from the Nutch configuration files and sets
the configuration.
- Specified by:
setConf in interface Configurable
- Overrides:
setConf in class HttpBase
- Parameters:
conf - Configuration
main
public static void main(String[] args)
throws Exception
- Main method.
- Parameters:
args - Command line arguments
- Throws:
Exception
getResponse
protected Response getResponse(URL url,
CrawlDatum datum,
boolean redirect)
throws ProtocolException,
IOException
- Fetches the url with a configured HTTP client and gets the response.
- Specified by:
getResponse in class HttpBase
- Parameters:
url - URL to be fetched
datum - Crawl data
redirect - Follow redirects if and only if true
- Returns:
- HTTP response
- Throws:
ProtocolException
IOException
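The effect of a boolean such as the redirect parameter above can be sketched with the JDK's own `HttpURLConnection`. This is a stand-in chosen for illustration, since this page does not show which client calls the plugin makes internally. `RedirectFlagSketch` and `prepare` are hypothetical names, and the URL is a placeholder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical sketch, not the plugin's code: maps a redirect flag onto a JDK connection. */
class RedirectFlagSketch {

    /** Creates an unconnected HTTP connection that follows redirects only if redirect is true. */
    static HttpURLConnection prepare(String url, boolean redirect) {
        try {
            // openConnection() only creates the connection object; nothing is sent yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(redirect);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; no request is issued by this example.
        HttpURLConnection conn = prepare("http://example.com/", false);
        System.out.println(conn.getInstanceFollowRedirects());
    }
}
```

Running `main` prints `false`. With the flag off, a 3xx response would be returned to the caller as is, which lets the caller decide how to handle the redirect target.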
Copyright © 2011 The Apache Software Foundation