org.apache.nutch.protocol.httpclient
Class Http

java.lang.Object
  extended by org.apache.nutch.protocol.http.api.HttpBase
      extended by org.apache.nutch.protocol.httpclient.Http
All Implemented Interfaces:
Configurable, Pluggable, Protocol

public class Http
extends HttpBase

This class is a protocol plugin that configures an HTTP client to use the Basic, Digest and NTLM authentication schemes for both web servers and proxy servers. It also handles the HTTPS protocol and maintains cookies within a single fetch session.
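
As an illustration of the simplest of the three schemes named above, the following sketch builds a Basic credential header value as defined by the HTTP Basic scheme. It is not this plugin's code: the class and method names (BasicAuthSketch, basicHeaderValue, headerName) are hypothetical, and the plugin itself delegates authentication to its HTTP client library. The only difference shown between the web server and proxy server cases is the header name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Nutch.
class BasicAuthSketch {

    // Builds the value of a Basic credential header: "Basic " + base64(user ":" password).
    static String basicHeaderValue(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    // Credentials for a web server travel in "Authorization";
    // credentials for a proxy server travel in "Proxy-Authorization".
    static String headerName(boolean forProxy) {
        return forProxy ? "Proxy-Authorization" : "Authorization";
    }

    public static void main(String[] args) {
        System.out.println(headerName(false) + ": "
                + basicHeaderValue("Aladdin", "open sesame"));
    }
}
```

Digest and NTLM involve challenge and response exchanges with the server and are not covered by this sketch.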

Author:
Susam Pal

Field Summary
static org.slf4j.Logger LOG
           
 
Fields inherited from class org.apache.nutch.protocol.http.api.HttpBase
acceptLanguage, BUFFER_SIZE, maxContent, maxCrawlDelay, proxyHost, proxyPort, timeout, useHttp11, useProxy, userAgent
 
Fields inherited from interface org.apache.nutch.protocol.Protocol
CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID
 
Constructor Summary
Http()
          Constructs this plugin.
 
Method Summary
protected  Response getResponse(URL url, CrawlDatum datum, boolean redirect)
          Fetches the url with a configured HTTP client and gets the response.
static void main(String[] args)
          Main method: command-line entry point for this plugin.
 void setConf(Configuration conf)
          Reads the configuration from the Nutch configuration files and sets the configuration.
 
Methods inherited from class org.apache.nutch.protocol.http.api.HttpBase
getAcceptLanguage, getConf, getMaxContent, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getUseHttp11, getUserAgent, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

Http

public Http()
Constructs this plugin.

Method Detail

setConf

public void setConf(Configuration conf)
Reads the configuration from the Nutch configuration files and sets the configuration.

Specified by:
setConf in interface Configurable
Overrides:
setConf in class HttpBase
Parameters:
conf - Configuration
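
A minimal sketch of the kind of work a setConf implementation does, populating settings that correspond to the inherited fields proxyHost, proxyPort, useProxy and timeout. Everything here is an assumption made for illustration: java.util.Properties stands in for the Configuration parameter, and the property names and default values are assumed, not taken from this page. The class name HttpSettingsSketch is hypothetical.

```java
import java.util.Properties;

// Hypothetical stand-in for illustration only; not part of Nutch.
class HttpSettingsSketch {
    final String proxyHost;
    final int proxyPort;
    final boolean useProxy;
    final int timeout;

    // Reads settings from a Properties object (a stand-in for Configuration).
    // Property names and defaults below are assumptions.
    HttpSettingsSketch(Properties conf) {
        proxyHost = conf.getProperty("http.proxy.host");
        proxyPort = Integer.parseInt(conf.getProperty("http.proxy.port", "8080"));
        // A proxy is used only when a non-empty host has been configured.
        useProxy = proxyHost != null && !proxyHost.isEmpty();
        timeout = Integer.parseInt(conf.getProperty("http.timeout", "10000"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("http.proxy.host", "proxy.example.org");
        HttpSettingsSketch settings = new HttpSettingsSketch(conf);
        System.out.println(settings.useProxy + " " + settings.proxyPort);
    }
}
```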

main

public static void main(String[] args)
                 throws Exception
Main method: command-line entry point for this plugin.

Parameters:
args - Command line arguments
Throws:
Exception

getResponse

protected Response getResponse(URL url,
                               CrawlDatum datum,
                               boolean redirect)
                        throws ProtocolException,
                               IOException
Fetches the url with a configured HTTP client and gets the response.

Specified by:
getResponse in class HttpBase
Parameters:
url - URL to be fetched
datum - Crawl data
redirect - Follow redirects if and only if true
Returns:
HTTP response
Throws:
ProtocolException
IOException
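
The redirect parameter above can be pictured with the JDK's own HTTP client. This is only an analogy under stated assumptions: the plugin uses a different HTTP client library, and the class and method names here (RedirectFlagSketch, prepare) are hypothetical. The sketch prepares a connection whose redirect handling mirrors the flag, without sending any request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration only; not part of Nutch.
class RedirectFlagSketch {

    // Prepares (but does not open) a connection that follows redirects
    // if and only if the redirect flag is true.
    static HttpURLConnection prepare(String spec, boolean redirect) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(spec).toURL().openConnection();
            conn.setInstanceFollowRedirects(redirect);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://example.com/", false);
        System.out.println(conn.getInstanceFollowRedirects());
    }
}
```

With the flag set to false, a 3xx response is returned to the caller as-is, which leaves the decision about following the redirect to the calling code.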


Copyright © 2011 The Apache Software Foundation