public class Http extends HttpBase
This class is a protocol plugin that configures an HTTP client for the Basic, Digest and NTLM authentication schemes, for web servers as well as proxy servers. It also handles the HTTPS protocol and manages cookies within a single fetch session.
Documentation can be found on the Nutch HttpAuthenticationSchemes wiki page.
The original description of the motivation for supporting HttpPostAuthentication is also included on the Nutch wiki. Additionally, HttpPostAuthentication development is documented in the NUTCH-827 Jira issue.
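As an illustration of the simplest of the schemes named above, here is a minimal sketch of how a Basic `Authorization` header value is formed (RFC 7617). It uses only the JDK and is not the code this plugin runs; the class name `BasicAuthSketch` and the method `basicAuthorization` are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Nutch: shows what the Basic scheme sends.
class BasicAuthSketch {

    /** Builds the Authorization header value for the Basic scheme. */
    static String basicAuthorization(String user, String password) {
        // Basic credentials are "user:password", Base64-encoded.
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println("Authorization: "
                + basicAuthorization("Aladdin", "open sesame"));
    }
}
```

Because Basic credentials are only encoded, not encrypted, they are normally sent over HTTPS. Digest and NTLM involve a challenge-response exchange and are not covered by this sketch.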
Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |
Fields inherited from class HttpBase: accept, acceptLanguage, BUFFER_SIZE, enableIfModifiedsinceHeader, maxContent, maxCrawlDelay, proxyHost, proxyPort, RESPONSE_TIME, responseTime, timeout, tlsPreferredCipherSuites, tlsPreferredProtocols, useHttp11, useProxy, userAgent
CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID
Constructor and Description |
---|
Http() Constructs this plugin. |
Modifier and Type | Method and Description |
---|---|
protected Response | getResponse(URL url, CrawlDatum datum, boolean redirect) Fetches the url with a configured HTTP client and gets the response. |
static void | main(String[] args) Main method. |
void | setConf(Configuration conf) Reads the configuration from the Nutch configuration files and sets the configuration. |
Methods inherited from class HttpBase: getAccept, getAcceptLanguage, getConf, getMaxContent, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getTlsPreferredCipherSuites, getTlsPreferredProtocols, getUseHttp11, getUserAgent, isIfModifiedSinceEnabled, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
Overrides: setConf in class HttpBase
Parameters: conf - Configuration

public static void main(String[] args) throws Exception
Parameters: args - Command line arguments
Throws: Exception
protected Response getResponse(URL url, CrawlDatum datum, boolean redirect) throws ProtocolException, IOException
Fetches the url with a configured HTTP client and gets the response.
Specified by: getResponse in class HttpBase
Parameters: url - URL to be fetched
datum - Crawl data
redirect - Follow redirects if and only if true
Throws: ProtocolException
IOException
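The "follow redirects if and only if true" contract of the redirect parameter can be illustrated with a small sketch. It deliberately uses the JDK's `HttpURLConnection` as a stand-in rather than the HTTP client this plugin configures, and the names `RedirectFlagSketch` and `prepare` are hypothetical. The connection is only prepared, never opened, so no network traffic occurs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical stand-in, not Nutch code: maps a boolean flag onto
// the redirect behaviour of a single connection.
class RedirectFlagSketch {

    /** Prepares (without connecting) a connection that follows redirects iff redirect is true. */
    static HttpURLConnection prepare(String url, boolean redirect) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Per-connection setting; the JVM-wide default is left untouched.
            conn.setInstanceFollowRedirects(redirect);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://example.com/", false);
        System.out.println(conn.getInstanceFollowRedirects()); // prints "false"
    }
}
```

With the flag set to false, a 3xx response is returned to the caller as-is, which lets the caller decide how to handle the redirect target itself.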
Copyright © 2015 The Apache Software Foundation