Class HttpFetcher

java.lang.Object
org.apache.tika.pipes.fetcher.AbstractFetcher
org.apache.tika.pipes.fetcher.http.HttpFetcher
All Implemented Interfaces:
Initializable, Fetcher, RangeFetcher

public class HttpFetcher extends AbstractFetcher implements Initializable, RangeFetcher
A fetcher based on Apache HttpClient.
  • Field Details

    • HTTP_HEADER_PREFIX

      public static String HTTP_HEADER_PREFIX
    • HTTP_FETCH_PREFIX

      public static String HTTP_FETCH_PREFIX
    • HTTP_STATUS_CODE

      public static Property HTTP_STATUS_CODE
      HTTP status code
    • HTTP_NUM_REDIRECTS

      public static Property HTTP_NUM_REDIRECTS
      Number of redirects
    • HTTP_TARGET_URL

      public static Property HTTP_TARGET_URL
      If there were redirects, this captures the final URL visited
    • HTTP_TARGET_IP_ADDRESS

      public static Property HTTP_TARGET_IP_ADDRESS
    • HTTP_FETCH_TRUNCATED

      public static Property HTTP_FETCH_TRUNCATED
    • HTTP_CONTENT_ENCODING

      public static Property HTTP_CONTENT_ENCODING
    • HTTP_CONTENT_TYPE

      public static Property HTTP_CONTENT_TYPE
  • Constructor Details

    • HttpFetcher

      public HttpFetcher()
  • Method Details

    • fetch

      public InputStream fetch(String fetchKey, Metadata metadata) throws IOException, TikaException
      Specified by:
      fetch in interface Fetcher
      Throws:
      IOException
      TikaException
    • fetch

      public InputStream fetch(String fetchKey, long startRange, long endRange, Metadata metadata) throws IOException
      Specified by:
      fetch in interface RangeFetcher
      Throws:
      IOException
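As a rough illustration of what a ranged fetch asks a server for, the sketch below builds an HTTP `Range` header value from a start and end offset. It assumes that `startRange` and `endRange` are inclusive, zero-based byte offsets, as in the standard `bytes=first-last` form; this page does not state how HttpFetcher interprets them. The names `RangeHeaderSketch` and `rangeHeaderValue` are hypothetical and not part of Tika.

```java
class RangeHeaderSketch {

    // Builds the value of an HTTP Range header for a byte range.
    // Assumption: both offsets are inclusive, zero-based byte positions.
    static String rangeHeaderValue(long startRange, long endRange) {
        if (startRange < 0 || endRange < startRange) {
            throw new IllegalArgumentException(
                    "invalid range: " + startRange + "-" + endRange);
        }
        return "bytes=" + startRange + "-" + endRange;
    }

    public static void main(String[] args) {
        // Header value requesting the first 500 bytes of a resource.
        System.out.println(rangeHeaderValue(0, 499));
    }
}
```

A server that honors the range typically answers with status 206 (Partial Content); a server that ignores it may return the whole resource.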
    • setUserName

      @Field public void setUserName(String userName)
    • setPassword

      @Field public void setPassword(String password)
    • setNtDomain

      @Field public void setNtDomain(String domain)
    • setAuthScheme

      @Field public void setAuthScheme(String authScheme)
    • setProxyHost

      @Field public void setProxyHost(String proxyHost)
    • setProxyPort

      @Field public void setProxyPort(int proxyPort)
    • setConnectTimeout

      @Field public void setConnectTimeout(int connectTimeout)
    • setRequestTimeout

      @Field public void setRequestTimeout(int requestTimeout)
    • setSocketTimeout

      @Field public void setSocketTimeout(int socketTimeout)
    • setMaxConnections

      @Field public void setMaxConnections(int maxConnections)
    • setMaxConnectionsPerRoute

      @Field public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
    • setMaxSpoolSize

      @Field public void setMaxSpoolSize(long maxSpoolSize)
      Sets the maximum number of bytes to spool to a temp file. If this value is -1, the full stream is spooled to a temp file. The default is -1.
      Parameters:
      maxSpoolSize - maximum number of bytes to spool, or -1 to spool the full stream
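The following is a minimal sketch of one plausible reading of this limit: copy at most `maxSpoolSize` bytes to a temp file, treat -1 as "no limit", and report whether the stream was cut short. It is not HttpFetcher's implementation; `SpoolSketch` and `spool` are hypothetical names, and the truncation flag is only an assumption about how a limit like this could relate to HTTP_FETCH_TRUNCATED.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class SpoolSketch {

    // Copies at most maxSpoolSize bytes from the stream to the target file.
    // A value of -1 means "no limit". Returns true if copying stopped at the
    // limit while the stream still had data left.
    static boolean spool(InputStream in, Path target, long maxSpoolSize) throws IOException {
        byte[] buffer = new byte[8192];
        long written = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            while (true) {
                int toRead = buffer.length;
                if (maxSpoolSize >= 0) {
                    long remaining = maxSpoolSize - written;
                    if (remaining <= 0) {
                        // Limit reached: truncated only if more data follows.
                        return in.read() != -1;
                    }
                    toRead = (int) Math.min(buffer.length, remaining);
                }
                int n = in.read(buffer, 0, toRead);
                if (n == -1) {
                    return false; // whole stream was spooled
                }
                out.write(buffer, 0, n);
                written += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch", ".bin");
        try {
            boolean truncated = spool(new ByteArrayInputStream(new byte[10]), tmp, 4);
            System.out.println(Files.size(tmp) + " bytes spooled, truncated=" + truncated);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```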
    • setMaxRedirects

      @Field public void setMaxRedirects(int maxRedirects)
    • setHttpHeaders

      @Field public void setHttpHeaders(List<String> headers)
      Sets which HTTP headers to capture in the metadata. Each captured key is prefixed with HTTP_HEADER_PREFIX.
      Parameters:
      headers - names of the HTTP headers to capture
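To make the key prefixing concrete, here is a small sketch that copies a chosen set of response headers into a metadata-like map and prepends a prefix to each key. It uses plain Java maps rather than Tika's Metadata class; `HeaderCaptureSketch`, `capture`, and the `"hdr:"` prefix are made up for illustration, and the actual value of HTTP_HEADER_PREFIX is not given on this page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class HeaderCaptureSketch {

    // Copies the requested response headers into a new map,
    // prepending the given prefix to each key.
    static Map<String, String> capture(Map<String, String> responseHeaders,
                                       List<String> wanted, String prefix) {
        // HTTP header names are case-insensitive, so index them in lower case.
        Map<String, String> lower = new LinkedHashMap<>();
        responseHeaders.forEach((k, v) -> lower.put(k.toLowerCase(Locale.ROOT), v));

        Map<String, String> metadata = new LinkedHashMap<>();
        for (String name : wanted) {
            String value = lower.get(name.toLowerCase(Locale.ROOT));
            if (value != null) {
                // Headers that were not requested, or not present, are skipped.
                metadata.put(prefix + name, value);
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> response = Map.of(
                "Content-Type", "text/html",
                "ETag", "\"abc\"",
                "Server", "example");
        // "hdr:" is a hypothetical prefix, not the real HTTP_HEADER_PREFIX value.
        System.out.println(capture(response, List.of("ETag", "Last-Modified"), "hdr:"));
    }
}
```

Prefixing keeps captured headers from colliding with metadata keys that a parser may set later under the same name.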
    • setOverallTimeout

      @Field public void setOverallTimeout(long overallTimeout)
      Sets an overall timeout on the request. If a server is very slow or the file is very large, the other timeouts might never be triggered.
      Parameters:
      overallTimeout - overall timeout for the request
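The sketch below illustrates why an overall timeout can matter: a transfer that keeps making small amounts of progress can run for a long time without any single step being slow, so only a limit on the total elapsed time stops it. This is a generic pattern built on `java.util.concurrent`, not HttpFetcher's implementation. `OverallTimeoutSketch` and `callWithOverallTimeout` are hypothetical names, and milliseconds are used here only as an assumption, since this page does not state the unit of `overallTimeout`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class OverallTimeoutSketch {

    // Runs the task and gives up once timeoutMillis has elapsed in total,
    // regardless of how much progress the task has been making.
    static <T> T callWithOverallTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupts the worker thread if the task is still running.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates a trickling server: every step is quick,
        // but the transfer as a whole takes several seconds.
        Callable<String> slowTransfer = () -> {
            for (int i = 0; i < 100; i++) {
                Thread.sleep(50);
            }
            return "done";
        };
        try {
            callWithOverallTimeout(slowTransfer, 200);
        } catch (TimeoutException e) {
            System.out.println("aborted by overall timeout");
        }
    }
}
```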
    • setMaxErrMsgSize

      @Field public void setMaxErrMsgSize(int maxErrMsgSize)
    • setUserAgent

      @Field public void setUserAgent(String userAgent)
      Sets the User-Agent header sent with the request. By default, httpclient sends a value such as "Apache-HttpClient/4.5.13 (Java/x.y.z)".
      Parameters:
      userAgent - the User-Agent value to send
    • initialize

      public void initialize(Map<String,Param> params) throws TikaConfigException
      Specified by:
      initialize in interface Initializable
      Parameters:
      params - params to use for initialization
      Throws:
      TikaConfigException
    • checkInitialization

      public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
      Specified by:
      checkInitialization in interface Initializable
      Parameters:
      problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
      Throws:
      TikaConfigException