Class S3Emitter

java.lang.Object
org.apache.tika.pipes.emitter.AbstractEmitter
org.apache.tika.pipes.emitter.s3.S3Emitter
All Implemented Interfaces:
Initializable, Emitter, StreamEmitter

public class S3Emitter extends AbstractEmitter implements Initializable, StreamEmitter
Emits to an existing S3 bucket.
  <properties>
      <emitters>
          <emitter class="org.apache.tika.pipes.emitter.s3.S3Emitter">
              <params>
                  <!-- required -->
                  <param name="name" type="string">s3e</param>
                  <!-- required -->
                  <param name="region" type="string">us-east-1</param>
                  <!-- required -->
                  <param name="credentialsProvider"
                       type="string">(profile|instance)</param>
                  <!-- required if credentialsProvider=profile-->
                  <param name="profile" type="string">my-profile</param>
                  <!-- required -->
                  <param name="bucket" type="string">my-bucket</param>
                  <!-- optional; prefix to add to the path before emitting;
                       default is no prefix -->
                  <param name="prefix" type="string">my-prefix</param>
                  <!-- optional; default is 'json'. This extension is appended to
                       the SOURCE_PATH if no emit key is specified. Do not include
                       the leading "." -->
                  <param name="fileExtension" type="string">json</param>
                  <!-- optional; default is 'true'. Whether to copy the
                       JSON to a local temporary file before putting it to S3 -->
                  <param name="spoolToTemp" type="bool">true</param>
              </params>
          </emitter>
      </emitters>
  </properties>
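The prefix and fileExtension comments in the configuration above imply that the S3 object key is assembled from the source path, the optional prefix, and the optional extension. The following is a minimal sketch of that derivation, not the emitter's own code: buildObjectKey is a hypothetical helper, and joining the prefix to the path with "/" is an assumption.

```java
import java.util.Objects;

public class EmitKeyExample {

    /**
     * Hypothetical helper: builds an S3 object key from a source path, an optional
     * prefix, and an optional file extension given without the leading ".".
     * Assumption: the prefix is joined to the path with "/".
     */
    static String buildObjectKey(String sourcePath, String prefix, String fileExtension) {
        Objects.requireNonNull(sourcePath, "sourcePath");
        String key = sourcePath;
        if (prefix != null && !prefix.isBlank()) {
            key = prefix + "/" + key; // e.g. my-prefix/path/to/my/file.txt
        }
        if (fileExtension != null && !fileExtension.isBlank()) {
            key = key + "." + fileExtension; // the "." is added here, not in the config
        }
        return key;
    }

    public static void main(String[] args) {
        // prefix "my-prefix" plus the default extension "json"
        System.out.println(buildObjectKey("path/to/my/file.txt", "my-prefix", "json"));
        // no prefix, no extension: the path is used unchanged
        System.out.println(buildObjectKey("path/to/my/file.txt", null, null));
    }
}
```

With the sample values, the first call yields my-prefix/path/to/my/file.txt.json, which is why the configuration warns against including the "." in fileExtension.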
  • Constructor Details

    • S3Emitter

      public S3Emitter()
  • Method Details

    • emit

      public void emit(String emitKey, List<Metadata> metadataList) throws IOException, TikaEmitterException
      Requires the source path (e.g. src-bucket/path/to/my/file.txt) in TikaCoreProperties.SOURCE_PATH.
      Specified by:
      emit in interface Emitter
      Parameters:
      metadataList -
      Throws:
      IOException
      TikaException
      TikaEmitterException
    • emit

      public void emit(String path, InputStream is, Metadata userMetadata) throws IOException, TikaEmitterException
      Specified by:
      emit in interface StreamEmitter
      Parameters:
      path - object path, not including the bucket
      is - inputStream to copy
      userMetadata - this will be written to the S3 ObjectMetadata's userMetadata
      Throws:
      TikaEmitterException - or IOException, if there is a runtime S3 client exception
      IOException
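S3 user metadata is a flat string-to-string map, while a Tika Metadata field can hold several values, so the userMetadata argument has to be collapsed before it is attached to the object. This sketch models Metadata as a Map of value lists (an assumption, to keep the example free of Tika dependencies) and keeps only the first value of each field; that collapsing strategy is also an assumption, not confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UserMetadataExample {

    /**
     * Collapses a multi-valued metadata map (a stand-in for Tika's Metadata)
     * into the single-valued map that S3 user metadata requires.
     * Assumption: only the first value of each field is kept.
     */
    static Map<String, String> toS3UserMetadata(Map<String, List<String>> metadata) {
        Map<String, String> userMetadata = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            List<String> values = e.getValue();
            if (values == null || values.isEmpty()) {
                continue; // nothing to write for this field
            }
            if (values.size() > 1) {
                System.err.println("Keeping only the first of " + values.size()
                        + " values for " + e.getKey());
            }
            userMetadata.put(e.getKey(), values.get(0));
        }
        return userMetadata;
    }

    public static void main(String[] args) {
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("author", List.of("alice", "bob"));
        md.put("lang", List.of("en"));
        System.out.println(toS3UserMetadata(md)); // {author=alice, lang=en}
    }
}
```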
    • setSpoolToTemp

      @Field public void setSpoolToTemp(boolean spoolToTemp)
      Whether to spool the metadata list to a temporary file before putting the object. Default: true. If set to false, this emitter serializes the JSON to memory and then puts it into S3.
      Parameters:
      spoolToTemp -
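The trade-off behind spoolToTemp is where the serialized JSON lives before upload: on disk, so the upload can stream from a file of known size, or entirely on the heap. The sketch below shows both paths. Uploader and JsonSerializer are hypothetical interfaces standing in for the S3 put call and the metadata-list serializer; they are not part of the Tika or AWS APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpoolExample {

    /** Hypothetical stand-in for the S3 put call: consumes a stream of known length. */
    interface Uploader {
        void upload(InputStream in, long contentLength) throws IOException;
    }

    /** Hypothetical stand-in for the JSON serializer of the metadata list. */
    interface JsonSerializer {
        void writeTo(Writer writer) throws IOException;
    }

    static void emit(JsonSerializer serializer, boolean spoolToTemp, Uploader uploader)
            throws IOException {
        if (spoolToTemp) {
            // Serialize to a temp file, then stream the file to the uploader.
            Path tmp = Files.createTempFile("emit-", ".json");
            try {
                try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                    serializer.writeTo(w);
                }
                try (InputStream in = Files.newInputStream(tmp)) {
                    uploader.upload(in, Files.size(tmp));
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } else {
            // Serialize the whole payload into memory, then upload the byte array.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(bos, StandardCharsets.UTF_8)) {
                serializer.writeTo(w);
            }
            byte[] bytes = bos.toByteArray();
            uploader.upload(new ByteArrayInputStream(bytes), bytes.length);
        }
    }

    public static void main(String[] args) throws IOException {
        JsonSerializer json = w -> w.write("[{\"title\":\"hello\"}]");
        for (boolean spool : new boolean[] {true, false}) {
            emit(json, spool, (in, len) -> System.out.println("spoolToTemp=" + spool
                    + " uploaded " + len + " bytes: "
                    + new String(in.readAllBytes(), StandardCharsets.UTF_8)));
        }
    }
}
```

Both modes upload identical bytes; spooling mainly bounds heap use for large metadata lists at the cost of temporary disk I/O.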
    • setRegion

      @Field public void setRegion(String region)
    • setProfile

      @Field public void setProfile(String profile)
    • setBucket

      @Field public void setBucket(String bucket)
    • setPrefix

      @Field public void setPrefix(String prefix)
    • setCredentialsProvider

      @Field public void setCredentialsProvider(String credentialsProvider)
    • setFileExtension

      @Field public void setFileExtension(String fileExtension)
      Sets a custom file extension for the output file. Do not include the leading ".".
      Parameters:
      fileExtension -
    • setAccessKey

      @Field public void setAccessKey(String accessKey)
    • setSecretKey

      @Field public void setSecretKey(String secretKey)
    • setMaxConnections

      @Field public void setMaxConnections(int maxConnections)
      Maximum number of HTTP connections allowed. This should be greater than or equal to the number of threads emitting to S3.
      Parameters:
      maxConnections -
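To see why maxConnections should be at least the number of emitting threads, here is a toy model in which a Semaphore with maxConnections permits stands in for the HTTP connection pool (an assumption; the real pool lives inside the S3 client). When permits are fewer than threads, the extra threads block waiting for a free connection, so peak concurrency is capped at maxConnections. peakConcurrency is a hypothetical name used only for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionPoolExample {

    /**
     * Runs `threads` workers, each doing `putsPerThread` simulated uploads that
     * must hold one of `maxConnections` permits. Returns the peak number of
     * permits (connections) in use at the same time.
     */
    static int peakConcurrency(int maxConnections, int threads, int putsPerThread)
            throws InterruptedException {
        Semaphore pool = new Semaphore(maxConnections);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            exec.submit(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    try {
                        pool.acquire(); // blocks while every connection is busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // simulated upload
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } finally {
                        inUse.decrementAndGet();
                        pool.release();
                    }
                }
            });
        }
        exec.shutdown();
        exec.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 8;
        System.out.println("maxConnections=4 -> peak " + peakConcurrency(4, threads, 5));
        System.out.println("maxConnections=8 -> peak " + peakConcurrency(8, threads, 5));
    }
}
```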
    • setEndpointConfigurationService

      @Field public void setEndpointConfigurationService(String endpointConfigurationService)
    • initialize

      public void initialize(Map<String,Param> params) throws TikaConfigException
      Initializes the S3 client. Note: S3's RuntimeExceptions (e.g. AmazonClientException) are wrapped in a TikaConfigException.
      Specified by:
      initialize in interface Initializable
      Parameters:
      params - params to use for initialization
      Throws:
      TikaConfigException
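The note above describes converting the S3 client's unchecked exceptions into a checked configuration exception. A minimal sketch of that pattern follows; ConfigException is a hypothetical stand-in for TikaConfigException, and the Supplier-based buildClient helper is illustrative rather than the emitter's actual code.

```java
import java.util.function.Supplier;

public class WrapExample {

    /** Hypothetical stand-in for TikaConfigException (a checked exception). */
    static class ConfigException extends Exception {
        ConfigException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs a client factory and converts any RuntimeException it throws
     * into a checked ConfigException, keeping the original as the cause.
     */
    static <T> T buildClient(Supplier<T> factory) throws ConfigException {
        try {
            return factory.get();
        } catch (RuntimeException e) {
            throw new ConfigException("could not initialize client: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            buildClient(() -> { throw new IllegalStateException("bad region"); });
        } catch (ConfigException e) {
            System.out.println(e.getMessage() + " / cause: "
                    + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Keeping the original exception as the cause preserves the client's diagnostic detail while letting callers handle configuration failures through a single checked type.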
    • checkInitialization

      public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
      Specified by:
      checkInitialization in interface Initializable
      Parameters:
      problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
      Throws:
      TikaConfigException
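As an illustration of the kind of checks an initialization check might perform, this sketch validates a parameter map against the "required" comments in the configuration example at the top of this page: name, region, credentialsProvider, and bucket are required, and profile is required when credentialsProvider=profile. findProblems is hypothetical, and the emitter's actual checks may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ConfigCheckExample {

    /**
     * Returns the problems found in an S3 emitter parameter map, following the
     * "required" markers in the configuration example. An empty list means OK.
     */
    static List<String> findProblems(Map<String, String> params) {
        List<String> problems = new ArrayList<>();
        for (String key : List.of("name", "region", "credentialsProvider", "bucket")) {
            if (isBlank(params.get(key))) {
                problems.add("missing required param: " + key);
            }
        }
        // profile is only required for the profile credentials provider
        if ("profile".equals(params.get("credentialsProvider"))
                && isBlank(params.get("profile"))) {
            problems.add("'profile' is required when credentialsProvider=profile");
        }
        String ext = params.get("fileExtension");
        if (ext != null && ext.startsWith(".")) {
            problems.add("fileExtension must not start with '.'");
        }
        return problems;
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of(
                "name", "s3e",
                "region", "us-east-1",
                "credentialsProvider", "profile",
                "bucket", "my-bucket");
        System.out.println(findProblems(params));
        // prints ['profile' is required when credentialsProvider=profile]
    }
}
```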
    • setPathStyleAccessEnabled

      @Field public void setPathStyleAccessEnabled(boolean pathStyleAccessEnabled)