Class StandardWriteFilter

java.lang.Object
org.apache.tika.metadata.writefilter.StandardWriteFilter
All Implemented Interfaces:
Serializable, MetadataWriteFilter

public class StandardWriteFilter extends Object implements MetadataWriteFilter, Serializable
This is to be used to limit the amount of metadata that a parser can add, based on maxTotalEstimatedSize, maxFieldSize, maxValuesPerField, and maxKeySize. It can also be used with includeFields to limit which fields are stored in the metadata object at write time.

All sizes are measured in UTF-16 bytes (two bytes per Java char). The size is a rough order-of-magnitude estimate of what is required to store the string in memory in Java. We recognize that Java uses additional bytes to store the length, offset, etc. of a string, but that overhead varies by Java version and implementation, and we only need a basic estimate. We also recognize that actual memory usage is affected by string interning, etc. Please forgive us ... or consider writing your own write filter. :)

NOTE: Fields in ALWAYS_SET_FIELDS are always set, regardless of the current state of maxTotalEstimatedSize. Except for TikaCoreProperties.TIKA_CONTENT, they are truncated at maxFieldSize, and their sizes contribute to maxTotalEstimatedSize.

NOTE: Fields in ALWAYS_ADD_FIELDS are always added, regardless of the current state of maxTotalEstimatedSize. Except for TikaCoreProperties.TIKA_CONTENT, each addition is truncated at maxFieldSize, and its size contributes to maxTotalEstimatedSize.

This class uses minimumMaxFieldSizeInAlwaysFields to protect ALWAYS_ADD_FIELDS and ALWAYS_SET_FIELDS. Without it, if a user set maxFieldSize to, say, 10 bytes, internal parsing would break, because parsers rely on HttpHeaders.CONTENT_TYPE to determine which parser to call.

NOTE: As with Metadata, this object is not thread safe.
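The size accounting described above can be sketched in a few lines of plain Java. This is a minimal sketch, not Tika's implementation: the helper names (estimateUtf16Bytes, truncateToUtf16Bytes, effectiveMaxFieldBytes), the contents of ALWAYS_FIELDS, and the 100-byte floor standing in for minimumMaxFieldSizeInAlwaysFields are all hypothetical. It shows the two-bytes-per-char estimate, truncation that avoids splitting a surrogate pair, and why a floor on the field size keeps Content-Type intact when a user configures a tiny maxFieldSize.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

class Utf16SizeSketch {
    // Hypothetical stand-in for the "always" field sets; not Tika's actual contents.
    static final Set<String> ALWAYS_FIELDS = Set.of("Content-Type");

    // Estimated size in UTF-16 bytes: two bytes per Java char, ignoring object overhead.
    static int estimateUtf16Bytes(String s) {
        return s.length() * 2;
    }

    // Cut s so its estimated size is at most maxBytes; a negative limit means "no limit".
    static String truncateToUtf16Bytes(String s, int maxBytes) {
        if (maxBytes < 0 || estimateUtf16Bytes(s) <= maxBytes) {
            return s;
        }
        int maxChars = maxBytes / 2;
        // Do not split a surrogate pair (e.g. an emoji) in half.
        if (maxChars > 0 && Character.isHighSurrogate(s.charAt(maxChars - 1))) {
            maxChars--;
        }
        return s.substring(0, maxChars);
    }

    // Always-fields get at least minimumForAlwaysFields bytes, so a tiny
    // user-supplied maxFieldBytes cannot mangle values such as Content-Type.
    static int effectiveMaxFieldBytes(String field, int maxFieldBytes, int minimumForAlwaysFields) {
        if (maxFieldBytes < 0 || !ALWAYS_FIELDS.contains(field)) {
            return maxFieldBytes; // unlimited, or an ordinary field
        }
        return Math.max(maxFieldBytes, minimumForAlwaysFields);
    }

    public static void main(String[] args) {
        int maxFieldBytes = 10; // deliberately tiny user setting
        int floor = 100;        // hypothetical minimum for always-fields
        String ct = "application/pdf";
        System.out.println(estimateUtf16Bytes(ct));                              // 30
        System.out.println(ct.getBytes(StandardCharsets.UTF_16LE).length);       // 30
        System.out.println(truncateToUtf16Bytes("A long title", maxFieldBytes)); // A lon
        System.out.println(truncateToUtf16Bytes(ct,
                effectiveMaxFieldBytes("Content-Type", maxFieldBytes, floor)));  // application/pdf
    }
}
```

The UTF_16LE check is only there to show that the two-bytes-per-char estimate matches the encoded length of a BMP-only string; it ignores the per-object overhead the description above deliberately leaves out.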
  • Field Details

    • ALWAYS_SET_FIELDS

      public static final Set<String> ALWAYS_SET_FIELDS
    • ALWAYS_ADD_FIELDS

      public static final Set<String> ALWAYS_ADD_FIELDS
  • Constructor Details

    • StandardWriteFilter

      protected StandardWriteFilter(int maxKeySize, int maxFieldSize, int maxEstimatedSize, int maxValuesPerField, Set<String> includeFields, boolean includeEmpty)
      Parameters:
      maxKeySize - maximum key size in UTF-16 bytes; keys are truncated to this length. If less than 0, keys are not truncated.
      maxEstimatedSize - maximum estimated total size, in UTF-16 bytes, of the metadata written through this filter (the maxTotalEstimatedSize described above).
      includeFields - if null or empty, all fields are included; otherwise, only the listed fields are added to the metadata object.
      includeEmpty - if true, empty values are set or added to the metadata object.
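The includeFields and includeEmpty rules amount to a simple per-write decision. The sketch below assumes that decision is made independently of the size limits and that null values are always dropped (as the add description states). The class and method names are hypothetical and are not part of Tika's API.

```java
import java.util.Set;

class IncludeRulesSketch {
    // Mirrors the documented includeFields rule: null or empty means "accept every field".
    static boolean isIncluded(String field, Set<String> includeFields) {
        return includeFields == null || includeFields.isEmpty() || includeFields.contains(field);
    }

    // Mirrors includeEmpty: when false, empty values are dropped; null values are always dropped.
    static boolean shouldWrite(String field, String value,
                               Set<String> includeFields, boolean includeEmpty) {
        if (value == null) {
            return false;
        }
        if (value.isEmpty() && !includeEmpty) {
            return false;
        }
        return isIncluded(field, includeFields);
    }

    public static void main(String[] args) {
        Set<String> include = Set.of("dc:title", "Content-Type");
        System.out.println(shouldWrite("dc:title", "Report", include, false));  // true
        System.out.println(shouldWrite("dc:creator", "Alice", include, false)); // false: not included
        System.out.println(shouldWrite("dc:title", "", include, false));        // false: empty value
        System.out.println(shouldWrite("dc:title", "", include, true));         // true
        System.out.println(shouldWrite("dc:creator", "Alice", null, false));    // true: no include list
    }
}
```

Treating a null set and an empty set identically follows the includeFields description, so callers never need to distinguish "no filter configured" from "empty filter".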
  • Method Details

    • filterExisting

      public void filterExisting(Map<String,String[]> data)
      Specified by:
      filterExisting in interface MetadataWriteFilter
    • set

      public void set(String field, String value, Map<String,String[]> data)
      Description copied from interface: MetadataWriteFilter
      Based on the field and the value, this filter modifies the field and/or the value to something that should be set in the Metadata object.
      Specified by:
      set in interface MetadataWriteFilter
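One way to picture set under a total size budget: because set replaces all existing values for a field, the bytes of the old entry can be released before the budget is checked, and fields in an always-set list bypass that check. This is a simplified, hypothetical model (BudgetedSetSketch, the 50-byte budget, and the always-set contents are assumptions). It omits per-field truncation and is not necessarily how StandardWriteFilter is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class BudgetedSetSketch {
    // Hypothetical limits, in estimated UTF-16 bytes; not Tika's defaults.
    static final int MAX_TOTAL_BYTES = 50;
    static final Set<String> ALWAYS_SET = Set.of("Content-Type");

    final Map<String, String[]> data = new HashMap<>();
    int totalBytes = 0;

    static int bytes(String s) {
        return s.length() * 2;
    }

    // Estimated size of a field entry: the key plus all of its values.
    static int entryBytes(String field, String[] values) {
        int sum = bytes(field);
        for (String v : values) {
            sum += bytes(v);
        }
        return sum;
    }

    // "set" semantics: the new value replaces every existing value for the field,
    // so the old entry's bytes are released before the budget check.
    void set(String field, String value) {
        if (value == null) {
            return;
        }
        String[] replacement = {value};
        String[] old = data.get(field);
        int oldBytes = (old == null) ? 0 : entryBytes(field, old);
        int projected = totalBytes - oldBytes + entryBytes(field, replacement);
        if (projected > MAX_TOTAL_BYTES && !ALWAYS_SET.contains(field)) {
            return; // over budget: skip (a real filter might also record that a limit was hit)
        }
        data.put(field, replacement);
        totalBytes = projected;
    }

    public static void main(String[] args) {
        BudgetedSetSketch m = new BudgetedSetSketch();
        m.set("title", "Hi");                     // 10 + 4 = 14 bytes
        m.set("Content-Type", "application/pdf"); // always set, even though 14 + 54 > 50
        m.set("author", "Bo");                    // skipped: budget already exhausted
        System.out.println(m.data.get("Content-Type")[0]); // application/pdf
        System.out.println(m.data.containsKey("author"));  // false
        System.out.println(m.totalBytes);                   // 68
    }
}
```

Note that the always-set field can push the running total past the budget, which matches the NOTE in the class description that such fields are set regardless of maxTotalEstimatedSize while still contributing to it.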
    • add

      public void add(String field, String value, Map<String,String[]> data)
      Description copied from interface: MetadataWriteFilter
      Based on the field and value, this filter modifies the field and/or the value to something that should be added to the Metadata object. If the value is null, no value is set or added. Status updates (e.g. write limit reached) can be added directly to the underlying metadata.
      Specified by:
      add in interface MetadataWriteFilter
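Similarly, add appends one value to a field's String[] array, so it is the natural place to enforce maxValuesPerField in addition to the total budget. The sketch assumes that a key's bytes are charged only when the field first appears and that over-limit values are silently dropped. BudgetedAddSketch and its limits are hypothetical, and the real filter may also record a status flag in the metadata when a limit is reached.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class BudgetedAddSketch {
    // Hypothetical limits; not Tika's defaults.
    static final int MAX_VALUES_PER_FIELD = 2;
    static final int MAX_TOTAL_BYTES = 40;

    final Map<String, String[]> data = new HashMap<>();
    int totalBytes = 0;

    static int bytes(String s) {
        return s.length() * 2;
    }

    // "add" semantics: append one more value to the field's array,
    // subject to the per-field value count and the total size budget.
    void add(String field, String value) {
        if (value == null) {
            return; // null values are never added
        }
        String[] old = data.get(field);
        if (old != null && old.length >= MAX_VALUES_PER_FIELD) {
            return; // per-field value limit reached
        }
        // The key is only charged the first time the field appears.
        int cost = bytes(value) + (old == null ? bytes(field) : 0);
        if (totalBytes + cost > MAX_TOTAL_BYTES) {
            return; // total budget would be exceeded
        }
        String[] updated = (old == null) ? new String[1] : Arrays.copyOf(old, old.length + 1);
        updated[updated.length - 1] = value;
        data.put(field, updated);
        totalBytes += cost;
    }

    public static void main(String[] args) {
        BudgetedAddSketch m = new BudgetedAddSketch();
        m.add("dc:creator", "Ann"); // 20 + 6 = 26 bytes
        m.add("dc:creator", "Bo");  // +4 -> 30 bytes
        m.add("dc:creator", "Cy");  // dropped: already 2 values
        m.add("dc:subject", "x");   // dropped: 30 + 20 + 2 > 40
        m.add("dc:creator", null);  // ignored
        System.out.println(Arrays.toString(m.data.get("dc:creator"))); // [Ann, Bo]
        System.out.println(m.data.containsKey("dc:subject"));          // false
        System.out.println(m.totalBytes);                              // 30
    }
}
```

Checking the value count before the byte budget means a field that has already hit maxValuesPerField never consumes budget that other fields could still use.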