Class StandardWriteFilter
java.lang.Object
org.apache.tika.metadata.writefilter.StandardWriteFilter
- All Implemented Interfaces:
Serializable, MetadataWriteFilter
This is to be used to limit the amount of metadata that a
parser can add based on the maxTotalEstimatedSize, maxFieldSize,
maxValuesPerField, and maxKeySize. This can also be used to limit which
fields are stored in the metadata object at write-time
with includeFields.
All sizes are measured in UTF-16 bytes. The size is estimated
as a rough order of magnitude of what is
required to store the string in memory in Java. We recognize
that Java uses more bytes to store length, offset etc. for strings. But
the extra overhead varies by Java version and implementation,
and we just need a basic estimate. We also recognize actual
memory usage is affected by interning strings, etc.
Please forgive us ... or consider writing your own write filter. :)
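As a rough illustration of the estimate described above, the following sketch counts two bytes per UTF-16 code unit. The class and method names are hypothetical and are not taken from Tika; the actual filter may compute its estimate differently.

```java
final class SizeEstimateSketch {
    // Hypothetical helper: two bytes per UTF-16 code unit (one Java char),
    // ignoring object headers and any other per-String overhead.
    static int estimateSize(String s) {
        return s == null ? 0 : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(estimateSize("title"));        // prints 10
        // A character outside the BMP is a surrogate pair: two chars, four bytes.
        System.out.println(estimateSize("\uD83D\uDE00")); // prints 4
    }
}
```

An estimate like this deliberately undercounts real heap usage; it only needs to be good enough to compare against the configured limits.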
NOTE: Fields in ALWAYS_SET_FIELDS are always set no matter the current
state of maxTotalEstimatedSize. Except for TikaCoreProperties.TIKA_CONTENT,
they are truncated at maxFieldSize, and their sizes contribute to the
maxTotalEstimatedSize.
NOTE: Fields in ALWAYS_ADD_FIELDS are always added no matter the current
state of maxTotalEstimatedSize. Except for TikaCoreProperties.TIKA_CONTENT,
each addition is truncated at maxFieldSize, and their sizes contribute to the
maxTotalEstimatedSize.
This class uses minimumMaxFieldSizeInAlwaysFields to protect the
ALWAYS_ADD_FIELDS and ALWAYS_SET_FIELDS. If we didn't
have this and a user set the maxFieldSize to, say, 10 bytes,
the internal parser behavior would be broken because parsers rely on
HttpHeaders.CONTENT_TYPE to determine which parser to call.
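The protection described above can be pictured as a floor on the per-field limit for the always-set and always-add fields. The sketch below is an assumption about the general shape of that logic, not Tika's code: the class, the method names, the floor value of 300, and the choice not to split a surrogate pair are all hypothetical.

```java
final class AlwaysFieldLimitSketch {
    // Hypothetical floor; the real value of minimumMaxFieldSizeInAlwaysFields
    // is not stated in this page.
    static final int MINIMUM_MAX_FIELD_SIZE_IN_ALWAYS_FIELDS = 300;

    // For always-set/always-add fields, never go below the floor.
    static int effectiveMaxFieldSize(int maxFieldSize, boolean isAlwaysField) {
        if (isAlwaysField) {
            return Math.max(MINIMUM_MAX_FIELD_SIZE_IN_ALWAYS_FIELDS, maxFieldSize);
        }
        return maxFieldSize;
    }

    // Truncate a value to at most maxBytes UTF-16 bytes (two bytes per char).
    static String truncate(String value, int maxBytes) {
        int maxChars = Math.max(0, maxBytes / 2);
        if (value.length() <= maxChars) {
            return value;
        }
        int end = maxChars;
        // Assumption of this sketch: back off one char rather than cut a surrogate pair.
        if (end > 0 && Character.isHighSurrogate(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        String contentType = "application/pdf";
        // With a user-configured limit of 10 bytes and no floor, the value is cut short.
        System.out.println(truncate(contentType, effectiveMaxFieldSize(10, false))); // prints appli
        // With the floor applied, the content type survives intact.
        System.out.println(truncate(contentType, effectiveMaxFieldSize(10, true)));  // prints application/pdf
    }
}
```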
NOTE: as with Metadata, this object is not thread safe.
-
Field Summary
Fields
ALWAYS_ADD_FIELDS
ALWAYS_SET_FIELDS
-
Constructor Summary
Constructors
protected StandardWriteFilter(int maxKeySize, int maxFieldSize, int maxEstimatedSize, int maxValuesPerField, Set<String> includeFields, boolean includeEmpty)
-
Method Summary
void add
Based on the field and value, this filter modifies the field and/or the value to something that should be added to the Metadata object.
void filterExisting(Map<String, String[]> data)
void set
Based on the field and the value, this filter modifies the field and/or the value to something that should be set in the Metadata object.
-
Field Details
-
ALWAYS_SET_FIELDS
-
ALWAYS_ADD_FIELDS
-
-
Constructor Details
-
StandardWriteFilter
protected StandardWriteFilter(int maxKeySize, int maxFieldSize, int maxEstimatedSize, int maxValuesPerField, Set<String> includeFields, boolean includeEmpty)
- Parameters:
maxKeySize - maximum key size in UTF-16 bytes; keys will be truncated to this length. If less than 0, keys will not be truncated.
maxEstimatedSize -
includeFields - if null or empty, all fields are included; otherwise, which fields to add to the metadata object.
includeEmpty - if true, this will set or add an empty value to the metadata object.
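To make the includeFields and includeEmpty semantics above concrete, here is a minimal sketch of the decision they describe. It is not Tika's implementation: the class and method are hypothetical, "empty" is assumed to mean a zero-length string, and the special handling of ALWAYS_SET_FIELDS and ALWAYS_ADD_FIELDS is deliberately left out.

```java
import java.util.Set;

final class IncludeFieldsSketch {
    // Hypothetical predicate: should this field/value pair reach the metadata object?
    static boolean shouldWrite(String field, String value,
                               Set<String> includeFields, boolean includeEmpty) {
        if (value == null) {
            return false; // a null value is never set or added
        }
        if (value.isEmpty() && !includeEmpty) {
            return false; // empty values are written only when includeEmpty is true
        }
        if (includeFields == null || includeFields.isEmpty()) {
            return true;  // null or empty include set: all fields are included
        }
        return includeFields.contains(field);
    }

    public static void main(String[] args) {
        Set<String> include = Set.of("dc:title");
        System.out.println(shouldWrite("dc:title", "Report", include, false));  // prints true
        System.out.println(shouldWrite("dc:creator", "Alice", include, false)); // prints false
        System.out.println(shouldWrite("dc:creator", "Alice", null, false));    // prints true
        System.out.println(shouldWrite("dc:title", "", include, false));        // prints false
    }
}
```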
-
-
Method Details
-
filterExisting
- Specified by:
filterExisting in interface MetadataWriteFilter
-
set
Description copied from interface: MetadataWriteFilter
Based on the field and the value, this filter modifies the field and/or the value to something that should be set in the Metadata object.
- Specified by:
set in interface MetadataWriteFilter
-
add
Description copied from interface: MetadataWriteFilter
Based on the field and value, this filter modifies the field and/or the value to something that should be added to the Metadata object. If the value is null, no value is set or added. Status updates (e.g. write limit reached) can be added directly to the underlying metadata.
- Specified by:
add in interface MetadataWriteFilter
-