All Classes (Apache Tika 1.23 API)

Class	Description
AbstractConsumersBuilder
AbstractConverter	Base class for Tika Metadata to XMP converter which provides some needed common functionality.
AbstractEncodingDetectorParser	Abstract base class for parsers that use the AutoDetectReader and need to use the `EncodingDetector` configured by `TikaConfig`
AbstractFSConsumer
AbstractListManager
AbstractOfficeParser	Intermediate layer to set `OfficeParserConfig` uniformly.
AbstractOOXMLExtractor	Base class for all Tika OOXML extractors.
AbstractParser	Abstract base class for new parsers.
AbstractProfiler
AbstractProfiler.EXCEPTION_TYPE
AbstractProfiler.PARSE_ERROR_TYPE	If information was gathered from the log file about a parse error
AbstractRecursiveParserWrapperHandler	This is a special handler to be used only with the `RecursiveParserWrapper`.
AbstractTranslator
AbstractXML2003Parser
AccessChecker	Checks whether or not a document allows extraction generally or extraction for accessibility only.
AccessPermissionException	Exception to be thrown when a document does not allow content extraction.
AccessPermissions	Until we can find a common standard, we'll use these options.
Activator
AdobeFontMetricParser	Parser for AFM Font Files
AdvancedTypeDetector
AgeRecogniser	Parser for extracting features from text.
AgeRecogniserConfig	Stores URL for AgePredictor
AlphaIdeographFilterFactory	Factory for filter that only allows tokens with characters that "isAlphabetic" or "isIdeographic" through.
AnalyzerManager
AnnotationUtils	This class contains utilities for dealing with tika annotations
AppleSingleFileParser	Parser that strips the header off of AppleSingle and AppleDouble files.
AppParserFactoryBuilder
AttributeDependantMetadataHandler	This adds a Metadata entry for a given node.
AttributeMatcher	Final evaluation state of a `.../@*` XPath expression.
AttributeMetadataHandler	SAX event handler that maps the contents of an XML attribute into a metadata field.
AudioFrame	An Audio Frame in an MP3 file.
AudioParser
AutoDetectParser
AutoDetectParserFactory	Simple class for AutoDetectParser
AutoDetectParserFactory	Factory for an AutoDetectParser
AutoDetectReader	An input stream reader that automatically detects the character encoding to be used for converting bytes to characters.
BasicContentHandlerFactory	Basic factory for creating common types of ContentHandlers
BasicContentHandlerFactory.HANDLER_TYPE	Common handler types for content.
BasicTikaFSConsumer	Basic FileResourceConsumer that reads files from an input directory and writes content to the output directory.
BasicTikaFSConsumersBuilder
BasicTokenCountStatsCalculator
BatchNoRestartError	FileResourceConsumers should throw this if something catastrophic has happened and the BatchProcess should shut down and not be restarted.
BatchProcess	This is the main processor class for a single process.
BatchProcess.BATCH_CONSTANTS
BatchProcessBuilder	Builds a BatchProcessor from a combination of runtime arguments and the config file.
BatchProcessDriverCLI
BatchTopCommonTokenCounter	Utility class that runs TopCommonTokenCounter against a directory of table files (named {lang}_table.gz or Leipzig-like afr_...-sentences.txt) and outputs common tokens files for each input table file in the output directory.
BodyContentHandler	Content handler decorator that only passes everything inside the XHTML <body/> tag to the underlying handler.
BoilerpipeContentHandler	Uses the boilerpipe library to automatically extract the main content from a web page.
BouncyCastleDigester	Digester that relies on BouncyCastle for MessageDigest implementations.
BoundedInputStream	Very slight modification of Commons' BoundedInputStream so that we can figure out if this hit the bound or not.
BPGParser	Parser for the Better Portable Graphics (BPG) File Format.
CachedTranslator	CachedTranslator.
CaptionObject	A model for caption objects from graphics and texts; typically includes a human-readable sentence, the language of the sentence and a confidence score.
Cell	Cell of content.
CellDecorator	Cell decorator.
CharsetDetector	`CharsetDetector` provides a facility for detecting the charset or encoding of character data in an unknown format.
CharsetMatch	This class represents a charset that has been identified by a CharsetDetector as a possible encoding for a set of input data.
CharsetUtils
ChildMatcher	Intermediate evaluation state of a `.../*...` XPath expression.
ChmAccessor<T>	Defines an accessor interface
ChmAssert	Contains chm extractor assertions
ChmBlockInfo	A container that contains chm block information such as: i.
ChmCommons
ChmCommons.EntryType	Represents entry types: uncompressed, compressed
ChmCommons.IntelState	Represents intel file states during decompression
ChmCommons.LzxState	Represents lzx states: started decoding, not started decoding
ChmConstants
ChmDirectoryListingSet	Holds chm listing entries
ChmExtractor	Extracts text from chm file.
ChmItsfHeader	The header. 0000: char[4] 'ITSF'; 0004: DWORD 3 (version number); 0008: DWORD Total header length, including header section table and following data.
ChmItspHeader	Directory header. The directory starts with a header; its format is as follows: 0000: char[4] 'ITSP'; 0004: DWORD Version number 1; 0008: DWORD Length of the directory header; 000C: DWORD $0a (unknown); 0010: DWORD $1000 Directory chunk size; 0014: DWORD "Density" of quickref section, usually 2; 0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks; 001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug); 0020: DWORD Chunk number of first PMGL (listing) chunk; 0024: DWORD Chunk number of last PMGL (listing) chunk; 0028: DWORD -1 (unknown); 002C: DWORD Number of directory chunks (total); 0030: DWORD Windows language ID; 0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}; 0044: DWORD $54 (this is the length again); 0048: DWORD -1 (unknown); 004C: DWORD -1 (unknown); 0050: DWORD -1 (unknown).
ChmLzxBlock	Decompresses a chm block.
ChmLzxcControlData	::DataSpace/Storage//ControlData This file contains $20 bytes of information on the compression.
ChmLzxcResetTable	LZXC reset table. For ensuring a decompression.
ChmLzxState
ChmParser
ChmParsingException
ChmPmgiHeader	Note: does not always exist. An index chunk has the following format: 0000: char[4] 'PMGI'; 0004: DWORD Length of quickref/free area at end of directory chunk; 0008: Directory index entries (to quickref/free area). The quickref area in a PMGI is the same as in a PMGL. The format of a directory index entry is as follows: BYTE: length of name; BYTEs: name (UTF-8 encoded); ENCINT: directory listing chunk which starts with name. Encoded integers, aka ENCINT: an ENCINT is a variable-length integer.
ChmPmglHeader	There are two types of directory chunks -- index chunks, and listing chunks.
ChmSection
ChmWrapper
CJKBigramAwareLengthFilterFactory	Creates a very narrowly focused TokenFilter that limits tokens based on length _unless_ they've been identified as <DOUBLE> or <SINGLE> by the CJKBigramFilter.
ClassLoaderUtil
ClassParser	Parser for Java .class files.
CleanPhoneText	Class to help de-obfuscate phone numbers in text.
ClimateForcast	Met keys from NCAR CCSM files in the Climate Forecast Convention.
ClosedInputStream	Closed input stream.
CloseShieldInputStream	Proxy stream that prevents the underlying input stream from being closed.
ColInfo
Cols
CommandLineParserBuilder	Reads configurable options from a config file and returns org.apache.commons.cli.Options object to be used in commandline parser.
CommonsDigester	Implementation of `DigestingParser.Digester` that relies on commons.codec.digest.DigestUtils to calculate digest hashes.
CommonsDigester.DigestAlgorithm
CommonTokenCountManager
CommonTokenOverlapCounter
CommonTokenResult
CommonTokens
CommonTokensBhattacharyya
CommonTokensCosine
CommonTokensHellinger
CommonTokensKLDivergence
CommonTokensKLDNormed
CompositeDetector	Content type detector that combines multiple different detection mechanisms.
CompositeDigester
CompositeEncodingDetector
CompositeExternalParser	A Composite Parser that wraps up all the available External Parsers, and provides an easy way to access them.
CompositeMatcher	Composite XPath evaluation state.
CompositeParser	Composite parser that delegates parsing tasks to a component parser based on the declared content type of the incoming document.
CompositeTagHandler	Takes an array of `ID3Tags` in preference order, and when asked for a given tag, will return it from the first `ID3Tags` that has it.
CompositeTextStatsCalculator
CompressorParser	Parser for various compression formats.
CompressorParserOptions	Interface for setting options for the `CompressorParser` by passing via the `ParseContext`.
ConcurrentUtils	Utility Class for Concurrency in Tika
ConfigurableThreadPoolExecutor	Allows Thread Pool to be Configurable.
ConsumersManager	Simple interface around a collection of consumers that allows for initializing and shutting shared resources (e.g.
ContainerExtractor	Tika container extractor interface.
ContentHandlerDecorator	Decorator base class for the `ContentHandler` interface.
ContentHandlerExample	Examples of using different Content Handlers to get different parts of the file's contents
ContentHandlerFactory	Interface to allow easier injection of code for getting a new ContentHandler
ContentLengthCalculator
ContentTagParser
ContentTags
ContrastStatistics
CoreNLPNERecogniser	This class offers an implementation of `NERecogniser` based on CRF classifiers from Stanford CoreNLP.
CorruptedFileException	This exception should be thrown when the parse absolutely, positively has to stop.
CountingInputStream	A decorating input stream that counts the number of bytes that have passed through the stream so far.
CreativeCommons	A collection of Creative Commons properties names.
CryptoParser	Decrypts the incoming document stream and delegates further parsing to another parser instance.
CSVMessageBodyWriter
CSVParams
CSVResult
CTAKESAnnotationProperty	This enumeration includes the properties that an `IdentifiedAnnotation` object can provide.
CTAKESConfig	Configuration for `CTAKESContentHandler`.
CTAKESContentHandler	Class used to extract biomedical information while parsing.
CTAKESParser	CTAKESParser decorates a `Parser` and leverages on `CTAKESContentHandler` to extract biomedical information from clinical text using Apache cTAKES.
CTAKESSerializer	Enumeration for types of cTAKES (UIMA) CAS serializer supported by cTAKES.
CTAKESUtils	This class provides methods to extract biomedical information from plain text using `CTAKESContentHandler` that relies on Apache cTAKES.
CustomMimeInfo
Database
DataURIScheme
DataURISchemeParseException
DataURISchemeUtil	Not thread safe.
DateUtils	Date related utility methods and constants
DBBuffer
DBConsumersManager
DBFParser	This is a Tika wrapper around the DBFReader.
DBWriter	This is still in its early stages.
DcXMLParser	Dublin Core metadata parser
DefaultContentHandlerFactoryBuilder	Builds BasicContentHandler with type defined by attribute "basicHandlerType" with possible values: xml, html, text, body, ignore.
DefaultDetector	A composite detector based on all the `Detector` implementations available through the `service provider mechanism`.
DefaultEncodingDetector	A composite encoding detector based on all the `EncodingDetector` implementations available through the `service provider mechanism`.
DefaultHtmlMapper	The default HTML mapping rules in Tika.
DefaultInputStreamFactory	Passthrough -- returns InputStream as is
DefaultParser	A composite parser based on all the `Parser` implementations available through the `service provider mechanism`.
DefaultProbDetector	A version of `DefaultDetector` for probabilistic mime detectors, which use statistical techniques to blend the results of differing underlying detectors when attempting to detect the type of a given file.
DefaultTranslator	A translator which picks the first `Translator` implementation available through the `service provider mechanism`.
DelegatingParser	Base class for parser implementations that want to delegate parts of the task of parsing an input document to another parser.
DescribeMetadata	Print the supported Tika Metadata models and their fields.
Detector	Content type detector.
DetectorResource
DIFContentHandler
DIFContentHandler
DIFParser
DigestingAutoDetectParserFactory
DigestingParser
DigestingParser.Digester	Interface for digester.
DigestingParser.Encoder	Encodes byte array from a MessageDigest to String
DirectFileReadDataSource	A `DataSource` implementation that relies on direct reads from a `RandomAccessFile`.
DirectoryListingEntry	The format of a directory listing entry is as follows: BYTE: length of name; BYTEs: name (UTF-8 encoded); ENCINT: content section; ENCINT: offset; ENCINT: length. The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
DirListParser	Parses the output of /bin/ls and counts the number of files and the number of executables using Tika.
DisplayMetInstance	Grabs a PDF file from a URL and prints its `Metadata`
DL4JInceptionV3Net	`DL4JInceptionV3Net` is an implementation of `ObjectRecogniser`.
DL4JVGG16Net
DocumentSelector	Interface for different document selection strategies for purposes like embedded document extraction by a `ContainerExtractor` instance.
DublinCore	A collection of Dublin Core metadata names.
DumpTikaConfigExample	This class shows how to dump a TikaConfig object to a configuration file.
DurationFormatUtils	Functionality and naming conventions (roughly) copied from org.apache.commons.lang3 so that we didn't have to add another dependency.
DWGParser	DWG (CAD Drawing) parser.
ElementMappingContentHandler	Content handler decorator that maps element `QName`s using a `Map`.
ElementMappingContentHandler.TargetElement
ElementMatcher	Final evaluation state of an XPath expression that targets an element.
ElementMetadataHandler	SAX event handler that maps the contents of an XML element into a metadata field.
EmbeddedContentHandler	Content handler decorator that prevents the `EmbeddedContentHandler.startDocument()` and `EmbeddedContentHandler.endDocument()` events from reaching the decorated handler.
EmbeddedDocumentExtractor
EmbeddedDocumentUtil	Utility class to handle common issues with embedded documents.
EmbeddedResourceHandler	Tika container extractor callback interface.
Embedder	Tika embedder interface
EMFParser	Extracts files embedded in EMF and offers a very rough capability to extract text if there is text stored in the EMF.
EmptyDetector	Dummy detector that returns application/octet-stream for all documents.
EmptyParser	Dummy parser that always produces an empty XHTML document without even attempting to parse the given document stream.
EmptyTranslator	Dummy translator that always declines to give any text.
EncodingDetector	Character encoding detector.
EncryptedDocumentException
EncryptedPrescriptionDetector
EncryptedPrescriptionParser
EndDocumentShieldingContentHandler	A wrapper around a `ContentHandler` which will ignore normal SAX calls to `EndDocumentShieldingContentHandler.endDocument()`, and only fire them later.
EndianUtils	General Endian Related Utilities.
EndianUtils.BufferUnderrunException
EnviHeaderParser
EpubContentParser	Parser for EPUB OPS `*.html` files.
EpubParser	Epub parser
ErrorParser	Dummy parser that always throws a `TikaException` without even attempting to parse the given document stream.
EvalConsumerBuilder
EvalConsumersBuilder
EvalExceptionUtils
ExcelExtractor	Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.
ExceptionUtils
ExecutableParser	Parser for executable files.
ExpandedTitleContentHandler	Content handler decorator which wraps a `TransformerHandler` in order to allow the `TITLE` tag to render as `<title></title>` rather than `<title/>` which is accomplished by calling the `ContentHandler.characters(char[], int, int)` method with a `length` of 1 but a zero length char array.
ExternalEmbedder	Embedder that uses an external program (like sed or exiftool) to embed text content and metadata into a given document.
ExternalParser	Parser that uses an external program (like catdoc or pdf2txt) to extract text content and metadata from a given document.
ExternalParser.LineConsumer	Consumer contract
ExternalParsersConfigReader	Builds up ExternalParser instances based on XML file(s) which define what to run, for what, and how to process any output metadata.
ExternalParsersConfigReaderMetKeys	Met Keys used by the `ExternalParsersConfigReader`.
ExternalParsersFactory	Creates instances of ExternalParser based on XML configuration files.
ExternalTranslator	Abstract class used to interact with command line/external Translators.
ExtractComparer
ExtractComparerBuilder
ExtractEmbeddedFiles
ExtractProfiler
ExtractProfilerBuilder
ExtractReader
ExtractReader.ALTER_METADATA_LIST
ExtractReaderException	Exception when trying to read extract
ExtractReaderException.TYPE
FeedParser	Feed parser.
FictionBookParser
Field	Field annotation is a contract for binding `Param` value from Tika Configuration to an object.
FileConfig	Configuration for the "file" (or file-alternative) command.
FilenameUtils
FileResource	This is a basic interface to handle a logical "file".
FileResourceConsumer	This is a base class for file consumers.
FileResourceCrawler
FLVParser	Parser for metadata contained in Flash Videos (.flv).
Font
ForkParser
ForkProxy
ForkResource
FormattingUtils
FormattingUtils.Tag
FSBatchProcessCLI
FSConsumersManager
FSCrawlerBuilder	Builds either an FSDirectoryCrawler or an FSListCrawler.
FSDirectoryCrawler
FSDirectoryCrawler.CRAWL_ORDER
FSDocumentSelector	Selector that chooses files based on their file name and their size, as determined by Metadata.RESOURCE_NAME_KEY and Metadata.CONTENT_LENGTH.
FSFileResource	FileSystem(FS)Resource wraps a file name.
FSListCrawler	Class that "crawls" a list of files.
FSOutputStreamFactory
FSOutputStreamFactory.COMPRESSION
FSProperties
FSUtil	Utility class to handle some common issues when reading from and writing to a file system (FS).
FSUtil.HANDLE_EXISTING
GDALParser	Wraps execution of the Geospatial Data Abstraction Library (GDAL) `gdalinfo` tool used to extract geospatial information out of hundreds of geo file formats.
GenericConverter	Tries to convert as many of the properties in the `Metadata` map as possible to XMP namespaces.
GeoGazetteerClient
Geographic	Geographic schema.
GeographicInformationParser
GeoParser
GeoParserConfig
GeoTag
GoogleTranslator	An implementation of a REST client to the Google Translate v2 API.
GrabPhoneNumbersExample	Class to demonstrate how to use the `PhoneExtractingContentHandler` to get a list of all of the phone numbers from every file in a directory.
GribParser
GrobidNERecogniser
GrobidRESTParser
H2Util
HDFParser	Since the `NetCDFParser` depends on the NetCDF-Java API, we are able to use it to parse HDF files as well.
HexCoDec	A set of Hex encoding and decoding utility methods.
HSLFExtractor
HTML
HtmlEncodingDetector	Character encoding detector for determining the character encoding of a HTML document based on the potential charset parameter found in a Content-Type http-equiv meta tag somewhere near the beginning.
HTMLHelper	Helps produce user facing HTML output.
HtmlMapper	HTML mapper used to make incoming HTML documents easier to handle by Tika clients.
HtmlParser	HTML parser.
HttpHeaders	A collection of HTTP header names.
HwpStreamReader
HwpTextExtractorV5
HwpV5Parser
ICNSParser	A basic parser class for Apple ICNS icon files
ICNSType	Holds details on Apple ICNS icons
IContentHandlerFactoryBuilder
ICrawlerBuilder
Icu4jEncodingDetector
ID3Tags	Interface that defines the common interface for ID3 tag parsers, such as ID3v1 and ID3v2.3.
ID3Tags.ID3Comment	Represents a comment in ID3 (especially ID3 v2), which is made up of several parts
ID3v1Handler	This is used to parse ID3 Version 1 Tag information from an MP3 file, if available.
ID3v22Handler	This is used to parse ID3 Version 2.2 Tag information from an MP3 file, if available.
ID3v23Handler	This is used to parse ID3 Version 2.3 Tag information from an MP3 file, if available.
ID3v24Handler	This is used to parse ID3 Version 2.4 Tag information from an MP3 file, if available.
ID3v2Frame	A frame of ID3v2 data, which is then passed to a handler to be turned into useful data.
ID3v2Frame.RawTag
ID3v2Frame.TextEncoding
IDBWriter
IdentityHtmlMapper	Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
IFileProcessorFutureResult	Stub interface to allow for different result types from different processors
ImageMetadataExtractor	Uses the Metadata Extractor library to read EXIF and IPTC image metadata and map to Tika fields.
ImageParser
ImportContextImpl	`ImportContextImpl`...
Initializable	Components that must do special processing across multiple fields at initialization time should implement this interface.
InitializableProblemHandler	This is to be used to handle potential recoverable problems that might arise during initialization.
InputStreamDigester
InputStreamFactory	Interface to allow for custom/consistent creation of InputStream
InterruptableParsingExample	This example demonstrates how to interrupt document parsing if some condition is met.
Interrupter	Class that waits for input on System.in.
InterrupterBuilder	Builds an Interrupter
InterrupterFutureResult
IOExceptionWithCause	Subclasses IOException with the `Throwable` constructors missing before Java 6.
IOUtils	General IO stream manipulation utilities.
IParserFactoryBuilder
IPTC	IPTC photo metadata schema.
IptcAnpaParser	Parser for IPTC ANPA News Wire Feeds
ISArchiveParser
ISATabUtils
ITikaToXMPConverter	Interface for the specific `Metadata` to XMP converters
IWork13PackageParser
IWork13PackageParser.IWork13DocumentType
IWorkPackageParser	A parser for the IWork container files.
IWorkPackageParser.IWORKDocumentType
JackcessParser	Parser that handles Microsoft Access files via Jackcess
JDBCUtil
JDBCUtil.CREATE_TABLE
JempboxExtractor
JoshuaNetworkTranslator	This translator is designed to work with a TCP-IP available Joshua translation server, specifically the REST-based Joshua server.
JournalParser
JpegParser
JSONMessageBodyWriter
JsonMetadata
JsonMetadataBase
JsonMetadataDeserializer	Deserializer for Metadata. If overriding this, remember that this is called from a static context.
JsonMetadataList
JsonMetadataSerializer	Serializer for Metadata. If overriding this, remember that this is called from a static context.
JsonStreamingSerializer
LangModel
Language
Language
LanguageAwareTokenCountStats<T>	Interface for calculators that require language probabilities and token stats
LanguageConfidence
LanguageDetectingParser
LanguageDetector
LanguageDetectorExample
LanguageHandler	SAX content handler that updates a language detector based on all the received character content.
LanguageIdentifier	Deprecated. Use a concrete class of `LanguageDetector`
LanguageIDWrapper	The most efficient way to call this in a multithreaded environment is to call `LanguageIDWrapper.loadBuiltInModels()` before instantiating the
LanguageNames	Support for language tags (as defined by https://tools.ietf.org/html/bcp47). See https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes for a list of three character language codes.
LanguageProfile	Deprecated.
LanguageProfilerBuilder	Deprecated.
LanguageResource
LanguageResult
LanguageWriter	Writer that builds a language profile based on all the written content.
Latin1StringsParser	Parser to extract printable Latin1 strings from arbitrary files with pure java without running any external process.
LeipzigHelper
LeipzigSampler
Lingo24LangDetector	An implementation of a Language Detector using the Premium MT API v1.
Lingo24Translator	An implementation of a REST client for the Premium MT API v1.
Link
LinkContentHandler	Content handler that collects links from an XHTML document.
LinkedCell	Linked cell.
ListDescriptor	Contains the information for a single list in the list or list override tables.
ListManager	Computes the number text which goes at the beginning of each list paragraph
LoadErrorHandler	Interface for error handling strategies in service class loading.
Location
LookaheadInputStream	Stream wrapper that makes it easy to read up to n bytes ahead from a stream that supports the mark feature.
LuceneIndexer
LuceneIndexerExtended
LyricsHandler	This is used to parse Lyrics3 tag information from an MP3 file, if available.
MachineMetadata	Metadata for describing machines, such as their architecture, type and endian-ness
MachineMetadata.Endian
MagicDetector	Content type detection based on magic bytes, i.e.
MailUtil
MappedBufferCleaner	Copied/pasted from the Apache Lucene/Solr project.
Matcher	XPath element matcher.
MatchingContentHandler	Content handler decorator that only passes the elements, attributes, and text nodes that match the given XPath expression.
MatParser
MboxParser	Mbox (mailbox) parser.
MediaType	Internet media type.
MediaTypeExample
MediaTypeRegistry	Registry of known Internet media types.
Message	A collection of Message related property names.
Metadata	A multi-valued metadata container.
MetadataAwareLuceneIndexer	Builds on the LuceneIndexer from Chapter 5 and adds indexing of Metadata.
MetadataExtractor	OOXML metadata extractor.
MetadataFields	Knows about all declared `Metadata` fields.
MetadataHandler	Deprecated. Use the `AttributeMetadataHandler` and `ElementMetadataHandler` classes instead
MetadataList	Wrapper class to make isWriteable in MetadataListMBW simpler
MetadataListMessageBodyWriter
MetadataResource
MicrosoftTranslator	Wrapper class to access the Windows translation service.
MidiParser
MimeBuffer
MimeType	Internet media type.
MimeTypeException	A class to encapsulate MimeType related exceptions.
MimeTypes	This class is a MimeType repository.
MimeTypesFactory	Creates instances of MimeTypes.
MimeTypesReader	A reader for XML files compliant with the freedesktop MIME-info DTD.
MimeTypesReaderMetKeys	Met Keys used by the `MimeTypesReader`.
MITIENERecogniser	This class offers an implementation of `NERecogniser` based on trained models using state-of-the-art information extraction tools.
MosesTranslator	Translator that uses the Moses decoder for translation.
MP3Frame	A frame in an MP3 file, such as ID3v2 Tags or some audio.
Mp3Parser	The `Mp3Parser` is used to parse ID3 Version 1 Tag information from an MP3 file, if available.
Mp3Parser.ID3TagsAndAudio
MP4Parser	Parser for the MP4 media container format, as well as the older QuickTime format that MP4 is based on.
MSOffice	A collection of Microsoft Office and Open Document property names.
MSOfficeBinaryConverter	Tika to XMP mapping for the binary MS formats Word (.doc), Excel (.xls) and PowerPoint (.ppt).
MSOfficeXMLConverter	Tika to XMP mapping for the Office Open XML formats Word (.docx), Excel (.xlsx) and PowerPoint (.pptx).
MSOwnerFileParser	Parser for temporary MSOffice files.
MyFirstTika	Demonstrates how to call the different components within Tika: its `Detector` framework (aka MIME identification and repository), its `Parser` interface, its `LanguageIdentifier` and other goodies.
NamedAttributeMatcher	Final evaluation state of a `.../@name` XPath expression.
NamedElementMatcher	Intermediate evaluation state of a `.../name...` XPath expression.
NamedEntityParser	This implementation of `Parser` extracts entity names from text content and adds it to the metadata.
NameDetector	Content type detection based on the resource name.
NameEntityExtractor
Namespace	Utility class to hold namespace information.
NERecogniser	Defines a contract for named entity recogniser.
NetCDFParser	A `Parser` for NetCDF files using the UCAR, MIT-licensed NetCDF for Java API.
NetworkParser
NLTKNERecogniser	This class offers an implementation of `NERecogniser` based on ne_chunk() module of NLTK.
NNExampleModelDetector
NNTrainedModel
NNTrainedModelBuilder
NodeMatcher	Final evaluation state of a `.../node()` XPath expression.
NonDetectingEncodingDetector	Always returns the charset passed in via the initializer
NSNormalizerContentHandler	Content handler decorator that maps old OpenOffice 1.0 namespaces to the OpenDocument ones and returns a fake DTD when the parser requests the OpenOffice DTD.
NullInputStream	A functional, light weight `InputStream` that emulates a stream of a specified size.
NullOutputStream	This OutputStream writes all data to the famous /dev/null.
NumberCell	Number cell.
ObjectFromDOMAndQueueBuilder<T>	Same as `ObjectFromDOMBuilder`, but this is for objects that require access to the shared queue.
ObjectFromDOMBuilder<T>	Interface for things that build objects from a DOM Node and a map of runtime attributes
ObjectRecogniser	This is a contract for object recognisers used by `ObjectRecognitionParser`
ObjectRecognitionParser	This parser recognises objects from Images.
Office	Office Document properties collection.
OfficeOpenXMLCore	Core properties as defined in the Office Open XML specification part Two that are not in the DublinCore namespace.
OfficeOpenXMLExtended	Extended properties as defined in the Office Open XML specification part Four.
OfficeParser	Defines a Microsoft document content extractor.
OfficeParser.POIFSDocumentType
OfficeParserConfig
OfflineContentHandler	Content handler decorator that always returns an empty stream from the `OfflineContentHandler.resolveEntity(String, String)` method to prevent potential network or other external resources from being accessed by an XML parser.
OldExcelParser	A POI-powered Tika Parser for very old versions of Excel, from pre-OLE2 days, such as Excel 4.
OOXMLExtractor	Interface implemented by all Tika OOXML extractors.
OOXMLExtractorFactory	Figures out the correct `OOXMLExtractor` for the supplied document and returns it.
OOXMLParser	Office Open XML (OOXML) parser.
OOXMLTikaBodyPartHandler
OOXMLWordAndPowerPointTextHandler	This class is intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.
OOXMLWordAndPowerPointTextHandler.EditType
OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
OpenDocumentContentParser	Parser for ODF `content.xml` files.
OpenDocumentConverter	Tika to XMP mapping for the Open Document formats: Text (.odt), Spreadsheet (.ods), Graphics (.odg) and Presentation (.odp).
OpenDocumentMetaParser	Parser for OpenDocument `meta.xml` files.
OpenDocumentParser	OpenOffice parser
OpenNLPNameFinder	An implementation of `NERecogniser` that finds names in text using Open NLP Model.
OpenNLPNERecogniser	This implementation of `NERecogniser` chains an array of `OpenNLPNameFinder`s for which NER models are available in classpath.
OpenOfficeParser	Deprecated. Use the `OpenDocumentParser` class instead.
OptimaizeLangDetector	Implementation of the LanguageDetector API that uses https://github.com/optimaize/language-detector
OutlookExtractor	Outlook Message Parser.
OutlookExtractor.RECIPIENT_TYPE
OutlookPSTParser	Parser for MS Outlook PST email storage files
OutputStreamFactory
OverrideDetector	Use this to force a content type detection via the `TikaCoreProperties.CONTENT_TYPE_OVERRIDE` key in the metadata object.
PackageParser	Parser for various packaging formats.
PagedText	XMP Paged-text schema.
ParagraphProperties
ParallelFileProcessingResult
Param<T>	This is a serializable model class for parameters from configuration file.
ParamField	This class stores metadata for `Field` annotations, which is used to map them to `Param` at runtime
ParseContext	Parse context.
Parser	Tika parser interface.
ParserContainerExtractor	An implementation of `ContainerExtractor` powered by the regular `Parser` API.
ParserDecorator	Decorator base class for the `Parser` interface.
ParserFactory
ParserFactory
ParserFactoryBuilder
ParserFactoryFactory	Lightweight, easily serializable class that contains enough information to build a `ParserFactory`
ParserPostProcessor	Parser decorator that post-processes the results from a decorated parser.
ParserUtils	Helper util methods for Parsers themselves.
ParsingEmbeddedDocumentExtractor	Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
ParsingExample
ParsingReader	Reader for the text content from a given binary stream.
PasswordProvider	Interface for providing a password to a Parser for handling Encrypted and Password Protected Documents.
PDF	PDF properties collection.
PDFParser	PDF parser.
PDFParserConfig	Config for PDFParser.
PDFParserConfig.OCR_STRATEGY
Pharmacy
PhoneExtractingContentHandler	Class used to extract phone numbers while parsing.
Photoshop	XMP Photoshop metadata schema.
Pkcs7Parser	Basic parser for PKCS7 data.
POIFSContainerDetector	A detector that works on a POIFS OLE2 document to figure out exactly what the file is.
POIXMLTextExtractorDecorator
PooledTimeSeriesParser	Uses the Pooled Time Series algorithm and command line tool to generate a numeric representation of the video suitable for similarity searches.
PrescriptionParser
PrettyMetadataKeyComparator
ProbabilisticMimeDetectionSelector	Selector for combining different mime detection results based on probability
ProbabilisticMimeDetectionSelector.Builder	Builder class for setting the probability parameters.
ProcessUtils
ProfilingHandler	Deprecated. Use `LanguageHandler`.
ProfilingWriter	Deprecated. Use `LanguageWriter`.
Property	XMP property definition.
Property.PropertyType
Property.ValueType
PropertyTypeException	XMP property definition violation exception.
PropsUtil	Utility class to handle properties.
ProxyInputStream	A proxy stream which acts as expected; that is, it passes the method calls on to the proxied stream and doesn't change which methods are being called.
PRTParser	A basic text extracting parser for the CADKey PRT (CAD Drawing) format.
PSDParser	Parser for the Adobe Photoshop PSD File Format.
QuattroPro	QuattroPro properties collection.
QuattroProParser	Parser for Corel QuattroPro documents (part of Corel WordPerfect Office Suite).
RarParser	Parser for Rar files.
RecentFiles	Builds on top of the LuceneIndexer and the Metadata discussions in Chapter 6 to output an RSS (or RDF) feed of files crawled by the LuceneIndexer within the last N minutes.
RecognisedObject	A model for recognised objects from graphics and texts; it typically includes a human-readable label for the object, the language of the label, an id and a confidence score.
RecursiveMetadataResource
RecursiveParserWrapper	This is a helper class that wraps a parser in a recursive handler.
RecursiveParserWrapperFSConsumer	This runs a RecursiveParserWrapper against an input file and outputs the json metadata to an output file.
RecursiveParserWrapperHandler	This is the default implementation of `AbstractRecursiveParserWrapperHandler`.
RegexNERecogniser	This class offers an implementation of `NERecogniser` based on Regular Expressions.
RegexUtils	Inspired by the Nutch class OutlinkExtractor.
ReplacementCharset	An implementation of the standard "replacement" charset defined by the W3C.
Report	This class represents a single report.
ReporterBuilder	Interface for reporter builders
RereadableInputStream	Wraps an input stream, reading it only once, but making it available for rereading an arbitrary number of times.
ResultsReporter
RFC822Parser	Uses apache-mime4j to parse emails.
RichTextContentHandler	Content handler for Rich Text; it extracts the `alt` attribute of XHTML <img/> tags and the `name` attribute of XHTML <a/> tags into the output.
RollbackSoftware	Demonstrates Tika and its ability to sense symlinks.
RTFConverter	Tika to XMP mapping for the RTF format.
RTFMetadata
RTFParser	RTF parser
RunProperties	WARNING: This class is mutable.
SafeContentHandler	Content handler decorator that makes sure that the character events (`SafeContentHandler.characters(char[], int, int)` or `SafeContentHandler.ignorableWhitespace(char[], int, int)`) passed to the decorated content handler contain only valid XML characters.
SafeContentHandler.Output	Internal interface that allows both character and ignorable whitespace content to be filtered the same way.
SAS7BDATParser	Processes the SAS7BDAT data columnar database file used by SAS and other similar languages.
SecureContentHandler	Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.
SentimentAnalysisParser	This parser classifies documents based on the sentiment of the document.
ServerStatus
ServerStatus.STATUS
ServerStatus.TASK
ServerStatusWatcher
ServerTimeouts
ServiceLoader	Internal utility class that Tika uses to look up service providers.
ServiceLoaderUtils	Service Loading and Ordering related utils
SimpleLogReporterBuilder
SimpleTextExtractor
SimpleThreadPoolExecutor	Simple Thread Pool Executor
SimpleTypeDetector
SlowCompositeReaderWrapper	COPIED VERBATIM FROM LUCENE. This class forces a composite reader (e.g. a `MultiReader` or `DirectoryReader`) to emulate a `LeafReader`.
SourceCodeParser	Generic Source code parser for Java, Groovy, C++.
SpreadsheetMLParser	Parses SpreadsheetML 2003 format Excel files.
SpringExample
SQLite3Parser	This is the main class for parsing SQLite3 files.
StandardHtmlEncodingDetector	An encoding detector that tries to respect the spirit of the HTML spec part 12.2.3 "The input byte stream", or at least the part that is compatible with the implementation of Tika.
StandardOrganizations	This class provides a collection of the most important technical standard organizations.
StandardReference	Class that represents a standard reference.
StandardReference.StandardReferenceBuilder
StandardsExtractingContentHandler	StandardsExtractingContentHandler is a Content Handler used to extract standard references while parsing.
StandardsExtractionExample	Class to demonstrate how to use the `StandardsExtractingContentHandler` to get a list of the standard references from every file in a directory.
StandardsText	StandardsText relies on regular expressions to extract standard references from text.
StatusReporter	Basic class to use for reporting status from both the crawler and the consumers.
StatusReporterBuilder
StatusReporterFutureResult	Empty class for what a StatusReporter returns when it finishes.
StrawManTikaAppDriver	Simple single-threaded class that calls tika-app against every file in a directory.
StreamingZipContainerDetector
StreamOutRPWFSConsumer	This uses the `JsonStreamingSerializer` to write out a single metadata object at a time.
StringsConfig	Configuration for the "strings" (or strings-alternative) command.
StringsEncoding	Character encoding of the strings that are to be found using the "strings" command.
StringsParser	Parser that uses the "strings" (or strings-alternative) command to find the printable strings in an object, or other binary, file (application/octet-stream).
StringStatsCalculator<T>	Interface for calculators that require a string
SubtreeMatcher	Evaluation state of a `...//...` XPath expression.
SummaryExtractor	Extractor for Common OLE2 (HPSF) metadata
SXSLFPowerPointExtractorDecorator	SAX/Streaming pptx extractor
SXWPFWordExtractorDecorator	This is an experimental, alternative extractor for docx files.
SystemUtils	Copied from commons-lang to avoid requiring the dependency
TableInfo
TaggedContentHandler	A content handler decorator that tags potential exceptions so that the handler that caused the exception can easily be identified.
TaggedInputStream	An input stream decorator that tags potential exceptions so that the stream that caused the exception can easily be identified.
TaggedIOException	An `IOException` wrapper that tags the wrapped exception with a given object reference.
TaggedSAXException	A `SAXException` wrapper that tags the wrapped exception with a given object reference.
TailStream	A specialized input stream implementation which records the last portion read from an underlying stream.
TarWriter
TaskStatus
TeeContentHandler	Content handler proxy that forwards the received SAX events to zero or more underlying content handlers.
TEIDOMParser
TemporaryResources	Utility class for tracking and ultimately closing or otherwise disposing a collection of temporary resources.
TensorflowImageRecParser	This is an implementation of `ObjectRecogniser` powered by Tensorflow convolutional neural network (CNN).
TensorflowRESTCaptioner	Tensorflow image captioner.
TensorflowRESTRecogniser	Tensorflow image recogniser which has high performance.
TensorflowRESTVideoRecogniser	Tensorflow video recogniser which has high performance.
TesseractOCRConfig	Configuration for TesseractOCRParser.
TesseractOCRConfig.OUTPUT_TYPE
TesseractOCRParser	TesseractOCRParser powered by tesseract-ocr engine.
TextAndCSVParser	Unless the `TikaCoreProperties.CONTENT_TYPE_OVERRIDE` is set, this parser tries to assess whether the file is a text file, csv or tsv.
TextCell	Text cell.
TextContentHandler	Content handler decorator that only passes the `TextContentHandler.characters(char[], int, int)` and `TextContentHandler.ignorableWhitespace(char[], int, int)` (plus `TextContentHandler.startDocument()` and `TextContentHandler.endDocument()`) events to the decorated content handler.
TextDetector	Content type detection of plain text documents.
TextLangDetector	Language detection using MIT Lincoln Lab’s Text.jl library (https://github.com/trevorlewis/TextREST.jl). Please run the TextREST.jl server before using this.
TextMatcher	Final evaluation state of a `.../text()` XPath expression.
TextMessageBodyWriter	Returns simple text string for a particular metadata value.
TextStatistics	Utility class for computing a histogram of the bytes seen in a stream.
TextStatsCalculator	Base text stats interface
TextStatsFromTikaEval	These examples create a new `CompositeTextStatsCalculator` for each call.
TIAParsingExample
TIFF	XMP Exif TIFF schema.
TiffParser
Tika	Facade class for accessing Tika functionality.
TikaActivator	Bundle activator that adjust the class loading mechanism of the `ServiceLoader` class to work correctly in an OSGi environment.
TikaCLI	Simple command line interface for Apache Tika.
TikaConfig	Parses the XML config file.
TikaConfigException	Tika Config Exception is thrown when there is an error in the Tika config file and/or one or more of the parsers failed to initialize from that erroneous config.
TikaConfigSerializer
TikaConfigSerializer.Mode
TikaCoreProperties	Contains a core set of basic Tika metadata properties, which all parsers will attempt to supply (where the file format permits).
TikaCoreProperties.EmbeddedResourceType	A file might contain different types of embedded documents.
TikaDetectors	Provides details of all the `Detector`s registered with Apache Tika, similar to --list-detectors with the Tika CLI.
TikaEvalCLI
TikaExcelDataFormatter	Overrides Excel's General format to include more significant digits than the MS Spec allows.
TikaExcelGeneralFormat	A Format that allows up to 15 significant digits for integers.
TikaException	Tika exception
TikaFileTypeDetector
TikaGUI	Simple Swing GUI for Apache Tika.
TikaInputStream	Input stream with extended capabilities.
TikaLoggingFilter
TikaMemoryLimitException
TikaMetadataKeys	Contains keys to properties in Metadata instances.
TikaMimeKeys	A collection of Tika metadata keys used in Mime Type resolution
TikaMimeTypes	Provides details of all the mimetypes known to Apache Tika, similar to --list-supported-types with the Tika CLI.
TikaParsers	Provides details of all the `Parser`s registered with Apache Tika, similar to --list-parsers and --list-parser-details within the Tika CLI.
TikaResource
TikaServerCli
TikaServerParseException	Simple wrapper exception to be thrown for consistent handling of exceptions that can happen during a parse.
TikaServerParseExceptionMapper
TikaServerWatchDog
TikaToXMP
TikaVersion
TikaWelcome	Provides a basic welcome to the Apache Tika Server.
TNEFParser	A POI-powered Tika Parser for TNEF (Transport Neutral Encoding Format) messages, aka winmail.dat
ToHTMLContentHandler	SAX event handler that serializes the HTML document to a character stream.
TokenContraster	Computes some corpus contrast statistics.
TokenCounter	Deprecated. Use `CompositeTextStatsCalculator` with `TokenEntropy`, `TokenLengths` and `TopNTokens`.
TokenCountPriorityQueue
TokenCountPriorityQueue
TokenCounts
TokenCountStatsCalculator<T>	Interface for calculators that require token stats
TokenEntropy
TokenIntPair
TokenLengths
TokenStatistics
TopCommonTokenCounter	Utility class that reads in a UTF-8 input file with one document per row and outputs the 20000 tokens with the highest document frequencies.
TopNTokens
ToTextContentHandler	SAX event handler that writes all character content out to a character stream.
ToXMLContentHandler	SAX event handler that serializes the XML document to a character stream.
TrainedModel
TrainedModelDetector
TrainTestSplit
TranslateResource
Translator	Interface for Translator services.
TranslatorExample
TrecDocumentGenerator	Generates document summaries for corpus analysis in the Open Relevance project.
TrueTypeParser	Parser for TrueType font files (TTF).
TSDParser	Tika parser for Time Stamped Data Envelope (application/timestamped-data)
TXTParser	Plain text parser.
TypeDetector	Content type detection based on a content type hint.
UnicodeBlockCounter
UniversalEncodingDetector
UnpackerResource
UnsupportedFormatException	Parsers should throw this exception when they encounter a file format that they do not support.
URLEmailNormalizingFilterFactory	Factory for filter that normalizes urls and emails to __url__ and __email__ respectively.
URLEnabledInputStreamFactory	This class looks for "fileUrl" in the http header.
WebPParser
WMFParser	This parser offers a very rough capability to extract text if there is text stored in the WMF files.
Word2006MLParser
WordExtractor
WordExtractor.TagAndStyle
WordMLParser	Parses WordML 2003 format Word files.
WordPerfect	WordPerfect properties collection.
WordPerfectParser	Parser for Corel WordPerfect documents.
WriteOutContentHandler	SAX event handler that writes content up to an optional write limit out to a character stream or other decorated handler.
XHTMLContentHandler	Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
XLIFF12ContentHandler	Content Handler for XLIFF 1.2 documents.
XLIFF12Parser	Parser for XLIFF 1.2 files.
XLSXHREFFormatter
XLZParser	Parser for XLZ Archives.
XMLDOMUtil
XMLErrorLogUpdater	This is a very task specific class that reads a log file and updates the "comparisons" table.
XMLLogMsgHandler
XMLLogReader
XMLParser	XML parser.
XMLReaderUtils	Utility functions for reading XML.
XmlRootExtractor	Utility class that uses a `SAXParser` to determine the namespace URI and local name of the root element of an XML file.
XMP
XMPContentHandler	Content handler decorator that simplifies the task of producing XMP output.
XMPDM	XMP Dynamic Media schema.
XMPDM.ChannelTypePropertyConverter	Deprecated. Experimental method, will change shortly
XMPIdq
XMPMessageBodyWriter
XMPMetadata	Provides a conversion of the Metadata map from Tika to the XMP data model by also providing the Metadata API for clients to ease transition.
XMPMM
XMPPacketScanner	This class is a parser for XMP packets.
XMPRights	XMP Rights management schema.
XPathParser	Parser for a very simple XPath subset.
XPSExtractorDecorator
XPSTextExtractor	Currently, mostly a pass-through class to hold pkg and properties and keep the general framework similar to our other POI-integrated extractors.
XSLFEventBasedPowerPointExtractor
XSLFPowerPointExtractorDecorator
XSSFBExcelExtractorDecorator
XSSFExcelExtractorDecorator
XSSFExcelExtractorDecorator.HeaderFooterFromString
XSSFExcelExtractorDecorator.SheetTextAsHTML	Turns formatted sheet events into HTML
XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer	Captures information on interesting tags, whilst delegating the main work to the formatting handler
XUserDefinedCharset
XWPFEventBasedWordExtractor	Experimental class that is based on POI's XSSFEventBasedExcelExtractor
XWPFListManager
XWPFNumberingShim	Stub class of POI's XWPFNumbering because onDocumentRead() is protected
XWPFStylesShim	For Tika, all we need (so far) is a mapping between styleId and a style's name.
XWPFWordExtractorDecorator
YandexTranslator	An implementation of a REST client for the YANDEX Translate API.
ZeroByteFileException	Exception thrown by the AutoDetectParser when a file contains zero bytes.
ZeroSizeFileDetector	Detector to identify zero length files as application/x-zerovalue
ZipContainerDetector	A detector that works on Zip documents and other archive and compression formats to figure out exactly what the file is.
ZipListFiles	Example code listing from Chapter 1.
ZipSalvager
ZipWriter