Interface OOXMLExtractor

All Known Implementing Classes:
AbstractOOXMLExtractor, POIXMLTextExtractorDecorator, SXSLFPowerPointExtractorDecorator, SXWPFWordExtractorDecorator, XPSExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFBExcelExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator

public interface OOXMLExtractor
Interface implemented by all Tika OOXML extractors.
See Also:
  • POIXMLTextExtractor
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.poi.ooxml.POIXMLDocument
    getDocument()
    Returns the opened document.
    MetadataExtractor
    getMetadataExtractor()
    Returns a MetadataExtractor for the document. POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
    void
    getXHTML(ContentHandler handler, Metadata metadata, ParseContext context)
    Parses the document into a sequence of XHTML SAX events sent to the given content handler.
  • Method Details

    • getDocument

      org.apache.poi.ooxml.POIXMLDocument getDocument()
      Returns the opened document.
      See Also:
      • POIXMLTextExtractor.getDocument()
    • getMetadataExtractor

      MetadataExtractor getMetadataExtractor()
      Returns a MetadataExtractor for the document. POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
    • getXHTML

      void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
      Parses the document into a sequence of XHTML SAX events sent to the given content handler.
      Throws:
      SAXException
      org.apache.xmlbeans.XmlException
      IOException
      TikaException
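
The description of getXHTML says the document is delivered as a sequence of XHTML SAX events sent to a content handler. The sketch below illustrates what that can look like from the caller's side, using only the JDK's org.xml.sax classes. SketchExtractor, paragraphs, TextCollector and extractText are hypothetical names invented for this illustration; SketchExtractor is a simplified stand-in and not the Tika interface, and it omits the Metadata and ParseContext parameters and the XmlException, IOException and TikaException declarations of the real method.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlEventsSketch {

    // Hypothetical stand-in for an extractor; NOT the Tika OOXMLExtractor interface.
    interface SketchExtractor {
        void getXHTML(ContentHandler handler) throws SAXException;
    }

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Builds a stand-in extractor that emits the SAX events for
    // <html><body><p>...</p>...</body></html>, one <p> per argument.
    static SketchExtractor paragraphs(String... paragraphs) {
        return handler -> {
            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startElement(XHTML, "html", "html", noAttributes);
            handler.startElement(XHTML, "body", "body", noAttributes);
            for (String paragraph : paragraphs) {
                handler.startElement(XHTML, "p", "p", noAttributes);
                char[] chars = paragraph.toCharArray();
                handler.characters(chars, 0, chars.length);
                handler.endElement(XHTML, "p", "p");
            }
            handler.endElement(XHTML, "body", "body");
            handler.endElement(XHTML, "html", "html");
            handler.endDocument();
        };
    }

    // Content handler that keeps character data and ends each paragraph with a newline.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }

        String text() {
            return text.toString();
        }
    }

    // Runs the extractor against a collecting handler and returns the plain text.
    static String extractText(SketchExtractor extractor) throws SAXException {
        TextCollector collector = new TextCollector();
        extractor.getXHTML(collector);
        return collector.text();
    }

    public static void main(String[] args) throws SAXException {
        System.out.print(extractText(paragraphs("Hello", "World")));
    }
}
```

With a real implementation, the same kind of handler would be passed as the first argument of getXHTML, together with a Metadata and a ParseContext instance, and the caller would also need to handle the additional checked exceptions listed above.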