Class OOXMLWordAndPowerPointTextHandler

java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class OOXMLWordAndPowerPointTextHandler extends DefaultHandler
This class is intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.

This class generally does not check namespaces, so it can be applied to both PPTX and DOCX for text extraction.

This can also be used to scrape content from charts. It currently ignores formula (<c:f/>) elements.
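
The namespace-agnostic matching described above can be illustrated with a minimal sketch built only on the JDK SAX API. This is not the Tika implementation: the class name LocalNameTextSketch, the extract helper, and the choice to collect only elements whose local name is "t" are assumptions made for illustration. Because the handler compares local names and ignores the namespace URI, the same code picks up text from both w:t (DOCX) and a:t (PPTX, chart rich text), while content inside c:f is never collected.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the Tika class: matches elements by local name only.
public class LocalNameTextSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inText = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The namespace URI is deliberately ignored, so w:t and a:t are treated alike.
        if ("t".equals(localName)) {
            inText = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(localName)) {
            inText = false;
        } else if ("p".equals(localName)) {
            // End of a paragraph (w:p or a:p): emit a line break.
            text.append('\n');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only text inside a "t" element is kept; formula text in c:f falls through.
        if (inText) {
            text.append(ch, start, length);
        }
    }

    // Parses an XML string and returns the collected text.
    public static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required so localName is populated
        LocalNameTextSketch handler = new LocalNameTextSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }
}
```

Calling LocalNameTextSketch.extract on a fragment of word/document.xml or of a slide part would return the run text with one line per paragraph. The trade-off of matching on local names is that an unrelated element that happens to share a local name would also be matched, which is one plausible reason the approach does not carry over to formats such as .xlsx.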

This does not work with .xlsx or .vsdx.

TODO: move this into POI?