Class XSSFExcelExtractorDecorator
java.lang.Object
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- All Implemented Interfaces:
OOXMLExtractor
- Direct Known Subclasses:
XSSFBExcelExtractorDecorator
-
Nested Class Summary
protected static class
protected static class
    Turns formatted sheet events into HTML
protected static class
    Captures information on interesting tags, whilst delegating the main work to the formatting handler
-
Field Summary
protected final org.apache.poi.ss.usermodel.DataFormatter formatter
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
    Allows access to headers/footers from raw xml strings
protected Metadata metadata
protected ParseContext parseContext
protected final List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
Fields inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
config, EMBEDDED_RELATIONSHIPS, extractor
-
Constructor Summary
XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
Method Summary
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
protected void buildXHTML(XHTMLContentHandler xhtml)
    Populates the XHTMLContentHandler object received as parameter.
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
protected void extractHeaderFooter(String hf, XHTMLContentHandler xhtml)
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml)
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
    In Excel files, sheets have things embedded in them, and sheet drawings which have the images
void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context)
    Parses the document into a sequence of XHTML SAX events sent to the given content handler.
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml)
void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
Methods inherited from class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
getDocument, getEmbeddedPartMetadataMap, getJustFileName, getMetadataExtractor, handleEmbeddedFile, loadLinkedRelationships
-
Field Details
-
hfHelper
protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
Allows access to headers/footers from raw xml strings
-
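To illustrate what "raw" header/footer strings look like: Excel stores a header or footer as a single string in which `&L`, `&C` and `&R` introduce the left, center and right sections. The following is a minimal JDK-only sketch of splitting such a string. `HeaderFooterSketch` and `splitSections` are hypothetical names, not part of Tika or POI, and the sketch assumes that text before any marker belongs to the center section and ignores other formatting codes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the HeaderFooterHelper implementation.
final class HeaderFooterSketch {

    /** Splits a raw header/footer string into its L, C and R sections. */
    static Map<Character, String> splitSections(String raw) {
        Map<Character, StringBuilder> parts = new LinkedHashMap<>();
        for (char key : new char[] {'L', 'C', 'R'}) {
            parts.put(key, new StringBuilder());
        }
        // Assumption: text that appears before any section marker is centered.
        char current = 'C';
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean marker = c == '&'
                    && i + 1 < raw.length()
                    && "LCR".indexOf(raw.charAt(i + 1)) >= 0;
            if (marker) {
                current = raw.charAt(i + 1); // switch to the named section
                i++;                         // skip the marker letter
            } else {
                parts.get(current).append(c);
            }
        }
        Map<Character, String> result = new LinkedHashMap<>();
        for (Map.Entry<Character, StringBuilder> e : parts.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitSections("&LReport&CQ3&RPage 1"));
    }
}
```

Other `&` codes (fonts, page numbers, dates) are passed through untouched here; a real helper would need to interpret or strip them.
-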
formatter
protected final org.apache.poi.ss.usermodel.DataFormatter formatter
-
sheetParts
-
drawingHyperlinks
-
metadata
-
parseContext
-
-
Constructor Details
-
XSSFExcelExtractorDecorator
public XSSFExcelExtractorDecorator(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
-
Method Details
-
configureExtractor
protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
-
getXHTML
public void getXHTML(ContentHandler handler, Metadata metadata, ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by:
getXHTML in interface OOXMLExtractor
- Overrides:
getXHTML in class AbstractOOXMLExtractor
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
TikaException
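As a rough picture of what "a sequence of XHTML SAX events sent to the given content handler" means for the caller, the JDK-only sketch below pairs a handler with a stand-in event source. `XhtmlEventSketch`, `RowCollector` and `emitRow` are hypothetical names, and the `tr`/`td` layout is an assumption made for illustration; the sketch does not call Tika, it only shows the handler side of the contract.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; no Tika classes are used.
final class XhtmlEventSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Gathers cell text: tab-separated cells, one output line per row. */
    static class RowCollector extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private boolean inCell;
        private boolean rowHasCell;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("tr".equals(localName)) {
                rowHasCell = false;
            } else if ("td".equals(localName)) {
                if (rowHasCell) {
                    out.append('\t'); // separator between cells of one row
                }
                rowHasCell = true;
                inCell = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("td".equals(localName)) {
                inCell = false;
            } else if ("tr".equals(localName)) {
                out.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inCell) {
                out.append(ch, start, length);
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Stand-in for an extractor: emits one table row as SAX events. */
    static void emitRow(ContentHandler handler, String... cells) throws SAXException {
        Attributes none = new AttributesImpl();
        handler.startElement(XHTML_NS, "tr", "tr", none);
        for (String cell : cells) {
            handler.startElement(XHTML_NS, "td", "td", none);
            handler.characters(cell.toCharArray(), 0, cell.length());
            handler.endElement(XHTML_NS, "td", "td");
        }
        handler.endElement(XHTML_NS, "tr", "tr");
    }

    static String demo() throws SAXException {
        RowCollector collector = new RowCollector();
        collector.startDocument();
        emitRow(collector, "Region", "Sales");
        emitRow(collector, "North", "42");
        collector.endDocument();
        return collector.text();
    }

    public static void main(String[] args) throws SAXException {
        System.out.print(demo());
    }
}
```

Because the handler only reacts to events, it never holds the whole sheet in memory, which is the usual reason for choosing a SAX-style interface over building a document tree.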
-
buildXHTML
protected void buildXHTML(XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as parameter.
- Specified by:
buildXHTML in class AbstractOOXMLExtractor
- Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
- See Also:
XSSFExcelExtractor.getText()
-
addDrawingHyperLinks
protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
-
extractHyperLinks
protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
processShapes
protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml) throws SAXException
- Throws:
SAXException
-
getMainDocumentParts
protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws TikaException
In Excel files, sheets have things embedded in them, and sheet drawings which have the images
- Specified by:
getMainDocumentParts in class AbstractOOXMLExtractor
- Throws:
TikaException
-