Uses of Class
org.apache.tika.sax.XHTMLContentHandler
-
Uses of XHTMLContentHandler in org.apache.tika.parser.executable
Methods in org.apache.tika.parser.executable with parameters of type XHTMLContentHandler
Modifier and Type Method Description
void
ExecutableParser.parseELF(XHTMLContentHandler xhtml, Metadata metadata, InputStream stream, byte[] first4)
Parses a Unix ELF file.
void
ExecutableParser.parsePE(XHTMLContentHandler xhtml, Metadata metadata, InputStream stream, byte[] first4)
Parses a DOS or Windows PE file.
-
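Both ExecutableParser methods take the first four bytes of the stream (first4), which suggests the caller has already read them to decide which format it is looking at. The following is a minimal sketch of that kind of dispatch, assuming the well-known magic numbers (0x7F 'E' 'L' 'F' for ELF, 'M' 'Z' for DOS and PE files). The class MagicSketch and the method detect are hypothetical names used for illustration; they are not part of Tika, and the actual parser may decide differently.

```java
public class MagicSketch {
    // Hypothetical helper, not a Tika API: classifies a 4-byte prefix.
    static String detect(byte[] first4) {
        if (first4 == null || first4.length < 4) {
            return "UNKNOWN";
        }
        // ELF files start with 0x7F followed by the ASCII letters "ELF".
        if (first4[0] == 0x7F && first4[1] == 'E' && first4[2] == 'L' && first4[3] == 'F') {
            return "ELF";
        }
        // DOS and Windows PE executables start with the ASCII letters "MZ".
        if (first4[0] == 'M' && first4[1] == 'Z') {
            return "PE";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {0x7F, 'E', 'L', 'F'}));
        System.out.println(detect(new byte[] {'M', 'Z', (byte) 0x90, 0x00}));
    }
}
```

A caller would read four bytes from the stream, pass them to a check like this, and then hand the same bytes on to parseELF or parsePE so the parser does not need to rewind the stream.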
Uses of XHTMLContentHandler in org.apache.tika.parser.hwp
Methods in org.apache.tika.parser.hwp with parameters of type XHTMLContentHandler
Modifier and Type Method Description
void
HwpTextExtractorV5.extract(InputStream source, Metadata metadata, XHTMLContentHandler xhtml)
Extracts text from an HWP stream.
-
Uses of XHTMLContentHandler in org.apache.tika.parser.isatab
Methods in org.apache.tika.parser.isatab with parameters of type XHTMLContentHandler
Modifier and Type Method Description
static void
ISATabUtils.parseAssay(InputStream stream, XHTMLContentHandler xhtml, Metadata metadata, ParseContext context)
static void
ISATabUtils.parseInvestigation(InputStream stream, XHTMLContentHandler handler, Metadata metadata, ParseContext context)
static void
ISATabUtils.parseInvestigation(InputStream stream, XHTMLContentHandler handler, Metadata metadata, ParseContext context, String studyFileName)
static void
ISATabUtils.parseStudy(InputStream stream, XHTMLContentHandler xhtml, Metadata metadata, ParseContext context)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.microsoft
Methods in org.apache.tika.parser.microsoft with parameters of type XHTMLContentHandler
Modifier and Type Method Description
static void
FormattingUtils.closeStyleTags(XHTMLContentHandler xhtml, Deque<FormattingUtils.Tag> formattingState)
Closes all formatting tags.
static void
FormattingUtils.ensureFormattingState(XHTMLContentHandler xhtml, EnumSet<FormattingUtils.Tag> desired, Deque<FormattingUtils.Tag> currentState)
Closes all tags until currentState contains only tags from the desired set, then opens all required tags to reach the desired state.
protected void
ExcelExtractor.parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml, Locale locale)
protected void
ExcelExtractor.parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml, Locale locale)
Extracts text from an Excel Workbook, writing the extracted content to the specified Appendable.
protected void
HSLFExtractor.parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml)
protected void
HSLFExtractor.parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml)
protected void
OfficeParser.parse(org.apache.poi.poifs.filesystem.DirectoryNode root, ParseContext context, Metadata metadata, XHTMLContentHandler xhtml)
protected static void
OldExcelParser.parse(org.apache.poi.hssf.extractor.OldExcelExtractor extractor, XHTMLContentHandler xhtml)
void
OutlookExtractor.parse(XHTMLContentHandler xhtml)
void
OutlookExtractor.parse(XHTMLContentHandler xhtml, Metadata metadata)
Deprecated. Use parse(XHTMLContentHandler) instead; will be removed after 2.4.0.
protected void
WordExtractor.parse(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml)
protected void
WordExtractor.parse(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml)
protected void
WordExtractor.parseWord6(org.apache.poi.poifs.filesystem.DirectoryNode root, XHTMLContentHandler xhtml)
protected void
WordExtractor.parseWord6(org.apache.poi.poifs.filesystem.POIFSFileSystem filesystem, XHTMLContentHandler xhtml)
void
Cell.render(XHTMLContentHandler handler)
Renders the content to the given XHTML SAX event stream.
void
CellDecorator.render(XHTMLContentHandler handler)
void
LinkedCell.render(XHTMLContentHandler handler)
void
NumberCell.render(XHTMLContentHandler handler)
void
TextCell.render(XHTMLContentHandler handler)
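The description of FormattingUtils.ensureFormattingState above amounts to a small stack algorithm: close open tags from the innermost outward until only desired ones remain, then open whatever is missing. The sketch below illustrates that idea under stated assumptions and is not the Tika implementation: the Tag enum and its constants are hypothetical stand-ins for FormattingUtils.Tag, and a StringBuilder stands in for the XHTMLContentHandler so the example runs with the JDK alone.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.Locale;

public class FormattingStateSketch {
    // Hypothetical stand-in for FormattingUtils.Tag; the real constants may differ.
    enum Tag { B, I, U }

    // 'out' stands in for the XHTML handler and simply records the markup.
    static void ensureFormattingState(StringBuilder out, EnumSet<Tag> desired,
                                      Deque<Tag> currentState) {
        // Close tags, innermost first, until every open tag is a desired one.
        while (!desired.containsAll(currentState)) {
            Tag closed = currentState.pop();
            out.append("</").append(closed.name().toLowerCase(Locale.ROOT)).append('>');
        }
        // Open the desired tags that are not open yet, in enum order.
        for (Tag tag : desired) {
            if (!currentState.contains(tag)) {
                currentState.push(tag);
                out.append('<').append(tag.name().toLowerCase(Locale.ROOT)).append('>');
            }
        }
    }

    // Closing everything is the special case of an empty desired set.
    static void closeStyleTags(StringBuilder out, Deque<Tag> currentState) {
        ensureFormattingState(out, EnumSet.noneOf(Tag.class), currentState);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Deque<Tag> state = new ArrayDeque<>();
        ensureFormattingState(out, EnumSet.of(Tag.B, Tag.I), state); // bold + italic run
        ensureFormattingState(out, EnumSet.of(Tag.I), state);        // italic-only run
        closeStyleTags(out, state);
        System.out.println(out); // prints <b><i></i></b><i></i>
    }
}
```

Note that moving from bold plus italic to italic only closes and reopens the italic tag. Because the stack is unwound strictly from the top, the emitted markup stays properly nested even though a desired tag is temporarily closed.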
-
Uses of XHTMLContentHandler in org.apache.tika.parser.microsoft.onenote.fsshttpb
Methods in org.apache.tika.parser.microsoft.onenote.fsshttpb with parameters of type XHTMLContentHandler
Modifier and Type Method Description
void
MSOneStorePackage.walkTree(OneNoteTreeWalkerOptions options, Metadata metadata, XHTMLContentHandler xhtml)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.microsoft.ooxml
Methods in org.apache.tika.parser.microsoft.ooxml with parameters of type XHTMLContentHandler
Modifier and Type Method Description
protected abstract void
AbstractOOXMLExtractor.buildXHTML(XHTMLContentHandler xhtml)
Populates the XHTMLContentHandler object received as parameter.
protected void
POIXMLTextExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
SXSLFPowerPointExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
SXWPFWordExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
XSLFPowerPointExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
XSSFBExcelExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
XSSFExcelExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
XWPFWordExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
protected void
XSSFBExcelExtractorDecorator.extractHeaderFooter(String hf, XHTMLContentHandler xhtml)
protected void
XSSFExcelExtractorDecorator.extractHeaderFooter(String hf, XHTMLContentHandler xhtml)
protected void
XSSFExcelExtractorDecorator.extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, XHTMLContentHandler xhtml)
protected void
AbstractOOXMLExtractor.handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, XHTMLContentHandler xhtml, String rel, EmbeddedPartMetadata embeddedPartMetadata, TikaCoreProperties.EmbeddedResourceType embeddedResourceType)
Handles an embedded file in the document.
protected void
XSSFExcelExtractorDecorator.processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, XHTMLContentHandler xhtml)
Constructors in org.apache.tika.parser.microsoft.ooxml with parameters of type XHTMLContentHandler
Constructor Description
OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml)
OOXMLTikaBodyPartHandler(XHTMLContentHandler xhtml, XWPFStylesShim styles, XWPFListManager listManager, OfficeParserConfig parserConfig)
SheetTextAsHTML(OfficeParserConfig config, XHTMLContentHandler xhtml)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.microsoft.ooxml.xps
Methods in org.apache.tika.parser.microsoft.ooxml.xps with parameters of type XHTMLContentHandler
Modifier and Type Method Description
protected void
XPSExtractorDecorator.buildXHTML(XHTMLContentHandler xhtml)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.mp4
Constructors in org.apache.tika.parser.mp4 with parameters of type XHTMLContentHandler
Constructor Description
TikaMp4BoxHandler(com.drew.metadata.Metadata metadata, Metadata tikaMetadata, XHTMLContentHandler xhtml)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.mp4.boxes
Constructors in org.apache.tika.parser.mp4.boxes with parameters of type XHTMLContentHandler
Constructor Description
TikaUserDataBox(String box, byte[] payload, Metadata metadata, XHTMLContentHandler xhtml)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.pdf.image
Fields in org.apache.tika.parser.pdf.image declared as XHTMLContentHandler
Modifier and Type Field Description
protected XHTMLContentHandler
ImageGraphicsEngine.xhtml
Methods in org.apache.tika.parser.pdf.image with parameters of type XHTMLContentHandler
Modifier and Type Method Description
ImageGraphicsEngine
ImageGraphicsEngineFactory.newEngine(org.apache.pdfbox.pdmodel.PDPage page, int pageNumber, EmbeddedDocumentExtractor embeddedDocumentExtractor, PDFParserConfig pdfParserConfig, Map<org.apache.pdfbox.cos.COSStream,Integer> processedInlineImages, AtomicInteger imageCounter, XHTMLContentHandler xhtml, Metadata parentMetadata, ParseContext parseContext)
Constructors in org.apache.tika.parser.pdf.image with parameters of type XHTMLContentHandler
Constructor Description
ImageGraphicsEngine(org.apache.pdfbox.pdmodel.PDPage page, int pageNumber, EmbeddedDocumentExtractor embeddedDocumentExtractor, PDFParserConfig pdfParserConfig, Map<org.apache.pdfbox.cos.COSStream,Integer> processedInlineImages, AtomicInteger imageCounter, XHTMLContentHandler xhtml, Metadata parentMetadata, ParseContext parseContext)
-
Uses of XHTMLContentHandler in org.apache.tika.parser.pkg
Methods in org.apache.tika.parser.pkg with parameters of type XHTMLContentHandler
Modifier and Type Method Description
protected static Metadata
PackageParser.handleEntryMetadata(String name, Date createAt, Date modifiedAt, Long size, XHTMLContentHandler xhtml)
-