java.lang.Object
  org.apache.nutch.parse.ms.MSExtractor
public abstract class MSExtractor
Defines a Microsoft document content extractor.
Field Summary | |
---|---|
protected static org.apache.commons.logging.Log | LOG |
Constructor Summary | |
---|---|
protected | MSExtractor(): Constructs a new Microsoft document extractor. |
Method Summary | |
---|---|
protected void | extract(InputStream input): Extracts properties and text from a Microsoft document input stream. |
protected abstract String | extractText(InputStream input): Extracts the text content from a Microsoft document input stream. |
protected Properties | getProperties(): Gets the properties of the Microsoft document. |
protected String | getText(): Gets the content text of the Microsoft document. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final org.apache.commons.logging.Log LOG
Constructor Detail |
---|
protected MSExtractor()
Method Detail |
---|
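protected void extract(InputStream input) throws Exception
Extracts properties and text from a Microsoft document input stream.
Throws: Exception
protected abstract String extractText(InputStream input) throws Exception
Extracts the text content from a Microsoft document input stream.
Throws: Exception
protected String getText()
Gets the content text of the Microsoft document.
protected Properties getProperties()
Gets the properties of the Microsoft document.
Returns: the Properties of the Microsoft document.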
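Example (illustrative sketch only). The signatures above suggest that a subclass supplies extractText(InputStream), after which extract(InputStream) makes the results available through getText() and getProperties(). The code below is a minimal, self-contained illustration of that pattern and does not use Nutch itself. SketchExtractor is a hypothetical stand-in that mirrors the documented signatures; the body of its extract() method, the "length" property, and the Utf8TextExtractor subclass are all assumptions, and the real MSExtractor may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical stand-in mirroring the documented MSExtractor signatures.
// The method bodies are assumptions made for illustration.
abstract class SketchExtractor {
    private String text;
    private final Properties properties = new Properties();

    protected SketchExtractor() { }

    // Assumed behaviour: delegate to extractText() and keep the result.
    protected void extract(InputStream input) throws Exception {
        this.text = extractText(input);
        // Illustrative property only; not taken from the real class.
        properties.setProperty("length", String.valueOf(text.length()));
    }

    protected abstract String extractText(InputStream input) throws Exception;

    protected String getText() {
        return text;
    }

    protected Properties getProperties() {
        return properties;
    }
}

// Hypothetical subclass: reads the stream as UTF-8 text rather than
// parsing a real Microsoft document format.
class Utf8TextExtractor extends SketchExtractor {
    @Override
    protected String extractText(InputStream input) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = input.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8).trim();
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws Exception {
        Utf8TextExtractor extractor = new Utf8TextExtractor();
        byte[] data = "  Hello, Nutch  ".getBytes(StandardCharsets.UTF_8);
        extractor.extract(new ByteArrayInputStream(data));
        System.out.println(extractor.getText());
        System.out.println(extractor.getProperties().getProperty("length"));
    }
}
```

Because the accessor methods are protected, callers in this sketch must live in the same package as the extractor or in a subclass of it.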