org.apache.nutch.parse.ms
Class MSBaseParser
java.lang.Object
org.apache.nutch.parse.ms.MSBaseParser
- All Implemented Interfaces:
- Configurable, Parser, Pluggable
- Direct Known Subclasses:
- MSExcelParser, MSPowerPointParser, MSWordParser
public abstract class MSBaseParser
- extends Object
- implements Parser
A generic Microsoft document parser.
- Author:
- Jérôme Charron
Field Summary
protected static org.apache.commons.logging.Log LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.nutch.parse.Parser
getParse
LOG
protected static final org.apache.commons.logging.Log LOG
MSBaseParser
public MSBaseParser()
getParse
protected ParseResult getParse(MSExtractor extractor,
Content content)
- Parses a Content with a specific Microsoft document extractor.
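The delegation this method enables can be sketched as follows. This is a minimal, self-contained illustration and not the Nutch source: SketchContent, SketchParseResult, SketchExtractor, SketchBaseParser and SketchWordParser are hypothetical stand-ins for Content, ParseResult, MSExtractor, MSBaseParser and a format-specific subclass, and the extractText method name and the empty-text fallback on failure are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for Nutch's Content, ParseResult and MSExtractor.
final class SketchContent {
    final String url;
    final byte[] bytes;

    SketchContent(String url, byte[] bytes) {
        this.url = url;
        this.bytes = bytes;
    }
}

final class SketchParseResult {
    final String url;
    final String text;

    SketchParseResult(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

interface SketchExtractor {
    // Assumed method name; the real MSExtractor API is not shown on this page.
    String extractText(byte[] raw) throws Exception;
}

// Simplified stand-in for MSBaseParser.
abstract class SketchBaseParser {
    // Format-agnostic entry point, mirroring Parser.getParse.
    public abstract SketchParseResult getParse(SketchContent content);

    // Shared helper: run the supplied extractor over the content.
    protected SketchParseResult getParse(SketchExtractor extractor, SketchContent content) {
        try {
            return new SketchParseResult(content.url, extractor.extractText(content.bytes));
        } catch (Exception e) {
            // Assumption: a failed extraction is reported as empty text.
            return new SketchParseResult(content.url, "");
        }
    }
}

// A format-specific subclass only chooses the extractor.
final class SketchWordParser extends SketchBaseParser {
    @Override
    public SketchParseResult getParse(SketchContent content) {
        // Placeholder extractor: decodes bytes as UTF-8 instead of reading a real .doc file.
        SketchExtractor extractor = raw -> new String(raw, StandardCharsets.UTF_8).trim();
        return getParse(extractor, content);
    }
}
```

Keeping the shared work in the two-argument overload means each subclass supplies only its extractor, which is presumably why the method is protected rather than public.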
main
public static void main(String mime,
MSBaseParser parser,
String[] args)
- Main for testing. Pass an MS document as the argument.
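A subclass would typically expose its own main(String[]) and forward to this three-argument driver. The sketch below illustrates that shape under stated assumptions: DriverBaseParser and DriverWordParser are hypothetical stand-ins, parse() replaces the real parsing call, run() is an added helper that returns the output so it can be checked, and the "application/msword" MIME constant is chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for MSBaseParser; parse() replaces the real getParse(Content).
abstract class DriverBaseParser {
    abstract String parse(String mime, byte[] raw);

    // Shared driver: loads the file named by args[0] and parses it as the given MIME type.
    static String run(String mime, DriverBaseParser parser, String[] args) throws IOException {
        if (args.length < 1) {
            throw new IllegalArgumentException("Usage: <parser> <path-to-document>");
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        return parser.parse(mime, raw);
    }

    // Counterpart of main(String, MSBaseParser, String[]): prints what run() returns.
    static void main(String mime, DriverBaseParser parser, String[] args) throws IOException {
        System.out.println(run(mime, parser, args));
    }
}

final class DriverWordParser extends DriverBaseParser {
    static final String MIME_TYPE = "application/msword";

    @Override
    String parse(String mime, byte[] raw) {
        // Placeholder: report type and size rather than extracting real text.
        return mime + ": " + raw.length + " bytes";
    }

    // Command-line entry point for the subclass; delegates to the shared driver.
    public static void main(String[] args) throws IOException {
        main(MIME_TYPE, new DriverWordParser(), args);
    }
}
```

Passing the MIME type and the parser instance explicitly lets one driver serve every subclass without reflection.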
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
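The setConf/getConf pair follows the usual Configurable contract: the framework injects a configuration, and the object stores it and hands it back on request. A minimal sketch of that contract, assuming nothing about how MSBaseParser uses its configuration internally: SketchConfiguration and SketchConfigurable are hypothetical stand-ins for Configuration and Configurable, and the key "sketch.default.title" is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Configuration: a plain string map.
final class SketchConfiguration {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Mirrors the two methods of the Configurable interface.
interface SketchConfigurable {
    void setConf(SketchConfiguration conf);

    SketchConfiguration getConf();
}

// A parser that stores whatever configuration it is handed.
class ConfiguredParser implements SketchConfigurable {
    private SketchConfiguration conf;

    @Override
    public void setConf(SketchConfiguration conf) {
        this.conf = conf;
    }

    @Override
    public SketchConfiguration getConf() {
        return conf;
    }

    // Example use: read a setting, falling back to a default when none is configured.
    String defaultTitle() {
        return conf == null ? "untitled" : conf.get("sketch.default.title", "untitled");
    }
}
```

In this sketch getConf returns null until setConf has been called, so callers that read settings should guard against an unconfigured instance.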
Copyright © 2006 The Apache Software Foundation