org.apache.nutch.parse.ms
Class MSBaseParser

java.lang.Object
  extended by org.apache.nutch.parse.ms.MSBaseParser
All Implemented Interfaces:
Configurable, Parser, Pluggable
Direct Known Subclasses:
MSExcelParser, MSPowerPointParser, MSWordParser

public abstract class MSBaseParser
extends Object
implements Parser

A generic Microsoft document parser. It is the abstract base class of MSExcelParser, MSPowerPointParser, and MSWordParser.

Author:
Jérôme Charron

Field Summary
protected static org.apache.commons.logging.Log LOG
           
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
MSBaseParser()
           
 
Method Summary
 Configuration getConf()
           
protected  ParseResult getParse(MSExtractor extractor, Content content)
          Parses a Content with a specific Microsoft document extractor.
static void main(String mime, MSBaseParser parser, String[] args)
          Main for testing.
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.nutch.parse.Parser
getParse
 

Field Detail

LOG

protected static final org.apache.commons.logging.Log LOG
Constructor Detail

MSBaseParser

public MSBaseParser()
Method Detail

getParse

protected ParseResult getParse(MSExtractor extractor,
                               Content content)
Parses a Content with a specific Microsoft document extractor.
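
The signature suggests a template-style split: the base class holds the shared parsing logic, and each subclass supplies its own extractor. The following is a minimal sketch of that pattern only. It does not use the Nutch API: TextExtractor, ParsedText, BaseParserSketch, and WordParserSketch are hypothetical stand-ins for MSExtractor, ParseResult, MSBaseParser, and a concrete subclass. Converting an extractor exception into a failed result is an assumption of the sketch, not documented behavior.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for MSExtractor: turns raw document bytes into text.
interface TextExtractor {
    String extract(byte[] raw) throws Exception;
}

// Hypothetical stand-in for ParseResult: a success flag plus text or an error message.
final class ParsedText {
    final boolean success;
    final String text;

    ParsedText(boolean success, String text) {
        this.success = success;
        this.text = text;
    }
}

// Hypothetical stand-in for MSBaseParser.
abstract class BaseParserSketch {

    // Shared logic, analogous to the protected getParse(MSExtractor, Content).
    // Assumption of this sketch: an extractor failure becomes a failed result.
    protected ParsedText parseWith(TextExtractor extractor, byte[] raw) {
        try {
            return new ParsedText(true, extractor.extract(raw));
        } catch (Exception e) {
            return new ParsedText(false, String.valueOf(e.getMessage()));
        }
    }

    // Analogous to the public getParse inherited from the Parser interface.
    public abstract ParsedText parse(byte[] raw);
}

// Hypothetical concrete parser: it only chooses the extractor and delegates.
final class WordParserSketch extends BaseParserSketch {
    @Override
    public ParsedText parse(byte[] raw) {
        // Toy extractor: decodes the bytes as UTF-8 and trims whitespace.
        return parseWith(bytes -> new String(bytes, StandardCharsets.UTF_8).trim(), raw);
    }
}
```

With this split, a concrete parser needs only a one-line method body, which is presumably why the extractor-based overload is protected rather than public.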


main

public static void main(String mime,
                        MSBaseParser parser,
                        String[] args)
Main for testing. Pass an MS document as an argument.


setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable


Copyright © 2006 The Apache Software Foundation