org.apache.nutch.parse.oo
Class OOParser

java.lang.Object
  extended by org.apache.nutch.parse.oo.OOParser
All Implemented Interfaces:
Configurable, Parser, Pluggable

public class OOParser
extends Object
implements Parser

Parser for OpenOffice and OpenDocument formats. It is intended to handle text, spreadsheet, and presentation documents, along with their corresponding templates and "master" documents.

Author:
Andrzej Bialecki

Field Summary
static org.apache.commons.logging.Log LOG
           
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
OOParser()
           
 
Method Summary
 Configuration getConf()
           
 ParseResult getParse(Content content)
           This method parses the given content and returns a map of <key, parse> pairs.
static void main(String[] args)
           
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

OOParser

public OOParser()
Method Detail

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable

getParse

public ParseResult getParse(Content content)
Description copied from interface: Parser

This method parses the given content and returns a map of <key, parse> pairs. Each Parse instance will be persisted under its key.

Note: Meta-redirects should be followed only when they come from the original URL. For example:
Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If the page at this URL contains a meta-redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.

Specified by:
getParse in interface Parser
Parameters:
content - Content to be parsed
Returns:
a map containing <key, parse> pairs
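
Example:
The meta-redirect rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: ParseOutcome, redirectToFollow, and the URLs other than foo.bar.com/redirect.html are hypothetical stand-ins chosen for illustration, and a plain java.util.Map takes the place of ParseResult. The sketch only shows the lookup logic, namely that a redirect is followed when the entry keyed by the original URL signals it, and ignored otherwise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: none of these types belong to Nutch.
class RedirectRuleSketch {

    /** Hypothetical stand-in for a Parse together with its ParseStatus. */
    static final class ParseOutcome {
        final boolean redirect;      // true if the status signals a meta-redirect
        final String redirectTarget; // target URL, or null when there is no redirect

        ParseOutcome(boolean redirect, String redirectTarget) {
            this.redirect = redirect;
            this.redirectTarget = redirectTarget;
        }
    }

    /**
     * Returns the redirect target to follow, or null if no redirect should be
     * followed. Only the entry stored under the original URL is consulted;
     * redirects recorded under any other key are ignored.
     */
    static String redirectToFollow(String originalUrl, Map<String, ParseOutcome> parses) {
        ParseOutcome outcome = parses.get(originalUrl);
        if (outcome == null || !outcome.redirect) {
            return null;
        }
        return outcome.redirectTarget;
    }

    public static void main(String[] args) {
        Map<String, ParseOutcome> parses = new LinkedHashMap<>();
        // Entry keyed by the URL being fetched: its redirect is honored.
        parses.put("foo.bar.com/redirect.html",
                new ParseOutcome(true, "foo.bar.com/target.html"));
        // Entry keyed by some other (hypothetical) URL: its redirect is not followed.
        parses.put("foo.bar.com/embedded.html",
                new ParseOutcome(true, "foo.bar.com/elsewhere.html"));

        System.out.println(redirectToFollow("foo.bar.com/redirect.html", parses));
        System.out.println(redirectToFollow("foo.bar.com/other.html", parses));
    }
}
```

In this sketch the second call finds no entry for the original URL, so no redirect is followed even though another entry in the map carries a redirect status.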

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2006 The Apache Software Foundation