org.apache.nutch.parse.text
Class TextParser

java.lang.Object
  extended by org.apache.nutch.parse.text.TextParser
All Implemented Interfaces:
Configurable, Parser, Pluggable

public class TextParser
extends Object
implements Parser


Field Summary
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
TextParser()
           
 
Method Summary
 Configuration getConf()
           
 ParseResult getParse(Content content)
          Parses a plain text document.
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextParser

public TextParser()
Method Detail

getParse

public ParseResult getParse(Content content)
Parses a plain text document. If no character set is specified in the HTTP headers, the content is decoded using the default encoding configured in parser.character.encoding.default.

Specified by:
getParse in interface Parser
Parameters:
content - Content to be parsed
Returns:
a map containing <key, parse> pairs
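
The encoding fallback described above can be illustrated with a minimal, standalone sketch. It is not the TextParser source and uses no Nutch classes. The class CharsetFallbackSketch, its method names, and the defaultEncoding parameter (standing in for the value of parser.character.encoding.default) are all hypothetical and chosen only for illustration. The sketch also assumes that the character set is read from a Content-Type header value.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical illustration only; not part of org.apache.nutch.parse.text.
final class CharsetFallbackSketch {

    /** Returns the charset parameter of a Content-Type value, or null if none is present. */
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Drop optional surrounding double quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /**
     * Picks the charset named in the header when it is present and supported;
     * otherwise falls back to defaultEncoding (assumed to be a valid charset name).
     */
    static Charset resolveCharset(String contentType, String defaultEncoding) {
        String name = charsetFromContentType(contentType);
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return Charset.forName(defaultEncoding);
    }

    /** Decodes raw bytes as text using the resolved charset. */
    static String decode(byte[] raw, String contentType, String defaultEncoding) {
        return new String(raw, resolveCharset(contentType, defaultEncoding));
    }
}
```

For example, under these assumptions resolveCharset("text/plain", "ISO-8859-1") resolves to ISO-8859-1, because the header carries no charset parameter, while resolveCharset("text/plain; charset=UTF-8", "ISO-8859-1") resolves to UTF-8.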

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable


Copyright © 2006 The Apache Software Foundation