public interface Parser extends Pluggable, Configurable
A parser for content generated by a Protocol implementation. This interface is implemented by extensions; Nutch's core contains no page-parsing code.

Modifier and Type | Field and Description |
---|---|
static String | X_POINT_ID: The name of the extension point. |
Modifier and Type | Method and Description |
---|---|
ParseResult | getParse(Content c): This method parses the given content and returns a map of <key, parse> pairs. |
Methods inherited from interface Configurable: getConf, setConf
static final String X_POINT_ID
ParseResult getParse(Content c)
This method parses the given content and returns a map of <key, parse> pairs. Each Parse instance will be persisted under its key.
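The <key, parse> contract can be illustrated with a self-contained sketch. It does not use the real Nutch classes: FetchedContent, SimpleParse, SimpleParser and PlainTextParser are hypothetical stand-ins for Content, Parse, Parser and a parsing plugin, and a plain Map stands in for ParseResult. The stand-in plugin keys its single parse by the URL of the content it was given.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch only: hypothetical, simplified stand-ins for Nutch's Content,
 * Parse, Parser and ParseResult. This is not the real Nutch API.
 */
public class ParseMapSketch {

  /** Hypothetical stand-in for fetched content: the URL plus raw bytes. */
  record FetchedContent(String url, byte[] bytes) {}

  /** Hypothetical stand-in for a single Parse: a title and the extracted text. */
  record SimpleParse(String title, String text) {}

  /** Hypothetical parser contract: returns <key, parse> pairs. */
  interface SimpleParser {
    Map<String, SimpleParse> getParse(FetchedContent c);
  }

  /** A hypothetical "extension" that parses plain text. */
  static class PlainTextParser implements SimpleParser {
    @Override
    public Map<String, SimpleParse> getParse(FetchedContent c) {
      String text = new String(c.bytes(), StandardCharsets.UTF_8);
      // Use the first line (if any) as the title.
      String title = text.lines().findFirst().orElse("");
      Map<String, SimpleParse> result = new LinkedHashMap<>();
      // Key the parse by the URL it came from; a caller would persist it under this key.
      result.put(c.url(), new SimpleParse(title, text));
      return result;
    }
  }

  public static void main(String[] args) {
    FetchedContent c = new FetchedContent(
        "http://foo.bar.com/index.txt",
        "Hello\nworld".getBytes(StandardCharsets.UTF_8));
    Map<String, SimpleParse> parses = new PlainTextParser().getParse(c);
    for (Map.Entry<String, SimpleParse> e : parses.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().title());
    }
  }
}
```

Running main prints `http://foo.bar.com/index.txt -> Hello`. Because the result is a map rather than a single parse, one fetched document can in principle yield several parses, each stored under its own key.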
Note: Meta-redirects should be followed only when they come from the original URL. For example, assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
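The redirect rule in the note can be expressed as a small check. This is a hedged sketch, not Nutch's fetcher code: Status, ParseOutcome and redirectToFollow are hypothetical names, and the simplified Status enum only stands in for ParseStatus. The check follows a redirect only when the entry keyed by the original URL reports one; redirects reported under any other key are ignored.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of the meta-redirect rule using hypothetical stand-in types.
 * Not the real Nutch fetcher or ParseStatus API.
 */
public class MetaRedirectRule {

  /** Hypothetical, simplified parse status. */
  enum Status { SUCCESS, REDIRECT, FAILED }

  /** Hypothetical parse outcome: a status and, for redirects, the target URL. */
  record ParseOutcome(Status status, String redirectTarget) {}

  /**
   * Returns the redirect target only if the result holds an entry keyed by the
   * original URL whose status indicates a redirect; otherwise returns empty.
   */
  static Optional<String> redirectToFollow(Map<String, ParseOutcome> result, String originalUrl) {
    ParseOutcome outcome = result.get(originalUrl);
    if (outcome != null && outcome.status() == Status.REDIRECT) {
      return Optional.ofNullable(outcome.redirectTarget());
    }
    // Redirects reported under any other key are not followed.
    return Optional.empty();
  }

  public static void main(String[] args) {
    String original = "foo.bar.com/redirect.html";
    Map<String, ParseOutcome> fromOriginal = Map.of(
        original, new ParseOutcome(Status.REDIRECT, "foo.bar.com/target.html"));
    Map<String, ParseOutcome> fromOther = Map.of(
        "foo.bar.com/embedded.html",
        new ParseOutcome(Status.REDIRECT, "foo.bar.com/elsewhere.html"));
    System.out.println(redirectToFollow(fromOriginal, original)); // Optional[foo.bar.com/target.html]
    System.out.println(redirectToFollow(fromOther, original));    // Optional.empty
  }
}
```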
Parameters:
c - Content to be parsed

Copyright © 2015 The Apache Software Foundation