Package | Description |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.apache.nutch.parse | |
org.apache.nutch.parse.headings | |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.js | |
org.apache.nutch.parse.tika | |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Modifier and Type | Method and Description |
---|---|
ParseResult | HTMLLanguageParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Scan the HTML document, looking for possible indications of the content language. |
Modifier and Type | Method and Description |
---|---|
ParseResult | RelTagParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Scan the HTML document, looking for possible rel-tags. |
Modifier and Type | Method and Description |
---|---|
ParseResult | MetaTagsParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc) |
ParseResult | HtmlParseFilters.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Run all defined filters. |
ParseResult | HtmlParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
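
The filter contract in the table above (receive the DOM tree of a page, add metadata or otherwise modify the parse, and let a chain run all defined filters) can be sketched with plain JDK types. This is a minimal illustration under stated assumptions, not the Nutch API: `DomFilter`, `DomFilterChain`, the `Map`-based result, and the metadata keys `heading` and `tag` are hypothetical stand-ins for `HtmlParseFilter`, `HtmlParseFilters`, and `ParseResult`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for a single filter; NOT the Nutch HtmlParseFilter interface.
interface DomFilter {
    Map<String, String> filter(Map<String, String> metadata, DocumentFragment doc);
}

// Hypothetical stand-in for a filter chain that runs all registered filters.
class DomFilterChain {
    private final List<DomFilter> filters = new ArrayList<>();

    DomFilterChain add(DomFilter filter) {
        filters.add(filter);
        return this;
    }

    // Pass the running result through each filter in registration order.
    Map<String, String> filter(Map<String, String> metadata, DocumentFragment doc) {
        Map<String, String> result = metadata;
        for (DomFilter f : filters) {
            result = f.filter(result, doc);
        }
        return result;
    }
}

class DomFilterChainDemo {
    // Return the first direct child element with the given tag name, or null.
    static Element firstChild(DocumentFragment doc, String tag) {
        NodeList kids = doc.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Build a tiny fragment: <h1>Title</h1><a rel="tag" href="/tags/nutch"/>
    static DocumentFragment sampleFragment() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment frag = d.createDocumentFragment();
            Element h1 = d.createElement("h1");
            h1.setTextContent("Title");
            frag.appendChild(h1);
            Element a = d.createElement("a");
            a.setAttribute("rel", "tag");
            a.setAttribute("href", "/tags/nutch");
            frag.appendChild(a);
            return frag;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Run two illustrative filters over the sample fragment.
    static Map<String, String> run() {
        DomFilter headings = (metadata, doc) -> {
            Element h1 = firstChild(doc, "h1");
            if (h1 != null) metadata.put("heading", h1.getTextContent());
            return metadata;
        };
        DomFilter relTags = (metadata, doc) -> {
            Element a = firstChild(doc, "a");
            if (a != null && "tag".equals(a.getAttribute("rel"))) {
                metadata.put("tag", a.getAttribute("href"));
            }
            return metadata;
        };
        return new DomFilterChain().add(headings).add(relTags)
                .filter(new LinkedHashMap<>(), sampleFragment());
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each filter returns the (possibly modified) result so that the chain can thread it through the next filter; whether the real chain stops early on a failed parse is not stated in the table and is not modeled here.
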
Modifier and Type | Method and Description |
---|---|
ParseResult | HeadingsParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc) |
Modifier and Type | Method and Description |
---|---|
static void | HTMLMetaProcessor.getMetaTags(HTMLMetaTags metaTags, Node node, URL currURL): Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node. |
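
The idea behind `getMetaTags` (walk the nodes under a given node and set indicators from any META tags found) can be illustrated with the JDK's DOM classes alone. The sketch below is an assumption-laden illustration, not the Nutch implementation: `RobotsFlags` and `RobotsMetaSketch` are hypothetical names, only the `noindex`, `nofollow`, and `none` robots directives are handled, and the input must be well-formed XHTML because the JDK XML parser is used in place of an HTML parser.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the robots indicators; NOT the Nutch HTMLMetaTags class.
class RobotsFlags {
    boolean noIndex;
    boolean noFollow;
}

class RobotsMetaSketch {

    // Recursively visit every node under `node` and update `flags`
    // from each <meta name="robots" content="..."> element encountered.
    static void collect(RobotsFlags flags, Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("meta".equalsIgnoreCase(el.getTagName())
                    && "robots".equalsIgnoreCase(el.getAttribute("name"))) {
                String content = el.getAttribute("content").toLowerCase(Locale.ROOT);
                for (String directive : content.split(",")) {
                    String d = directive.trim();
                    if (d.equals("noindex") || d.equals("none")) flags.noIndex = true;
                    if (d.equals("nofollow") || d.equals("none")) flags.noFollow = true;
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(flags, children.item(i));
        }
    }

    // Parse a well-formed XHTML string with the JDK XML parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        RobotsFlags flags = new RobotsFlags();
        collect(flags, parse(
            "<html><head><meta name=\"robots\" content=\"noindex, follow\"/></head><body/></html>"));
        System.out.println("noIndex=" + flags.noIndex + " noFollow=" + flags.noFollow);
    }
}
```

Passing the flags object in and mutating it mirrors the `static void` shape of the method in the table, where the caller supplies the object that receives the indicators.
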
Modifier and Type | Method and Description |
---|---|
ParseResult | JSParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc) |
Modifier and Type | Method and Description |
---|---|
static void | HTMLMetaProcessor.getMetaTags(HTMLMetaTags metaTags, Node node, URL currURL): Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node. |
Modifier and Type | Method and Description |
---|---|
ParseResult | CCParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
Copyright © 2014 The Apache Software Foundation