Package | Description |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.apache.nutch.parse | |
org.apache.nutch.parse.headings | |
org.apache.nutch.parse.js | |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Modifier and Type | Class and Description |
---|---|
class | HTMLLanguageParser |
Modifier and Type | Class and Description |
---|---|
class | RelTagParser: Adds microformat rel-tags of document if found. |
Modifier and Type | Class and Description |
---|---|
class | MetaTagsParser: Parse HTML meta tags (keywords, description) and store them in the parse metadata so that they can be indexed with the index-metadata plugin with the prefix 'metatag.' |
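
The MetaTagsParser description above can be illustrated with a minimal stand-alone sketch. It is not the Nutch implementation and does not use the Nutch API: the class name `MetaTagSketch`, the method `extract`, the lower-casing of tag names, and the use of the JDK XML parser on well-formed XHTML are all assumptions made for illustration. Only the `metatag.` prefix comes from the description.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the MetaTagsParser source.
class MetaTagSketch {

    // Prefix named in the class description.
    static final String PREFIX = "metatag.";

    // Collects <meta name="..." content="..."> pairs from a well-formed XHTML string
    // and returns them keyed as "metatag.<name>" (name lower-cased, an assumption).
    static Map<String, String> extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        Map<String, String> out = new LinkedHashMap<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            String name = meta.getAttribute("name");       // "" when absent
            String content = meta.getAttribute("content"); // "" when absent
            if (!name.isEmpty() && !content.isEmpty()) {
                out.put(PREFIX + name.toLowerCase(Locale.ROOT), content);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head>"
                + "<meta name=\"keywords\" content=\"nutch, crawler\"/>"
                + "<meta name=\"description\" content=\"A sample page\"/>"
                + "</head><body/></html>";
        extract(page).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real plugin the resulting key/value pairs would presumably be written to the parse metadata rather than returned as a map.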
Modifier and Type | Class and Description |
---|---|
class | HeadingsParseFilter: HtmlParseFilter to retrieve h1 and h2 values from the DOM. |
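
As a rough illustration of retrieving h1 and h2 values from a DOM, the following sketch uses only the JDK's `org.w3c.dom` API. The class `HeadingSketch` and method `headings` are hypothetical names, and the input is assumed to be well-formed XHTML; the actual filter works on the DOM that Nutch's HTML parser produces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the HeadingsParseFilter source.
class HeadingSketch {

    // Returns the trimmed text of every element with the given tag name (e.g. "h1").
    static List<String> headings(String xhtml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        List<String> out = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent concatenates the text of all descendant nodes.
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h1>Title</h1><h2>First</h2><h2>Second</h2></body></html>";
        System.out.println("h1: " + headings(page, "h1"));
        System.out.println("h2: " + headings(page, "h2"));
    }
}
```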
Modifier and Type | Class and Description |
---|---|
class | JSParseFilter: This class is a heuristic link extractor for JavaScript files and code snippets. |
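
The page does not say which heuristic JSParseFilter applies. Purely as an illustration of what heuristic link extraction from JavaScript can look like, the sketch below treats any quoted string literal that starts with `http://` or `https://` as a link. The class `JsLinkSketch`, the method `links`, and the regular expression are assumptions, not the plugin's actual logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the JSParseFilter source.
class JsLinkSketch {

    // Assumed heuristic: a single- or double-quoted literal beginning with
    // http:// or https://, closed by the same quote character that opened it.
    private static final Pattern URL_LITERAL =
            Pattern.compile("([\"'])(https?://[^\"'\\s]+)\\1");

    // Returns every URL-like string literal found in the given JavaScript text.
    static List<String> links(String js) {
        List<String> out = new ArrayList<>();
        Matcher m = URL_LITERAL.matcher(js);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String js = "var a = 'http://example.com/a';\n"
                + "window.open(\"https://example.org/b?x=1\");\n"
                + "var c = 'not a url';";
        links(js).forEach(System.out::println);
    }
}
```

A heuristic of this kind misses relative paths and URLs assembled at runtime, and it can pick up strings that are never used as links.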
Modifier and Type | Class and Description |
---|---|
class | CCParseFilter: Adds metadata identifying the Creative Commons license used, if any. |
Copyright © 2014 The Apache Software Foundation