Modifier and Type | Class and Description |
---|---|
class |
HTMLLanguageParser |
class |
LanguageIndexingFilter
An IndexingFilter that adds a lang (language) field to the document. |
Modifier and Type | Class and Description |
---|---|
class |
Subcollection
A Subcollection represents a subset of the index; you define URL patterns that
indicate which pages (URLs) belong to the Subcollection.
|
Modifier and Type | Interface and Description |
---|---|
interface |
IndexingFilter
Extension point for indexing.
|
interface |
IndexWriter |
Modifier and Type | Class and Description |
---|---|
class |
AnchorIndexingFilter
Indexing filter that offers an option to either index all inbound anchor text for
a document or deduplicate anchors.
|
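
The choice between indexing every inbound anchor and deduplicating them can be sketched as below. This is a minimal illustration, not the plugin's code: the class `AnchorDedupSketch`, its method name, and the decision to compare anchors case-insensitively are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: illustrates anchor deduplication.
final class AnchorDedupSketch {

    /**
     * Returns the anchors to index. With deduplication enabled, an anchor is
     * kept only the first time its lower-cased text is seen (assumption:
     * case-insensitive comparison); otherwise every anchor is kept.
     */
    static List<String> anchorsToIndex(List<String> inboundAnchors, boolean deduplicate) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String anchor : inboundAnchors) {
            if (deduplicate) {
                String key = anchor.toLowerCase(Locale.ROOT);
                if (!seen.add(key)) {
                    continue; // an equivalent anchor was already kept
                }
            }
            result.add(anchor);
        }
        return result;
    }
}
```

Calling `AnchorDedupSketch.anchorsToIndex(anchors, true)` keeps the first occurrence of each anchor text in its original order, while passing `false` returns the list unchanged.
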
Modifier and Type | Class and Description |
---|---|
class |
BasicIndexingFilter
Adds basic searchable fields to a document.
|
Modifier and Type | Class and Description |
---|---|
class |
FeedIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class |
MetadataIndexer
Indexer which can be configured to extract metadata from the crawldb, parse metadata or content metadata.
|
Modifier and Type | Class and Description |
---|---|
class |
MoreIndexingFilter
Adds (or resets) a few metadata properties as respective fields (if they are
available), so that they can be used accurately within the search index.
|
Modifier and Type | Class and Description |
---|---|
class |
StaticFieldIndexer
A simple plugin called at indexing that adds fields with static data.
|
Modifier and Type | Class and Description |
---|---|
class |
SubcollectionIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class |
TLDIndexingFilter
Adds the top-level domain (TLD) extensions to the index.
|
Modifier and Type | Class and Description |
---|---|
class |
URLMetaIndexingFilter
This is part of the URL Meta plugin.
|
Modifier and Type | Class and Description |
---|---|
class |
SolrIndexWriter |
Modifier and Type | Class and Description |
---|---|
class |
RelTagIndexingFilter
An IndexingFilter that adds tag field(s) to the document. |
class |
RelTagParser
Adds the document's microformat rel-tags, if found.
|
Modifier and Type | Interface and Description |
---|---|
interface |
URLFilter
Interface used to limit which URLs enter Nutch.
|
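
One way to picture how several filters together limit which URLs get in is a simple chain. The sketch below is self-contained and does not use Nutch classes: `UrlFilterSketch` and `UrlFilterChainSketch` are hypothetical names, and the contract "return the URL to accept it, return null to reject it" is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical contract for illustration: return the URL to accept it, null to reject it.
interface UrlFilterSketch {
    String filter(String url);
}

// Hypothetical chain: a URL is admitted only if every filter accepts it.
final class UrlFilterChainSketch {
    private final List<UrlFilterSketch> filters;

    UrlFilterChainSketch(UrlFilterSketch... filters) {
        this.filters = Arrays.asList(filters);
    }

    /** Passes the URL through every filter in order; the first null result rejects it. */
    String filter(String url) {
        String current = url;
        for (UrlFilterSketch f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

Because the interface has a single method, individual filters can be written as lambdas, for example `u -> u.endsWith(".exe") ? null : u`.
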
Modifier and Type | Interface and Description |
---|---|
interface |
HtmlParseFilter
Extension point for DOM-based HTML parsers.
|
interface |
Parser
A parser for content generated by a Protocol implementation. |
Modifier and Type | Class and Description |
---|---|
class |
MetaTagsParser
Parses HTML meta tags (keywords, description) and stores them in the parse metadata so that
they can be indexed with the index-metadata plugin with the prefix 'metatag.'
|
Modifier and Type | Class and Description |
---|---|
class |
ExtParser
A wrapper that invokes an external command to do the actual parsing.
|
Modifier and Type | Class and Description |
---|---|
class |
FeedParser |
Modifier and Type | Class and Description |
---|---|
class |
HeadingsParseFilter
HtmlParseFilter to retrieve h1 and h2 values from the DOM.
|
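
Retrieving heading values from a DOM can be sketched with the JDK's own `org.w3c.dom` API. This is only an illustration under stated assumptions: `HeadingsSketch` is a hypothetical class, the input must be well-formed XHTML (real-world HTML would need a lenient parser), and nothing here reflects how the plugin itself walks the tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: collects heading text from a DOM.
final class HeadingsSketch {

    /** Collects the trimmed text of all elements with the given tag name, in document order. */
    static List<String> headings(Document doc, String tag) {
        List<String> out = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    /** Parses well-formed XHTML into a DOM; malformed input raises IllegalArgumentException. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }
}
```

A caller would parse once and then query each level separately, for example `HeadingsSketch.headings(doc, "h1")` followed by `HeadingsSketch.headings(doc, "h2")`.
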
Modifier and Type | Class and Description |
---|---|
class |
HtmlParser |
Modifier and Type | Class and Description |
---|---|
class |
JSParseFilter
This class is a heuristic link extractor for JavaScript files and
code snippets.
|
Modifier and Type | Class and Description |
---|---|
class |
SWFParser
Parser for Flash SWF files.
|
Modifier and Type | Class and Description |
---|---|
class |
TikaParser
Wrapper for Tika parsers.
|
Modifier and Type | Class and Description |
---|---|
class |
ZipParser
ZipParser class based on MSPowerPointParser class by Stephan Strittmatter.
|
Modifier and Type | Interface and Description |
---|---|
interface |
Protocol
A retriever of url content.
|
Modifier and Type | Class and Description |
---|---|
class |
File
This class is a protocol plugin used for the file: scheme.
|
Modifier and Type | Class and Description |
---|---|
class |
Ftp
This class is a protocol plugin used for the ftp: scheme.
|
Modifier and Type | Class and Description |
---|---|
class |
Http |
Modifier and Type | Class and Description |
---|---|
class |
HttpBase |
Modifier and Type | Interface and Description |
---|---|
interface |
ScoringFilter
A contract defining behavior of scoring plugins.
|
Modifier and Type | Class and Description |
---|---|
class |
AbstractScoringFilter |
class |
ScoringFilters
Creates and caches plugins that implement ScoringFilter. |
Modifier and Type | Class and Description |
---|---|
class |
LinkAnalysisScoringFilter |
Modifier and Type | Class and Description |
---|---|
class |
OPICScoringFilter
|
Modifier and Type | Class and Description |
---|---|
class |
TLDScoringFilter
Scoring filter to boost TLDs.
|
Modifier and Type | Class and Description |
---|---|
class |
URLMetaScoringFilter
For documentation:
|
Modifier and Type | Class and Description |
---|---|
class |
RegexURLFilterBase
Generic URL filter based on regular expressions. |
Modifier and Type | Class and Description |
---|---|
class |
AutomatonURLFilter
RegexURLFilterBase implementation based on the dk.brics.automaton
Finite-State Automata for Java™.
|
Modifier and Type | Class and Description |
---|---|
class |
DomainURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and
hostnames.
|
Modifier and Type | Class and Description |
---|---|
class |
DomainBlacklistURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and
hostnames.
|
Modifier and Type | Class and Description |
---|---|
class |
PrefixURLFilter
Filters URLs based on a file of URL prefixes.
|
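
Filtering against a file of URL prefixes might look roughly like the sketch below. The class `PrefixFilterSketch` is hypothetical, and the file conventions it uses (one prefix per line, blank lines and lines starting with `#` ignored) are assumptions for the example rather than the plugin's documented format. A linear scan is used for clarity; a trie would scale better for long prefix lists.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: accepts URLs that start with a listed prefix.
final class PrefixFilterSketch {
    private final List<String> prefixes = new ArrayList<>();

    /** Reads one prefix per line; blank lines and '#' comments are skipped (assumed format). */
    PrefixFilterSketch(Reader rules) {
        try (BufferedReader in = new BufferedReader(rules)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                prefixes.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the URL if it starts with any configured prefix, otherwise null. */
    String filter(String url) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }
}
```
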
Modifier and Type | Class and Description |
---|---|
class |
RegexURLFilter
Filters URLs based on a file of regular expressions using the
Java Regex implementation. |
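
A regex-driven filter can be sketched with `java.util.regex`. Everything specific below is an assumption made for illustration: the class name `RegexRulesSketch`, the rule syntax (a leading `+` to accept or `-` to reject, followed by a pattern), the "first matching rule wins" order, and rejecting URLs that match no rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: first matching +/- rule decides.
final class RegexRulesSketch {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and '#' comments are skipped (assumed format). */
    RegexRulesSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if the first matching rule accepts it; null if rejected or unmatched. */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // assumption: URLs matching no rule are rejected
    }
}
```

Rule order matters in this sketch: placing a broad `+` rule before a narrower `-` rule would let the URLs the second rule was meant to exclude pass through.
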
Modifier and Type | Class and Description |
---|---|
class |
SuffixURLFilter
Filters URLs based on a file of URL suffixes.
|
Modifier and Type | Class and Description |
---|---|
class |
UrlValidator
Validates URLs.
|
Modifier and Type | Class and Description |
---|---|
class |
CCIndexingFilter
Adds basic searchable fields to a document.
|
class |
CCParseFilter
Adds metadata identifying the Creative Commons license used, if any.
|
Copyright © 2014 The Apache Software Foundation