Plugins API | |
---|---|
org.apache.nutch.protocol.http.api | Common API used by HTTP plugins (http, httpclient) |
org.apache.nutch.urlfilter.api | |
Protocol Plugins | |
---|---|
org.apache.nutch.protocol.file | Protocol plugin which supports retrieving local file resources. |
org.apache.nutch.protocol.ftp | Protocol plugin which supports retrieving documents via the ftp protocol. |
org.apache.nutch.protocol.http | Protocol plugin which supports retrieving documents via the http protocol. |
org.apache.nutch.protocol.httpclient | Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web servers as well as proxy servers. |
URL Filter Plugins | |
---|---|
org.apache.nutch.net.urlnormalizer.basic | |
org.apache.nutch.net.urlnormalizer.pass | |
org.apache.nutch.net.urlnormalizer.regex | |
Scoring Plugins | |
---|---|
org.apache.nutch.scoring.link | |
org.apache.nutch.scoring.opic | |
org.apache.nutch.scoring.tld | Top Level Domain Scoring plugin. |
Parse Plugins | |
---|---|
org.apache.nutch.parse.ext | |
org.apache.nutch.parse.feed | |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.js | A parser plugin and content filter to extract all (possible) links from JavaScript files and code snippets. |
org.apache.nutch.parse.swf | |
org.apache.nutch.parse.tika | |
org.apache.nutch.parse.zip | |
Indexing Filter Plugins | |
---|---|
org.apache.nutch.indexer.anchor | An indexing plugin for inbound anchor text. |
org.apache.nutch.indexer.basic | A basic indexing plugin. |
org.apache.nutch.indexer.feed | |
org.apache.nutch.indexer.more | An indexing plugin that adds more metadata fields, such as content type, content length and last-modified date. |
org.apache.nutch.indexer.subcollection | |
org.apache.nutch.indexer.tld | Top Level Domain Indexing plugin. |
Misc. Plugins | |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.collection | Subcollection is a subset of an index. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Apache Nutch 2.X is a branch of the Apache Nutch open source web-search software project. It builds on Apache Gora for data persistence and Apache Solr for indexing, adding web-specifics such as a crawler, a link-graph database, and parsing support handled by Apache Tika for HTML and an array of other document formats.