Apache Nutch 1.3 API

Nutch is the open-source search engine.


Core
org.apache.nutch.analysis.lang Text document language identifier.
org.apache.nutch.crawl Crawl control code.
org.apache.nutch.fetcher The Nutch robot.
org.apache.nutch.indexer Maintain Lucene full-text indexes.
org.apache.nutch.indexer.solr  
org.apache.nutch.metadata A multi-valued metadata container, and a set of constant fields for Nutch metadata.
org.apache.nutch.net  
org.apache.nutch.net.protocols  
org.apache.nutch.parse  
org.apache.nutch.plugin The Nutch Plugin System.
org.apache.nutch.protocol  
org.apache.nutch.scoring  
org.apache.nutch.scoring.webgraph  
org.apache.nutch.segment  
org.apache.nutch.tools  
org.apache.nutch.tools.arc  
org.apache.nutch.tools.proxy  
org.apache.nutch.util  
org.apache.nutch.util.domain
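
The org.apache.nutch.metadata package is described above as a multi-valued metadata container: one name can carry several values, the way repeated HTTP response headers do. The following is a minimal standalone sketch of that idea in plain Java. The class MultiValuedMetadata and its methods are hypothetical illustrations, not Nutch's own Metadata class, whose exact API should be checked in the package Javadoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata container;
// NOT Nutch's org.apache.nutch.metadata.Metadata class.
class MultiValuedMetadata {
    // Insertion-ordered map from a metadata name to all of its values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value, keeping any values already stored under the name.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Replaces all existing values for the name with a single value.
    void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    // Returns the first value for the name, or null if there is none.
    String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns every value for the name (empty if none) as a read-only list.
    List<String> getValues(String name) {
        List<String> list = values.get(name);
        return list == null ? Collections.<String>emptyList() : Collections.unmodifiableList(list);
    }

    // True when more than one value is stored under the name.
    boolean isMultiValued(String name) {
        List<String> list = values.get(name);
        return list != null && list.size() > 1;
    }
}
```

Keeping a list per name, rather than a single string, is what lets a second add() preserve the first value while set() still offers overwrite semantics.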


Plugins API
org.apache.nutch.protocol.http.api Common API used by HTTP plugins (http, httpclient)
org.apache.nutch.urlfilter.api  


Protocol Plugins
org.apache.nutch.protocol.file Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp Protocol plugin which supports retrieving documents via the FTP protocol.
org.apache.nutch.protocol.http Protocol plugin which supports retrieving documents via the HTTP protocol.
org.apache.nutch.protocol.httpclient Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web servers as well as proxy servers.
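
Each protocol plugin above covers particular URL schemes. One way to picture this is a lookup from a URL's scheme to an enabled plugin. The sketch below assumes that model and uses the plugin IDs protocol-file, protocol-ftp, protocol-http and protocol-httpclient; the ProtocolRouter class is hypothetical and is not Nutch's own plugin-selection code. Following the descriptions above, only protocol-httpclient is mapped to https, and the sketch assumes just one of the two HTTP plugins is enabled at a time.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: route a URL to a protocol plugin ID by its scheme.
// This is an illustration of the idea, not Nutch's actual dispatch code.
class ProtocolRouter {
    private final Map<String, String> pluginByScheme = new HashMap<>();

    // useHttpClient stands in for choosing which HTTP plugin is enabled.
    ProtocolRouter(boolean useHttpClient) {
        pluginByScheme.put("file", "protocol-file");
        pluginByScheme.put("ftp", "protocol-ftp");
        if (useHttpClient) {
            // Per its description, protocol-httpclient handles HTTP and HTTPS.
            pluginByScheme.put("http", "protocol-httpclient");
            pluginByScheme.put("https", "protocol-httpclient");
        } else {
            pluginByScheme.put("http", "protocol-http");
        }
    }

    // Returns the plugin ID for the URL's scheme, or null if none handles it.
    String pluginFor(String url) {
        String scheme = URI.create(url).getScheme();
        if (scheme == null) {
            return null; // relative URL: no scheme to dispatch on
        }
        return pluginByScheme.get(scheme.toLowerCase(Locale.ROOT));
    }
}
```

Because both HTTP plugins claim http, a real configuration has to settle which one is active; the constructor flag is only a placeholder for that choice.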


URL Filter Plugins
org.apache.nutch.urlfilter.automaton A URL filter plugin based on dk.brics.automaton (finite-state automata for Java™).
org.apache.nutch.urlfilter.prefix A URL filter plugin that accepts URLs matching a list of prefixes.
org.apache.nutch.urlfilter.regex A URL filter plugin based on regular expressions.
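
URL filter plugins decide which URLs the crawler keeps: a filter takes a URL string and returns either the URL (keep) or null (reject). The sketch below mirrors that contract with a local SimpleUrlFilter interface (a hypothetical stand-in, not org.apache.nutch.net.URLFilter itself) and shows two strategies modeled on the prefix and regex plugins. The rule syntax, a leading + to accept or - to reject followed by a Java regular expression, with the first matching rule winning and unmatched URLs rejected, follows the style of Nutch's regex-urlfilter.txt; treat those details as assumptions to check against the plugin source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the URL filter contract:
// return the URL to keep it, or null to reject it.
interface SimpleUrlFilter {
    String filter(String url);
}

// Prefix strategy: keep a URL only if it starts with a configured prefix.
class PrefixFilter implements SimpleUrlFilter {
    private final List<String> prefixes;

    PrefixFilter(List<String> prefixes) {
        this.prefixes = prefixes;
    }

    public String filter(String url) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }
}

// Regex strategy: "+regex" accepts, "-regex" rejects; first match decides.
class RegexRuleFilter implements SimpleUrlFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    RegexRuleFilter(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    public String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // no rule matched: reject
    }
}
```

Rule order matters in the regex strategy: placing the image-suffix rejection before a broad domain acceptance keeps image URLs out even on an accepted host.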


Scoring Plugins
org.apache.nutch.scoring.opic  


Parse Plugins
org.apache.nutch.parse.ext  
org.apache.nutch.parse.js  
org.apache.nutch.parse.swf  
org.apache.nutch.parse.tika  
org.apache.nutch.parse.zip  


Indexing Filter Plugins
org.apache.nutch.indexer.basic A basic indexing plugin.
org.apache.nutch.indexer.more An indexing plugin that adds more fields than the basic one (e.g. content type, date and content length).
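
Indexing filter plugins run as a chain: each filter receives the document being built, may add fields, and can return null to keep the page out of the index, at which point the remaining filters are skipped. The sketch below models that chain with hypothetical stand-in types (DocFilter, FilterChain, ExampleFilters) rather than Nutch's IndexingFilter and NutchDocument, and the field names url, title and type are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a filter may add fields to the document
// (a simple field map here) or return null to drop it from the index.
interface DocFilter {
    Map<String, String> filter(Map<String, String> doc, String url, String title, String contentType);
}

// Runs filters in order, stopping early if one drops the document.
class FilterChain {
    private final List<DocFilter> filters = new ArrayList<>();

    FilterChain add(DocFilter filter) {
        filters.add(filter);
        return this;
    }

    Map<String, String> run(String url, String title, String contentType) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (DocFilter filter : filters) {
            doc = filter.filter(doc, url, title, contentType);
            if (doc == null) {
                return null; // dropped: later filters never see it
            }
        }
        return doc;
    }
}

// Illustrative filters loosely modeled on "basic" and "more"-style plugins.
class ExampleFilters {
    // Core fields taken from the URL and parse result.
    static final DocFilter BASIC = (doc, url, title, contentType) -> {
        doc.put("url", url);
        doc.put("title", title == null ? "" : title);
        return doc;
    };

    // An extra field, added only when the content type is known.
    static final DocFilter MORE = (doc, url, title, contentType) -> {
        if (contentType != null) {
            doc.put("type", contentType);
        }
        return doc;
    };
}
```

Returning null rather than throwing keeps "skip this page" an ordinary outcome of the chain instead of an error.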


Misc. Plugins
org.apache.nutch.microformats.reltag A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata.

Copyright © 2011 The Apache Software Foundation