Nutch 1.1 API

Nutch is an open-source search engine.

Core
org.apache.nutch.analysis Tokenizer for documents and query parser.
org.apache.nutch.clustering  
org.apache.nutch.crawl Crawl control code.
org.apache.nutch.fetcher The Nutch robot.
org.apache.nutch.html  
org.apache.nutch.indexer Maintain Lucene full-text indexes.
org.apache.nutch.indexer.field  
org.apache.nutch.indexer.lucene  
org.apache.nutch.indexer.solr  
org.apache.nutch.metadata A multi-valued metadata container and a set of constant fields for Nutch metadata.
org.apache.nutch.net  
org.apache.nutch.net.protocols  
org.apache.nutch.ontology  
org.apache.nutch.parse  
org.apache.nutch.plugin The Nutch Plugin System.
org.apache.nutch.protocol  
org.apache.nutch.scoring  
org.apache.nutch.scoring.webgraph  
org.apache.nutch.searcher Search API
org.apache.nutch.searcher.response  
org.apache.nutch.segment  
org.apache.nutch.servlet  
org.apache.nutch.tools  
org.apache.nutch.tools.arc  
org.apache.nutch.tools.compat  
org.apache.nutch.util  
org.apache.nutch.util.domain  

 

Plugins API
org.apache.nutch.parse.ms Common API for parsing Microsoft © documents.
org.apache.nutch.protocol.http.api Common API used by HTTP plugins (http, httpclient)
org.apache.nutch.urlfilter.api  

 

Protocol Plugins
org.apache.nutch.protocol.file Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp Protocol plugin which supports retrieving documents via the FTP protocol.
org.apache.nutch.protocol.http Protocol plugin which supports retrieving documents via the HTTP protocol.
org.apache.nutch.protocol.httpclient Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web servers as well as proxy servers.

 

URL Filter Plugins
org.apache.nutch.urlfilter.automaton A URL filter plugin based on dk.brics.automaton Finite-State Automata for Java™.
org.apache.nutch.urlfilter.prefix A URL filter plugin.
org.apache.nutch.urlfilter.regex A URL filter plugin.

 

Scoring Plugins
org.apache.nutch.scoring.opic  

 

Parse Plugins
org.apache.nutch.parse.ext  
org.apache.nutch.parse.html An HTML document parsing plugin.
org.apache.nutch.parse.js  
org.apache.nutch.parse.msexcel A Microsoft © Excel document parsing plugin.
org.apache.nutch.parse.mspowerpoint A Microsoft © PowerPoint document parsing plugin.
org.apache.nutch.parse.msword A Microsoft © Word document parsing plugin.
org.apache.nutch.parse.msword.chp  
org.apache.nutch.parse.oo  
org.apache.nutch.parse.pdf A PDF parsing plugin.
org.apache.nutch.parse.rss  
org.apache.nutch.parse.rss.structs  
org.apache.nutch.parse.swf  
org.apache.nutch.parse.text A plain text parsing plugin.
org.apache.nutch.parse.zip  

 

Indexing Filter Plugins
org.apache.nutch.indexer.basic A basic indexing plugin.
org.apache.nutch.indexer.more A "more" indexing plugin.

 

Query Filter Plugins
org.apache.nutch.searcher.basic  
org.apache.nutch.searcher.more A "more" query plugin.
org.apache.nutch.searcher.site  
org.apache.nutch.searcher.url  

 

Summary Plugins
org.apache.nutch.summary.basic A basic summarizer implementation.
org.apache.nutch.summary.lucene A Lucene Highlighter based summarizer implementation.

 

Clustering Plugins
org.apache.nutch.clustering.carrot2  

 

Ontology Plugins
org.apache.nutch.ontology.jena  

 

Misc. Plugins
org.apache.nutch.analysis.lang Text document language identifier.
org.apache.nutch.microformats.reltag A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata.

 



Copyright © 2006 The Apache Software Foundation