Package org.apache.nutch.tools

Interface Summary
PruneIndexTool.PruneChecker This interface can be used to implement additional checking on matching documents.
 

Class Summary
CrawlDBScanner Dumps all entries whose URL matches a regular expression.
DmozParser Utility that converts DMOZ RDF into a flat file of URLs to be injected.
FreeGenerator This tool generates fetchlists (segments to be fetched) from plain text files containing one URL per line.
FreeGenerator.FG  
PruneIndexTool This tool prunes existing Nutch indexes of unwanted content.
PruneIndexTool.PrintFieldsChecker This checker prints selected field values from each document just before the document is deleted.
PruneIndexTool.StoreUrlsChecker This checker stores the URL of each document to be deleted in a text file.
ResolveUrls A simple tool that spins up multiple threads to resolve URLs to IP addresses.
SearchLoadTester A simple tool to perform load testing on configured search servers.
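
The URL filtering that the CrawlDBScanner summary describes can be illustrated with a small standalone sketch. This is not the Nutch implementation: the class name UrlRegexScan and the method matching are hypothetical, the input is a plain in-memory list rather than a CrawlDb, and the choice of partial matching (find) over full-string matching is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.apache.nutch.tools.
class UrlRegexScan {

    /**
     * Returns the URLs that contain a match for the given regular
     * expression, preserving input order. Partial matching via find()
     * is an assumption; the real tool may require a full match.
     */
    static List<String> matching(List<String> urls, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String url : urls) {
            if (pattern.matcher(url).find()) {
                hits.add(url);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative entries only; a real CrawlDb stores URL/CrawlDatum pairs.
        List<String> urls = Arrays.asList(
            "http://example.org/docs/index.html",
            "http://example.com/about",
            "http://example.org/blog/post-1");

        // Dump every entry whose URL starts with the example.org host.
        for (String hit : matching(urls, "^http://example\\.org/")) {
            System.out.println(hit);
        }
    }
}
```

Anchoring the expression with ^ restricts the dump to a single host; without the anchor, find() would also accept URLs that merely contain the pattern somewhere in the path or query string.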
 



Copyright © 2006 The Apache Software Foundation