Interface | Description |
---|---|
CommonCrawlFormat | Interface for all CommonCrawl formatters. |

Class | Description |
---|---|
AbstractCommonCrawlFormat | Abstract class that implements the CommonCrawlFormat interface. |
Benchmark | |
Benchmark.BenchmarkResults | |
CommonCrawlConfig | |
CommonCrawlDataDumper | The Common Crawl Data Dumper tool reverse-generates the raw content from Nutch segment data directories into a common crawling data format that many applications consume. |
CommonCrawlFormatFactory | Factory class that creates new CommonCrawlFormat objects (a.k.a. |
CommonCrawlFormatJackson | This class provides methods to map crawled data to JSON using the Jackson Streaming API. |
CommonCrawlFormatJettinson | This class provides methods to map crawled data to JSON using the Jettison APIs. |
CommonCrawlFormatSimple | This class provides methods to map crawled data to JSON using a StringBuilder object. |
DmozParser | Utility that converts DMOZ RDF into a flat file of URLs to be injected. |
FileDumper | The file dumper tool reverse-generates the raw content from Nutch segment data directories. |
FreeGenerator | This tool generates fetchlists (segments to be fetched) from plain text files containing one URL per line. |
FreeGenerator.FG | |
ResolveUrls | A simple tool that spins up multiple threads to resolve URLs to IP addresses. |
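
The ResolveUrls entry describes resolving URLs to IP addresses across multiple threads. The following is a minimal sketch of that general technique using only the JDK; it is not the Nutch implementation. The class name `ResolveSketch` and the methods `resolveHost` and `resolveAll` are hypothetical names chosen for illustration.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not part of Apache Nutch.
public class ResolveSketch {

    // Resolve the host part of one URL to an IP address string.
    // Returns null if the URL is malformed, has no host, or does not resolve.
    static String resolveHost(String url) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) {
                return null;
            }
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException | IllegalArgumentException e) {
            return null;
        }
    }

    // Resolve every URL on a fixed-size thread pool and collect the
    // results in input order (URL -> IP address, or null on failure).
    static Map<String, String> resolveAll(List<String> urls, int numThreads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                Callable<String> task = () -> resolveHost(url);
                pending.put(url, pool.submit(task));
            }
            Map<String, String> resolved = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                resolved.put(entry.getKey(), entry.getValue().get());
            }
            return resolved;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // An IP literal is used so the example needs no DNS lookup.
        List<String> urls = Arrays.asList("http://127.0.0.1/index.html");
        resolveAll(urls, 4).forEach((url, ip) -> System.out.println(url + " -> " + ip));
    }
}
```

A fixed-size pool keeps the number of concurrent lookups bounded. Failures are reported as `null` here for brevity; a real tool would more likely log or count them.
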
Copyright © 2015 The Apache Software Foundation