Intro

Many Mahout libraries run as batch jobs and write their results to Hadoop sequence files or other data structures. This page shows several ways to inspect the output of those jobs. It is organized by algorithm.

General Utilities

Sequence File Dumper

Clustering

Cluster Dumper

Run the following to print out all options:

java -cp "*" org.apache.mahout.utils.clustering.ClusterDumper --help

Example

java -cp "*" org.apache.mahout.utils.clustering.ClusterDumper \
      --seqFileDir ./solr-clust-n2/out/clusters-2 \
      --dictionary ./solr-clust-n2/dictionary.txt \
      --substring 100 --pointsDir ./solr-clust-n2/out/points/
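
When the same job is inspected at several iterations, it can help to wrap the example command in a small shell function. The following is only a sketch: the function name dump_clusters and the DRY_RUN variable are hypothetical and not part of Mahout, and the directory layout (out/clusters-N, out/points/, dictionary.txt) is assumed to match the example above. Only the flags already shown in the example are used.

```shell
# Hypothetical helper (not part of Mahout): builds the ClusterDumper command
# from the example above for a given job directory and iteration number.
# Assumes the layout <base>/out/clusters-<iter>, <base>/out/points/ and
# <base>/dictionary.txt, as in the example.
dump_clusters() {
  base=$1   # job directory, e.g. ./solr-clust-n2
  iter=$2   # iteration number, e.g. 2 for out/clusters-2

  # Store the command in the positional parameters so each argument
  # stays a separate word, including the quoted "*" classpath.
  set -- java -cp "*" org.apache.mahout.utils.clustering.ClusterDumper \
    --seqFileDir "$base/out/clusters-$iter" \
    --dictionary "$base/dictionary.txt" \
    --substring 100 \
    --pointsDir "$base/out/points/"

  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it. Quoting is not preserved
    # in the printed form, so treat it as a preview only.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For instance, `DRY_RUN=1 dump_clusters ./solr-clust-n2 2` prints the command for the clusters-2 directory without starting Java, and `dump_clusters ./solr-clust-n2 2` runs it. As with the example above, the "*" classpath assumes the command is run from a directory containing the Mahout jars.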
