Mahout's Mean Shift clustering is launched with the same command line invocation whether you are running on a single machine in stand-alone mode or on a larger Hadoop cluster. The mode is determined by the $HADOOP_HOME and $HADOOP_CONF_DIR environment variables. If both are set and point to an operating Hadoop cluster on the target machine, the invocation runs Mean Shift on that cluster. If either environment variable is missing, the stand-alone Hadoop configuration is used instead.
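The general form of the invocation is:
./bin/mahout meanshift <OPTIONS>
To run in stand-alone mode, leave the Hadoop environment variables unset:
./bin/mahout meanshift -i testdata <OTHER OPTIONS>
To run on a Hadoop cluster, set both variables before invoking the same command:
export HADOOP_HOME=<Hadoop Home Directory>
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
./bin/mahout meanshift -i testdata <OTHER OPTIONS>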
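The mode-selection rule described above can be checked before launching a job. The following is a minimal sketch that only restates that rule in shell; it is not the logic of the bin/mahout script itself, and the function name mahout_mode is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper (not part of Mahout): reports which mode the rule
# described above would select, based on the two environment variables.
mahout_mode() {
  # Both variables must be set and non-empty for cluster mode.
  if [ -n "${HADOOP_HOME:-}" ] && [ -n "${HADOOP_CONF_DIR:-}" ]; then
    echo "cluster"
  else
    echo "stand-alone"
  fi
}
```

Calling mahout_mode in the shell that will launch Mahout shows whether the current environment would target a cluster or fall back to stand-alone mode. It does not verify that the cluster is actually reachable.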
--input (-i) input                          Path to job input directory. Must be a SequenceFile of VectorWritable
--output (-o) output                        The directory pathname for output.
--overwrite (-ow)                           If present, overwrite the output directory before running job
--distanceMeasure (-dm) distanceMeasure     The classname of the DistanceMeasure. Default is SquaredEuclidean
--help (-h)                                 Print out help
--convergenceDelta (-cd) convergenceDelta   The convergence delta value. Default is 0.5
--t1 (-t1) t1                               T1 threshold value
--t2 (-t2) t2                               T2 threshold value
--clustering (-cl)                          If present, run clustering after the iterations have taken place
--maxIter (-x) maxIter                      The maximum number of iterations.
--inputIsCanopies (-ic) inputIsCanopies     If present, the input directory already contains MeanShiftCanopies
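As an illustration of how the options above combine, the sketch below assembles a complete command line from the short flags listed in the table. The function name build_meanshift_cmd is hypothetical, and the paths, threshold values, and iteration count passed to it are placeholders rather than recommended settings. The function prints the command instead of running it, so the result can be inspected first.

```shell
# Hypothetical wrapper (not part of Mahout): prints a meanshift command
# line built from the documented short options, without executing it.
build_meanshift_cmd() {
  input=$1      # -i: input directory (SequenceFile of VectorWritable)
  output=$2     # -o: output directory
  t1=$3         # -t1: T1 threshold value
  t2=$4         # -t2: T2 threshold value
  max_iter=$5   # -x: maximum number of iterations
  # -ow overwrites the output directory, -cd sets the convergence delta,
  # -cl runs clustering after the iterations have taken place.
  printf './bin/mahout meanshift -i %s -o %s -ow -cd 0.5 -t1 %s -t2 %s -x %s -cl\n' \
    "$input" "$output" "$t1" "$t2" "$max_iter"
}

# Placeholder arguments, for illustration only.
build_meanshift_cmd testdata output 47.6 1 10
```

Because -dm is omitted, the job would use the default distance measure named in the table.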