Title: bayesian-commandline

# Introduction

This quick start page describes how to run the naive Bayesian and complementary naive Bayesian classification algorithms on a Hadoop cluster.

# Steps

## Testing it on a single machine without a cluster

In the examples directory, type:

    mvn -q exec:java -Dexec.mainClass="org.apache.mahout.classifier.bayes.mapreduce.bayes." -Dexec.args=""
    mvn -q exec:java -Dexec.mainClass="org.apache.mahout.classifier.bayes.mapreduce.cbayes." -Dexec.args=""

## Running it on the cluster

* In $MAHOUT_HOME/, build the jar containing the job (mvn install). The job will be generated in $MAHOUT_HOME/core/target/ and its name will contain the Mahout version number. For example, when using the Mahout 0.1 release, the job will be mahout-core-0.1.jar
* (Optional) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh
* Put the data: $HADOOP_HOME/bin/hadoop fs -put testdata
* Run the job: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-.job org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesDriver
* Get the data out of HDFS and have a look. Use bin/hadoop fs -lsr output to view all outputs.

# Command line options

The following drivers accept the options listed below:

    BayesDriver, BayesThetaNormalizerDriver, CBayesNormalizedWeightDriver, CBayesDriver,
    CBayesThetaDriver, CBayesThetaNormalizerDriver, BayesWeightSummerDriver,
    BayesFeatureDriver, BayesTfIdfDriver

    Usage:
        [--input --output --help]
    Options
        --input (-i) input      The Path for input Vectors. Must be a SequenceFile of Writable, Vector.
        --output (-o) output    The directory pathname for output points.
        --help (-h)             Print out help.
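
As a rough illustration of how the cluster steps and the --input/--output options might fit together, here is a minimal shell sketch. The function names `run` and `run_bayes`, the `DRY_RUN` switch, and the four positional arguments are hypothetical helpers introduced for this example and are not part of Mahout. The sketch assumes `$HADOOP_HOME` is set and that the job file has already been built with mvn install. Its path is passed in by the caller because the exact file name depends on the Mahout version.

```shell
#!/bin/sh
# Sketch only: helper names, DRY_RUN, and the argument layout are assumptions,
# not part of Mahout or Hadoop. The hadoop subcommands (fs -put, jar, fs -lsr)
# and the --input/--output options are the ones named on this page.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# run_bayes JOB_FILE LOCAL_DATA HDFS_INPUT HDFS_OUTPUT
# Uploads the data, runs BayesDriver, then lists the output directory.
run_bayes() {
    job_file=$1      # path to the job file built by mvn install
    local_data=$2    # local directory or file to upload
    hdfs_input=$3    # HDFS path the data is copied to
    hdfs_output=$4   # HDFS directory the job writes to

    run "$HADOOP_HOME/bin/hadoop" fs -put "$local_data" "$hdfs_input" &&
    run "$HADOOP_HOME/bin/hadoop" jar "$job_file" \
        org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesDriver \
        --input "$hdfs_input" --output "$hdfs_output" &&
    run "$HADOOP_HOME/bin/hadoop" fs -lsr "$hdfs_output"
}
```

After sourcing the script, setting `DRY_RUN=1` and calling `run_bayes <job file> testdata testdata output` prints the three hadoop commands without running them, which is a cheap way to check the paths before touching the cluster.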