Q: Mahout Spark shell doesn’t start; “ClassNotFound” problems or various classpath problems.
A: As of this writing, all reported problems starting the Spark shell in Mahout have revolved around classpath issues of one kind or another.
If you are getting method-signature errors, you most likely have a mismatch between Mahout’s Spark dependency and the Spark version actually installed. (At the time of this writing, HEAD depends on Spark 1.1.0, but check mahout/pom.xml.)
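A quick way to compare the two sides (a sketch, not an exact recipe: it assumes the dependency version is declared via a spark.version property in Mahout’s pom.xml, and that your installed Spark’s spark-submit supports --version):

    # Spark version Mahout is built against (assumes a <spark.version> property)
    grep -m1 '<spark.version>' $MAHOUT_HOME/pom.xml

    # Spark version actually installed
    $SPARK_HOME/bin/spark-submit --version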
Troubleshooting general classpath issues is fairly straightforward. Since Mahout uses Spark’s installation, and the classpath Spark itself reports, for Spark-related dependencies, it is important to make sure that classpath is sane and is made available to Mahout:

1. Run

   $SPARK_HOME/bin/compute-classpath.sh

   and make sure it produces a sane result with no errors. If it outputs something other than a straightforward classpath string, most likely Spark is not compiled/set up correctly (later Spark versions require

   sbt/sbt assembly

   to be run; simply running

   sbt/sbt publish-local

   is no longer enough).

2. Run

   $MAHOUT_HOME/bin/mahout -spark classpath

   and check that the classpath reported in step (1) is included.
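If eyeballing long classpath strings is painful, a small sketch like the following can highlight entries Spark reports that are missing from Mahout’s classpath (it assumes both commands print a bare colon-separated classpath and uses only standard POSIX tools):

    # Split each colon-separated classpath into sorted, one-entry-per-line files.
    $SPARK_HOME/bin/compute-classpath.sh | tr ':' '\n' | sort -u > /tmp/spark-cp.txt
    $MAHOUT_HOME/bin/mahout -spark classpath | tr ':' '\n' | sort -u > /tmp/mahout-cp.txt

    # Entries present in Spark's classpath but absent from Mahout's.
    comm -23 /tmp/spark-cp.txt /tmp/mahout-cp.txt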
Q: I am using the command-line Mahout jobs that run on Spark, or I am writing my own application that uses Mahout’s Spark code. When I run the code on my cluster I get ClassNotFound or signature errors during serialization. What’s wrong?
A: The Spark artifacts in the Maven ecosystem may not match the exact binary you are running on your cluster, which can cause class-name or version mismatches. In this case you may wish to build Spark yourself, to guarantee that you are running exactly what you are building Mahout against: build Spark from source so that its artifacts are installed into your local Maven cache, then rebuild Mahout against them.
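A sketch of that sequence (the Maven profiles and version flags you need depend on your Hadoop and Spark versions; the commands below are illustrative, not exact):

    # In the Spark source tree: build with Maven's `install` target (not just
    # `package`) so the jars you deploy also land in your local Maven cache.
    cd /path/to/spark
    mvn -DskipTests clean install

    # In the Mahout source tree: rebuild so Mahout resolves Spark from that cache.
    cd $MAHOUT_HOME
    mvn -DskipTests clean install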
Q: The implicit SparkContext ‘sc’ does not work in the Mahout spark-shell.
A: In the Mahout spark-shell the SparkContext is called ‘sdc’, where the ‘d’ stands for distributed.
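For example, anywhere a standard Spark shell session would use sc, use sdc instead. A minimal sketch, assuming the Mahout spark-shell has imported its Scala DSL (drmParallelize and dense) as it normally does on startup:

    // Create a distributed row matrix from an in-core matrix; drmParallelize
    // picks up the implicit distributed context `sdc` provided by the shell.
    val drmA = drmParallelize(dense((1.0, 2.0), (3.0, 4.0)), numPartitions = 2)

    // Trigger an action to confirm the context is wired up.
    drmA.checkpoint()
    println(drmA.nrow)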