I was playing around with Mahout, and one of the things I wanted to try was Mahout’s Spark shell on my local machine.
There is a nice example for doing this, but I hit a stack dump the moment I tried to start the shell using `bin/mahout spark-shell`:
```
java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RequestMessage; local class incompatible: stream classdesc serialVersionUID = -2221986757032131007, local class serialVersionUID = -5447855329526097695
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
```
The problem was that the Spark version Mahout expected was 1.6.2 (specified in its POM file), while the Spark cluster I had started was running the latest version, 2.0.1. RPC messages serialized by one version cannot be deserialized by the other, hence the `InvalidClassException` with mismatched `serialVersionUID` values.
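One way to spot this mismatch before it bites is to compare the version in Mahout's POM against what the cluster runs. Below is a small sketch; the helper function and the assumption that the POM keeps the version in a `<spark.version>` property (as Mahout's master branch did at the time) are mine, not from the Mahout docs:

```shell
# Extract the Spark version Mahout's build expects from its POM.
# Assumes a <spark.version> property, as on Mahout's master branch.
spark_version_from_pom() {
  grep -m1 '<spark.version>' "$1" |
    sed -E 's|.*<spark.version>([^<]+)</spark.version>.*|\1|'
}

# Compare against the running cluster, e.g.:
#   spark_version_from_pom mahout/pom.xml      # version Mahout wants
#   "$SPARK_HOME"/bin/spark-submit --version   # version the cluster runs
if [ -f mahout/pom.xml ]; then
  spark_version_from_pom mahout/pom.xml
fi
```

If the two versions disagree, you will hit the serialization error above.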
Here are the steps I took to get it going:
Installing Mahout & Spark on your local machine
- Create a directory for Mahout somewhere on your machine, change into it, and check out the master branch of Apache Mahout from GitHub:
git clone https://github.com/apache/mahout mahout
- Look at the POM file to check the Spark version dependency
- Change to the `mahout` directory and build Mahout using `mvn -DskipTests clean install`
- Download Apache Spark (http://www.apache.org/dyn/closer.cgi/spark)
- Note: download the source code, not just the pre-built binaries.
- Select ‘Source Code’ in the Project type
- Change to the directory where you unpacked Spark and type `sbt/sbt assembly` to build it
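If you need a specific older Spark release to match Mahout's POM, the Apache archive keeps every past release under a predictable path. The helper below is a sketch; the URL layout of `archive.apache.org` is my only assumption:

```shell
# Build the download URL for a given Spark source release on the
# Apache archive, which retains old versions after mirrors drop them.
spark_source_url() {
  echo "https://archive.apache.org/dist/spark/spark-$1/spark-$1.tgz"
}

# Usage (not run here): fetch and build the version Mahout expects.
#   wget "$(spark_source_url 1.6.2)"
#   tar -xzf spark-1.6.2.tgz
#   cd spark-1.6.2 && sbt/sbt assembly
```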
Starting Mahout’s Spark shell
- Go to the directory where you unpacked Spark and type `sbin/start-all.sh` to start Spark locally
- Open a browser and point it to http://localhost:8080/ to check whether Spark started successfully. Copy the URL of the Spark master at the top of the page (it starts with spark://)
- This starts Spark in standalone mode with 1 master and 1 worker
- Verify that the Spark version shown is 1.6.2
- Define the following environment variables in a file `mymahoutsparksettings.sh` and source that file so they are set:
```
abgoswam@abgoswam-ubuntu:~/repos/mahout$ cat mymahoutsparksettings.sh
#!/usr/bin/env bash
export MAHOUT_HOME=/home/abgoswam/repos/mahout
export SPARK_HOME=/home/abgoswam/packages/spark-1.6.2
export MASTER=spark://abgoswam-ubuntu:7077
echo "Set variables for Mahout"
abgoswam@abgoswam-ubuntu:~/repos/mahout$
```
- Finally, change to the directory where you unpacked Mahout and type `bin/mahout spark-shell`; you should see the shell starting and get the `mahout>` prompt.
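Before launching the shell, it is worth confirming that all three variables actually made it into the environment. This is a hypothetical sanity check of my own; only the variable names come from the settings file above:

```shell
# Check that the variables the Mahout shell relies on are set;
# the names come from mymahoutsparksettings.sh above.
missing=""
for v in MAHOUT_HOME SPARK_HOME MASTER; do
  if [ -z "$(eval echo "\${$v:-}")" ]; then
    missing="$missing $v"
  fi
done

if [ -n "$missing" ]; then
  echo "Missing:$missing -- source mymahoutsparksettings.sh first"
else
  echo "Environment looks good; run bin/mahout spark-shell"
fi
```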