Spark App Development (Scala)

The Spark app development process is an interesting one.

I am jotting down some notes here as I make progress.

Notes:

  • The easiest way is to develop your app in IntelliJ IDEA and run it either (a) from IntelliJ IDEA itself or (b) from the sbt console using ‘run’
    • for this, the ‘master’ URL must be set to ‘local’ in the code (see the sketch after this list)
    • otherwise you will get the error “org.apache.spark.SparkException: A master URL must be set in your configuration”
  • sbt package
    • the official Spark quick start guide has an example of this, where you don’t have to set the ‘master’ URL in the app itself
    • instead, you specify the master when doing spark-submit
    • Note:
      • if you have the master set in code, then --master in spark-submit doesn’t take effect.
    • Example:
      • sbt assembly
      • spark-submit --master spark://spark-host:7077 target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
  • sbt assembly
    • this is similar to the workflow for ‘sbt package’
    • in the build.sbt file, there is a keyword "Provided" which has ramifications when one uses ‘sbt assembly’ (see the build.sbt sketch after this list)
      • //libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % Provided
        //libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0" % Provided
    • I need to follow up more on this Provided keyword…
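
For concreteness, here is a minimal sketch of what such an app might look like. The object name matches the assembly JAR used in the examples above, but the body of the job is made up. The .master(...) line is the one to keep for IDE/sbt runs and to drop when you want spark-submit’s --master to take effect:

    import org.apache.spark.sql.SparkSession

    object HelloRedmondApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HelloRedmondApp")
          // Keep this line for runs from IntelliJ IDEA or the sbt console
          // ("local[*]" uses all local cores; plain "local" also works).
          // Drop it when packaging for spark-submit and pass --master on
          // the command line instead, since a master set in code takes
          // precedence over the --master flag.
          .master("local[*]")
          .getOrCreate()

        // A trivial job, just so there is something to run.
        println(s"count = ${spark.range(1, 1000).count()}")

        spark.stop()
      }
    }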
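
And a possible build.sbt to go with it (dependency versions as in the commented lines above; scalaVersion chosen to match the scala-2.11 JAR path). As far as I understand it, Provided means “compile against this, but don’t bundle it”: ‘sbt assembly’ leaves Provided dependencies out of the fat JAR, on the assumption that spark-submit will supply the Spark classes at runtime. The flip side is that with Provided enabled, a plain ‘run’ from sbt won’t find those classes, which is presumably why the lines get commented out during IDE development:

    name := "HelloRedmondApp"

    version := "1.0"

    scalaVersion := "2.11.8"

    // Provided: available at compile time, but excluded from the assembly
    // JAR because the Spark installation supplies these classes at runtime.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % Provided
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0" % Provided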

Exercise:

To get additional insight, the following exercise was very useful.

  • Step 1: Run Spark Standalone.
    • From tools/spark, run ./sbin/start-master.sh
    • Then run ./sbin/start-slave.sh spark://spark-host:7077
    • At this point you should have Spark Standalone up and running
    • Open the standalone master’s web UI at http://localhost:8080 and confirm you have a node connected in the Workers section at the top
  • Step 2: Start spark-shell (which is itself a Scala app) and attach it to the master.
    • spark-shell --master spark://spark-host:7077
  • Step 3: Actually submit the application to the same cluster
    • sbt assembly
    • spark-submit --master spark://spark-host:7077 target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
    • Note how the app is now in the WAITING state. It’s waiting because the cluster’s cores have already been allocated to the spark-shell from Step 2.
  • Step 4: Kill the spark-shell that was started in Step 2. You will notice that the WAITING app now starts RUNNING.
    • this is because it now has the resources to run (see the note after this list)
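
As an aside, if you want the shell and the app to share the cluster instead of queueing, standalone mode lets you cap the cores an application grabs with the --total-executor-cores flag, e.g.:

    # Let the shell take only one core, leaving the rest for submitted apps.
    spark-shell --master spark://spark-host:7077 --total-executor-cores 1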

 


Code:

scalatrialapps
