Getting Started With Scala

The other day I was discussing how to get a gentle introduction to the Scala language. Here is what I prescribe…

Sufficient Scala:

Additional Resources:

References:


Java, Maven, Scala, SBT Concepts

I am toe-dipping into Maven, trying to make sense of how Maven fits in with the IDE, command-line Maven, POM files, and so on.

Tips:

  • IntelliJ has built-in support for both Maven and SBT, so as long as we are not using mvn or sbt from the command line we should be good.

Java fundamental concepts:

Maven:

Base Scala in IntelliJ:

Scala with SBT:

Spark App Development (Scala)

The Spark app development process is pretty interesting.

I am jotting down some notes as I make progress in the process.

Notes:

  • The easiest way is to develop your app in IntelliJ IDEA and run it either (a) from IntelliJ IDEA itself or (b) from the sbt console using ‘run’
    • for this, the ‘master’ URL must be set to ‘local’ (see the minimal app sketch after this list)
    • otherwise you will get the error: “org.apache.spark.SparkException: A master URL must be set in your configuration”
  • sbt package
    • the official Spark quick start guide actually has an example of this, whereby you don’t have to set the ‘master’ URL in the app itself.
    • instead, you specify the master when doing spark-submit
    • Note:
      • if you have the master set in code, then --master in spark-submit doesn’t take effect.
    • Example:
      • sbt assembly
      • spark-submit --master spark://spark-host:7077 target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
  • sbt assembly
    • this is similar to the workflow for ‘sbt package’
    • in the build.sbt file, there is a “Provided” configuration which has ramifications when one uses ‘sbt assembly’ (see the build.sbt sketch after this list).
      • //libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % Provided
        //libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0" % Provided
    • I need to follow up more on this Provided configuration…
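
For reference, here is a minimal sketch of the first style of app, with the master hard-coded to ‘local’. The object name, file path, and workload are placeholders for illustration, not the actual HelloRedmondApp code:

import org.apache.spark.sql.SparkSession

// Minimal sketch (hypothetical names): the master is set to "local[*]" in code,
// so the app can be run straight from IntelliJ IDEA or from the sbt console with 'run'.
// Without the .master(...) line you get "A master URL must be set in your configuration".
object HelloLocalApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HelloLocalApp")
      .master("local[*]")
      .getOrCreate()

    // placeholder workload: count the lines of a sample file (the path is hypothetical)
    val lineCount = spark.read.textFile("src/main/resources/sample.txt").count()
    println(s"line count = $lineCount")

    spark.stop()
  }
}

As noted above, with the master hard-coded like this, the --master flag passed to spark-submit is ignored.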

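And here is a rough build.sbt sketch for the ‘sbt assembly’ workflow (it assumes the sbt-assembly plugin is added in project/plugins.sbt; the project name and versions are placeholders chosen to match the jar name in the example above). As far as I understand it, Provided means the dependency is on the compile classpath but is left out of the assembled fat jar, because spark-submit supplies those classes on the cluster:

name := "HelloRedmondApp"
version := "1.0"
scalaVersion := "2.11.8"

// Provided: available at compile time, excluded from the assembly jar,
// since the Spark cluster already ships these classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql"   % "2.0.0" % Provided
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0" % Provided

One ramification to be aware of (as far as I can tell): with the Spark dependencies marked Provided, ‘run’ from the sbt console no longer sees the Spark classes, so this setup fits the spark-submit workflow rather than the local ‘run’ workflow.
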
Exercise:

This exercise was very useful for getting additional insight.

  • Step 1: Run Spark Standalone.
    • From tools/spark, run ./sbin/start-master.sh
    • Run ./sbin/start-slave.sh spark://spark-host:7077
    • At this point you should have Spark Standalone up and running
    • Open the Standalone Web UI at http://localhost:8080 and confirm you have a node connected in the Workers section at the top
  • Step 2: Start spark-shell (which is itself a Scala app) and attach it to the master.
    • spark-shell --master spark://spark-host:7077
  • Step 3: Actually submit the application to the same cluster
    • sbt assembly
    • spark-submit --master spark://spark-host:7077 target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
    • Note how the app is now in the WAITING state. It’s waiting because the cores have already been allocated to the spark-shell from Step 2.
  • Step 4: Kill the spark-shell that was started in Step 2. You will notice that the WAITING app now starts RUNNING.
    • this is because it now has the resources to run.
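
One way to avoid the WAITING state altogether (as far as I can tell) is to cap how many cores the spark-shell grabs in Step 2, so the submitted app can get the rest; the core count below is arbitrary:

spark-shell --master spark://spark-host:7077 --total-executor-cores 2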


Code:

scalatrialapps

Running Tests in Scala with IntelliJ IDEA

So I set up my first test suite in IntelliJ IDEA to test out some Scala code.

To run unit tests, I used the FunSuite style from ScalaTest.

import org.scalatest.FunSuite
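
A minimal suite looks something like the sketch below; the class and test names are made up just to show the shape:

import org.scalatest.FunSuite

// Hypothetical suite: each test(...) block is an independent test case.
class ArithmeticSuite extends FunSuite {

  test("addition works as expected") {
    assert(1 + 1 === 2)
  }

  test("dividing by zero throws") {
    val zero = 0
    intercept[ArithmeticException] {
      1 / zero
    }
  }
}

It can be run from IntelliJ IDEA (right-click the class) or from the sbt prompt with ‘test’.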

 

Tip:

[1]  I modified the build.sbt to include the line:

libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.0" % "test"

I reloaded the project. However, for the changes to take effect I had to terminate the running sbt prompt and open a new one.

 

[2]  I did not use JUnit.

 
