Tuning Spark Jobs

I recently got into a discussion about how to tune Spark jobs.

This led me to read up on dynamic allocation and related settings.
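As a rough illustration of what dynamic allocation looks like in practice, here is a minimal configuration sketch. The property names are standard Spark settings, but the application name and every value below are assumptions picked for illustration, not recommendations; sensible numbers depend on the cluster and the workload. It also assumes the external shuffle service is available on the cluster.

```scala
import org.apache.spark.SparkConf

// Illustrative values only; tune them for your own cluster.
val conf = new SparkConf()
  .setAppName("dynamic-allocation-sketch") // hypothetical app name
  // Let Spark add and remove executors based on the pending workload.
  .set("spark.dynamicAllocation.enabled", "true")
  // Assumes the external shuffle service is running, so shuffle files
  // remain available after an executor is removed.
  .set("spark.shuffle.service.enabled", "true")
  // Lower and upper bounds on the number of executors (assumed values).
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
  // Release an executor after it has been idle for this long.
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
```

The same properties can also be passed on the command line with `--conf` or placed in `spark-defaults.conf`, which keeps tuning out of the application code.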

Some interesting links:


Kafka-Spark integration

I am trying to get to the point where I have Kafka + Spark Streaming running locally on my machine.

There are several things to figure out along the way:

  • Kafka
  • Scala
  • Spark
  • Spark Streaming



EdX course ‘Introduction to Apache Spark’ resources

I am spending some time learning Spark. As I make progress, I think it would be a good idea to keep track of the resources I have found useful.


Code Repo: