The references below helped me get started with Spark on Windows. Here are a few additional tips based on my experience:
- Added the following as system environment variables for Sbt, Spark, Scala, Hadoop and Java.
- Added the following to the system PATH environment variable:
- Note that the Java path is picked up automatically. I believe that's because a Java entry is already present in PATH [C:\ProgramData\Oracle\Java\javapath]
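As an illustration, the environment variable setup above can be sketched as follows. The install paths here are hypothetical examples, not the ones from my machine; adjust them to wherever you actually installed each tool.

```bat
:: Sketch only: set system-wide environment variables from an elevated
:: Command Prompt. All paths below are assumed examples; use your own.
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_144" /M
setx SCALA_HOME "C:\Program Files (x86)\scala" /M
setx SBT_HOME "C:\Program Files (x86)\sbt" /M
setx HADOOP_HOME "C:\hadoop" /M
setx SPARK_HOME "C:\spark" /M

:: Append the bin directories to the system PATH. %JAVA_HOME%\bin is
:: often unnecessary because the Oracle installer already adds a
:: javapath entry to PATH, as noted above.
setx PATH "%PATH%;%SCALA_HOME%\bin;%SBT_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin" /M
```

Open a fresh Cmd window afterwards, since `setx` changes only take effect in new sessions.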
- spark-shell on Cygwin gives an error at launch itself.
- The error looks something like `[: too many arguments`
- I think Spark on Cygwin has not been fully tested yet. Read this.
- After getting spark-shell to launch in the Cmd window, I hit another strange stack trace.
- To reproduce, try: scala> spark.read.csv("people.csv").show
- The error complains about `spark-warehouse`.
- These two links helped me fix the issue:
- Doing something like this solved it:
- spark-shell --conf spark.sql.warehouse.dir=file:///C:/tmp
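Putting the workaround together, a minimal sketch looks like this. `C:\tmp` is an arbitrary choice; any local directory you can write to should work as the warehouse location.

```bat
:: Sketch: create a writable directory and point the Spark SQL warehouse
:: at it via a file: URI, instead of the default spark-warehouse path.
mkdir C:\tmp
spark-shell --conf spark.sql.warehouse.dir=file:///C:/tmp

:: Inside the shell, the earlier read should now succeed:
::   scala> spark.read.csv("people.csv").show
```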
- There is a strange stack dump when exiting the spark-shell (i.e. on hitting :q within the spark-shell).
- It seems to be non-deterministic; I am not aware of the root cause of this issue.
- This link gives a good overview of the SYSTEM/PATH variables that need to be set.