Tuning Spark Jobs

I recently got into a discussion about how to tune Spark jobs.

This led me to some learnings about dynamic allocation and related resource settings.
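As a sketch of what dynamic allocation involves: Spark can grow and shrink the number of executors at runtime when it is enabled at submit time. Roughly like this (the executor counts and jar name below are illustrative placeholders, not recommendations; the external shuffle service must be running on the workers for executors to be removed safely):

```shell
# Submit a job with dynamic allocation enabled (values are illustrative).
# spark.shuffle.service.enabled is required so executors can be removed
# without losing their shuffle data.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  my-app.jar
```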

Some interesting links:


kafka-spark integration

I am trying to get to a point where I will have Kafka + Spark Streaming running locally on my machine.

There are several things to figure out along the way:

  • Kafka
  • Scala
  • Spark
  • Spark Streaming
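A rough sketch of what the local setup will probably involve (the Kafka scripts are run from the Kafka install directory; the connector version and topic name are guesses at this point):

```shell
# Start a local Kafka broker (from the Kafka install directory):
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# Create a topic to stream from (topic name is a placeholder):
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test

# Run a Spark Streaming app against it, pulling in the Kafka connector
# (version matches the Spark 2.0-era setup assumed here):
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
  my_streaming_app.py
```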



EdX course ‘Introduction to Apache Spark’ resources

I am spending some time learning Spark. As I make progress, I think it would be a good idea to keep track of some resources I have found useful.


Code Repo: 

Spark App Development (Python)

In a previous post, I wrote about the Spark app development process for Scala.

In this post, I have provided examples of how to develop a Spark app using the pyspark library.

For Python 3.5

  • For interactive use, I have to run ‘export PYSPARK_PYTHON=python3’ before starting pyspark
  • For standalone programs in local mode, I first have to add the following in the script:
    • import os, then os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
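The two setups above amount to the following (a sketch; it assumes ‘python3’ is on your PATH and resolves to the Python 3.5 interpreter):

```shell
# Interactive use: point pyspark at Python 3 before launching it.
export PYSPARK_PYTHON=python3
# ...then start the shell:
#   pyspark

# For standalone scripts, the same variable is set from inside Python
# before the SparkContext is created:
#   import os
#   os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
```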



Getting Started with Spark on Windows 10 (Part 2)

After the initial start detailed in Part 1 of Getting Started with Spark on Windows 10, I started running into some issues.

To remove permission issues from the equation, I unzipped the Spark package into the ‘D:\’ drive this time. This allowed me to analyze some issues thoroughly. Here are some observations:


  • Make sure ‘winutils’ is properly set up. Otherwise an error gets thrown when starting pyspark / spark-shell
    • the error says it ‘could not locate winutils’ in the Hadoop binaries.
  • I noticed a few files/folders getting auto-generated:
    1. A file named ‘derby’
    2. A ‘metastore_db’ folder
    3. A ‘tmp’ folder
  • The ‘derby’ file and the ‘metastore_db’ folder seem to be created in whatever directory the Spark app runs from.
  • The ‘tmp’ folder has to be given full permissions.
    • Note: I noticed the ‘tmp’ folder getting created in my ‘D:\’ drive. Earlier I had this ‘tmp’ folder in my ‘C:\’ drive as well. I need to follow up more on this.
      • Do cross-check whether there are multiple ‘tmp’ folders and ensure the permissions are set up properly.
    • If the folder doesn’t have 777 permissions, you hit the following error when running either ‘pyspark’ or ‘spark-shell’:
      • java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
  • After setting the permissions properly on the ‘tmp’ folder, I hit another issue:
    • java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:D://spark-warehouse
    • This issue is Windows-specific. It’s discussed in the following two threads:


So that was it. I finally had a setup working properly. Phew!
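For reference, the ‘tmp\hive’ permissions themselves can be set with winutils (the ‘D:\hadoop’ path below is an assumption; use wherever your winutils.exe actually lives):

```shell
:: Windows cmd sketch: grant full permissions on the Hive scratch dir.
:: D:\hadoop is an assumed HADOOP_HOME; adjust to your own layout.
D:\hadoop\bin\winutils.exe chmod -R 777 D:\tmp\hive
```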

Workaround (passing ‘spark.sql.warehouse.dir’ explicitly):



  • D:\>pyspark --conf spark.sql.warehouse.dir=file:///D:/tmp




  • D:\>spark-shell --conf spark.sql.warehouse.dir=file:///D:/tmp




Spark App Development (Scala)

The Spark app development process is pretty interesting.

I am jotting down some notes as I make progress.


  • The easiest way is to develop your app in IntelliJ IDEA and run it either (a) from IntelliJ IDEA itself or (b) from the sbt console using ‘run’
    • for this, we must have the ‘master’ URL set to ‘local’
    • else you will get an error: “org.apache.spark.SparkException: A master URL must be set in your configuration”
  • sbt package
    • the official Spark quick start guide actually has an example of this, whereby you don’t have to set the ‘master’ URL in the app itself.
    • instead, you specify the master when doing spark-submit
    • Note:
      • if you have the master set in code, then --master in spark-submit doesn’t take effect.
    • Example:
      • sbt assembly
      • spark-submit --master spark://spark-host:7077 target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
  • sbt assembly
    • this is similar to the workflow for ‘sbt package’
    • in the build.sbt file, there is a keyword “Provided” which has ramifications when one uses ‘sbt assembly’.
      • //libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % Provided
        //libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0" % Provided
    • I need to follow up more on this ‘Provided’ keyword… (my current understanding: ‘Provided’ dependencies are available at compile time but are excluded from the assembly jar, since the cluster supplies Spark at runtime)
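Putting the ‘sbt package’ / ‘sbt assembly’ notes together (the jar names below are what sbt produced in my project; yours will differ):

```shell
# Thin jar: dependencies are not bundled. Fine for local mode, where the
# master is given on the command line rather than hard-coded in the app.
sbt package
spark-submit --master "local[*]" \
  target/scala-2.11/helloredmondapp_2.11-1.0.jar

# Fat jar via sbt-assembly: bundles dependencies except those marked
# Provided in build.sbt; submit it against a real master.
sbt assembly
spark-submit --master spark://spark-host:7077 \
  target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
```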


This exercise was very useful for getting additional insights:

  • Step 1: Run Spark Standalone.
    • From tools/spark, run ./sbin/start-master.sh
    • Run ./sbin/start-slave.sh spark://spark-host:7077
    • At this point you should have Spark Standalone up and running
    • Open Standalone’s Web UI, available at http://localhost:8080. Confirm you have a node connected in the Workers section at the top
  • Step 2: Start spark-shell (which is itself a Scala app) and attach it to the master.
    • spark-shell --master spark://spark-host:7077
  • Step 3: Actually submit the application to the same cluster
    • sbt assembly
    • spark-submit --master spark://spark-host:7077 target/scala-2.11/HelloRedmondApp-assembly-1.0.jar
    • Note how the app is now in WAITING state. It’s waiting because the cores have already been allocated to the spark-shell from Step 2.
  • Step 4: Kill the spark-shell that was started in Step 2. You will notice that the WAITING app now starts RUNNING.
    • This is because it now has the resources to run.
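The four steps above as a single transcript (run from the Spark home directory; the master URL and jar path are the ones from my setup):

```shell
# Step 1: bring up a standalone master and one worker.
./sbin/start-master.sh
./sbin/start-slave.sh spark://spark-host:7077
# Web UI: http://localhost:8080 (check the Workers section).

# Step 2: attach a spark-shell; it grabs the available cores.
spark-shell --master spark://spark-host:7077

# Step 3 (separate terminal): submit the app; it sits in WAITING
# because the shell from Step 2 holds the cores.
spark-submit --master spark://spark-host:7077 \
  target/scala-2.11/HelloRedmondApp-assembly-1.0.jar

# Step 4: exit the spark-shell; the WAITING app moves to RUNNING.
```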