Getting Started with Spark on Windows 10 (Part 2)

After the initial setup detailed in Part 1 of Getting Started with Spark on Windows 10, I started running into some issues.

To remove permission issues from the equation, I unzipped the Spark package onto the ‘D:\’ drive this time. This allowed me to analyze some of the issues thoroughly.

Observations:

  • Make sure ‘winutils’ is set up properly. Otherwise, an error is thrown when starting pyspark / spark-shell.
    • The error says ‘could not locate winutils’ in the Hadoop binaries.
  • I noticed a few files/folders being auto-generated:
    1. File named ‘derby’
    2. ‘metastore_db’ folder
    3. ‘tmp’ folder
  • The ‘derby’ file and the ‘metastore_db’ folder appear to be created wherever the Spark app is located.
  • The ‘tmp’ folder has to be given full permissions.
    • Note: I noticed the ‘tmp’ folder being created on my ‘D:\’ drive. Earlier I had a ‘tmp’ folder on my ‘C:\’ drive as well. I need to follow up more on this.
      • Do cross-check whether there are multiple ‘tmp’ folders and ensure the permissions are set up properly on each.
    • If the folder doesn't have 777 permissions, you will hit the following error when running either ‘pyspark’ or ‘spark-shell’:
      • java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
  • After setting the permissions properly on the ‘tmp’ folder, I hit another issue:
    • java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:D://spark-warehouse
    • This issue is Windows-specific. It is discussed in the following two threads:

 

So that was it. I was finally able to get a working setup. Phew!
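The permission fix from the observations above can be scripted. Here is a minimal sketch, assuming winutils.exe lives under %HADOOP_HOME%\bin and that the Hive scratch directory is D:\tmp\hive (both are assumptions about the setup, so adjust them as needed). The helper name chmod_command is made up for illustration. It only builds the commonly used `winutils.exe chmod 777` command, and the script runs it only if the binary is actually found.

```python
import os
import subprocess
from pathlib import PureWindowsPath


def chmod_command(hadoop_home, scratch_dir=r"D:\tmp\hive"):
    """Build the winutils command that gives the scratch dir 777 permissions.

    hadoop_home and scratch_dir are assumptions about the local setup.
    """
    winutils = PureWindowsPath(hadoop_home) / "bin" / "winutils.exe"
    return [str(winutils), "chmod", "777", scratch_dir]


if __name__ == "__main__":
    hadoop_home = os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        print("HADOOP_HOME is not set; nothing to run.")
    else:
        cmd = chmod_command(hadoop_home)
        if os.path.isfile(cmd[0]):
            # Fail loudly if winutils reports an error.
            subprocess.run(cmd, check=True)
        else:
            print("winutils.exe not found at", cmd[0])
```

If there turn out to be ‘tmp’ folders on more than one drive, the same helper can be called once per folder by passing a different scratch_dir.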

Environment Variable:

[Screenshot: environment variable settings]
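Assuming the variable in question is HADOOP_HOME, which Hadoop uses to locate bin\winutils.exe, a quick check like the one below can confirm the setup before launching pyspark or spark-shell. This is only a sketch; the function name find_winutils is hypothetical.

```python
import os
from pathlib import Path


def find_winutils(env=None):
    """Return the path to winutils.exe under HADOOP_HOME/bin, or None if missing."""
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    candidate = Path(hadoop_home) / "bin" / "winutils.exe"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_winutils()
    print("winutils.exe:", found if found else "not found; check HADOOP_HOME")
```

A result of None here would be consistent with the ‘could not locate winutils’ error described in the observations.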

Pyspark:

  • D:\>pyspark --conf spark.sql.warehouse.dir=file:///D:/tmp
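The file:/// value passed above does not have to be typed by hand. As a rough sketch, Python's pathlib can derive it from a Windows path, and the same setting can be applied in a standalone script through SparkSession.builder instead of the command line. The function names warehouse_uri and build_session are hypothetical, and build_session assumes pyspark is installed; inside an already running pyspark shell the session exists by then, so the command-line flag remains the way to go there.

```python
from pathlib import PureWindowsPath


def warehouse_uri(windows_path):
    """Convert an absolute Windows path into a file:/// URI."""
    # as_uri() raises ValueError for relative paths, which is the
    # situation the URISyntaxException complains about.
    return PureWindowsPath(windows_path).as_uri()


def build_session(windows_path=r"D:\tmp"):
    """Create a SparkSession with an explicit warehouse dir (needs pyspark)."""
    # Imported lazily so warehouse_uri() works without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.sql.warehouse.dir", warehouse_uri(windows_path))
        .getOrCreate()
    )


if __name__ == "__main__":
    print(warehouse_uri(r"D:\tmp"))  # file:///D:/tmp
```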

[Screenshot: pyspark session]

 

Spark-Shell:

  • D:\>spark-shell --conf spark.sql.warehouse.dir=file:///D:/tmp

[Screenshot: spark-shell session]
