I am spending some time learning Spark. As I make progress, I think it would be a good idea to keep track of some resources I have found useful.
- Lecture 1 slides (PDF)
- Lecture 2 slides (PDF)
  - has very nice references on getting started, research papers, etc.
- Lecture 3 slides (PDF)
- Deep Dive into Spark SQL’s Catalyst Optimizer
- Notebooks from Databricks:
  - A Gentle Introduction to Apache Spark on Databricks
  - Apache Spark on Databricks for Data Engineers
  - Apache Spark on Databricks for Data Scientists
  - the interesting thing to note about the last two notebooks is how Databricks has informally tried to distinguish between data engineers and data scientists.
- Lab1 tutorial
- Lab2 tutorial
  - the labs were super useful