Data Science From Scratch

Here are the chapters from the book Data Science from Scratch by Joel Grus.

Blue indicates chapters I have played around with.

  • Chapter 1: Introduction (What is data science?)
  • Chapter 2: A Crash Course in Python (syntax, data structures, control flow, and other features)
  • Chapter 3: Visualizing Data (bar, line and scatter plots with matplotlib)
  • Chapter 4: Linear Algebra (vectors and matrices)
  • Chapter 5: Statistics (central tendency and correlations)
  • Chapter 6: Probability (Bayes’ Theorem, Random Variables, Normality)
  • Chapter 7: Hypothesis and Inference (confidence intervals, p-values, Bayesian inference)
  • Chapter 8: Gradient Descent (gradients, steps, stochastic variation)
  • Chapter 9: Getting Data (scraping HTML, JSON APIs)
  • Chapter 10: Working with Data (basic viz, data transforms)
  • Chapter 11: Machine Learning (fitting, bias-variance, feature selection)
  • Chapter 12: k-Nearest Neighbors (also curse of dimensionality)
  • Chapter 13: Naive Bayes
  • Chapter 14: Simple Linear Regression (also gradient descent)
  • Chapter 15: Multiple Regression (also bootstrap, regularization)
  • Chapter 16: Logistic Regression (also SVM)
  • Chapter 17: Decision Trees (also random forest)
  • Chapter 18: Neural Networks (perceptron and back-prop)
  • Chapter 19: Clustering (k-Means)
  • Chapter 20: Natural Language Processing (n-gram, grammars, Gibbs sampling)
  • Chapter 21: Network Analysis (Centrality and PageRank)
  • Chapter 22: Recommender Systems (user- and item-based)
  • Chapter 23: Databases and SQL (basic usage)
  • Chapter 24: MapReduce (various worked examples)
  • Chapter 25: Go Forth and Do Data Science (libs you should use)



Stacks & Queues

In CLRS, the authors have a chapter called “Elementary Data Structures”. There they cover three types of data structures in particular:

  • Stacks & Queues
  • Linked Lists
  • Trees (representations)

There is much to learn from these “elementary” data structures.

Here is a problem I came across recently.

Problem: Implement a queue using as many stacks as needed.


The question is deceptively simple. There are two things we need to settle before approaching this problem:

  1. Which stack APIs are exposed / available to us?
  2. How do we detect an empty stack?

Some insights I had while implementing this solution are as follows:

  • The implementation details need care, in particular deciding when elements move from one stack to another.
  • Terminating conditions, such as an empty stack, need to be considered carefully for the stack / queue class of problems.
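
One common way to approach this problem uses two stacks, sketched below in Python. This is a minimal illustration and not necessarily the solution I originally wrote: the class name QueueFromStacks and its method names are made up for this example, and it assumes the only stack operations available are push (append), pop, and an emptiness check.

```python
class QueueFromStacks:
    """FIFO queue built on two Python lists that are used strictly as stacks."""

    def __init__(self):
        self._inbox = []   # newly enqueued items are pushed here
        self._outbox = []  # items waiting to be dequeued, oldest on top

    def is_empty(self):
        # The queue is empty only when both stacks are empty.
        return not self._inbox and not self._outbox

    def enqueue(self, item):
        self._inbox.append(item)

    def dequeue(self):
        # Refill the outbox only when it has run dry; popping everything
        # off the inbox reverses the order, so the oldest item ends up on top.
        if not self._outbox:
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            raise IndexError("dequeue from empty queue")
        return self._outbox.pop()


q = QueueFromStacks()
for value in (1, 2, 3):
    q.enqueue(value)
first = q.dequeue()   # oldest item comes out first
q.enqueue(4)
rest = [q.dequeue(), q.dequeue(), q.dequeue()]
```

Each element is pushed and popped at most twice, so the transfer loop does not make dequeue expensive on average. The two emptiness checks in dequeue are exactly the terminating conditions that are easy to get wrong.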










Adding two numbers using linked list

An interesting problem I came across recently:


You are given two linked lists representing two non-negative numbers. The digits are stored in reverse order, and each node contains a single digit. Add the two numbers and return the sum as a linked list.

Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)
Output: 7 -> 0 -> 8
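
Because the digits are stored least significant first, the lists can be added the way one adds by hand: walk both lists together and carry into the next node. Below is a minimal Python sketch of that idea. The names ListNode, add_two_numbers, from_digits and to_digits are invented for this example and are not part of the problem statement.

```python
class ListNode:
    """Singly linked list node holding one decimal digit."""

    def __init__(self, digit, next=None):
        self.digit = digit
        self.next = next


def add_two_numbers(a, b):
    """Add two numbers whose digits are stored in reverse order."""
    dummy = ListNode(0)  # placeholder head, dropped when returning
    tail = dummy
    carry = 0
    # Keep going while either list has digits left or a carry is pending.
    while a is not None or b is not None or carry:
        total = carry
        if a is not None:
            total += a.digit
            a = a.next
        if b is not None:
            total += b.digit
            b = b.next
        carry, digit = divmod(total, 10)
        tail.next = ListNode(digit)
        tail = tail.next
    return dummy.next


def from_digits(digits):
    """Build a linked list from a Python list of digits (helper for the demo)."""
    head = None
    for d in reversed(digits):
        head = ListNode(d, head)
    return head


def to_digits(node):
    """Collect the digits of a linked list into a Python list (helper for the demo)."""
    out = []
    while node is not None:
        out.append(node.digit)
        node = node.next
    return out


result = to_digits(add_two_numbers(from_digits([2, 4, 3]), from_digits([5, 6, 4])))
```

For the input above this computes 342 + 465 = 807, stored as 7 -> 0 -> 8. Keeping carry in the loop condition handles the case where the sum has more digits than either input, such as 5 + 5.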






Tail Command (Linux)


  • -f: output appended data as the file grows
  • -n, --lines=K: output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth
    • Note: the ‘+’ is significant.
      • Without a ‘+’, tail outputs the last K lines, counting from the bottom of the file.
      • With a ‘+’, tail outputs everything from line K onward, counting from the top of the file.
  • -q, --quiet, --silent: never output headers giving file names
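
To make the difference between -n K and -n +K concrete, here is a small Python sketch that mimics the two behaviours on a list of lines. It is only an illustration of the semantics, not how tail itself is implemented, and the function names tail_last and tail_from are made up for this example.

```python
from collections import deque
from itertools import islice


def tail_last(lines, k):
    """Mimic `tail -n K`: keep only the last k lines."""
    # A deque with maxlen=k discards older lines as newer ones arrive.
    return list(deque(lines, maxlen=k))


def tail_from(lines, k):
    """Mimic `tail -n +K`: keep everything from line k (1-based) onward."""
    return list(islice(lines, max(k - 1, 0), None))


lines = ["a", "b", "c", "d", "e"]
last_two = tail_last(lines, 2)      # counts from the bottom of the file
from_second = tail_from(lines, 2)   # counts from the top of the file
```

With these five lines, tail_last(lines, 2) keeps "d" and "e", while tail_from(lines, 2) keeps "b" through "e".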