# Data Science From Scratch

Here are the chapters from the book Data Science from Scratch by Joel Grus.

Blue indicates chapters I have played around with.

• Chapter 1: Introduction (What is data science?)
• Chapter 2: A Crash Course in Python (syntax, data structures, control flow, and other features)
• Chapter 3: Visualizing Data (bar, line and scatter plots with matplotlib)
• Chapter 4: Linear Algebra (vectors and matrices)
• Chapter 5: Statistics (central tendency and correlations)
• Chapter 6: Probability (Bayes’ Theorem, Random Variables, Normality)
• Chapter 7: Hypothesis and Inference (confidence intervals, P values, Bayesian inference)
• Chapter 9: Getting Data (scraping HTML, JSON APIs)
• Chapter 10: Working with Data (basic viz, data transforms)
• Chapter 11: Machine Learning (fitting, bias-variance, feature selection)
• Chapter 12: k-Nearest Neighbors (also curse of dimensionality)
• Chapter 13: Naive Bayes
• Chapter 14: Simple Linear Regression (also gradient descent)
• Chapter 15: Multiple Regression (also bootstrap, regularization)
• Chapter 16: Logistic Regression (also SVM)
• Chapter 17: Decision Trees (also random forest)
• Chapter 18: Neural Networks (perceptron and back-prop)
• Chapter 19: Clustering (k-Means)
• Chapter 20: Natural Language Processing (n-gram, grammars, Gibbs sampling)
• Chapter 21: Network Analysis (Centrality and PageRank)
• Chapter 22: Recommender Systems (user- and item-based)
• Chapter 23: Databases and SQL (basic usage)
• Chapter 24: MapReduce (various worked examples)
• Chapter 25: Go Forth and Do Data Science (libs you should use)

# Stacks & Queues

In CLRS, the authors have a chapter called “Elementary Data Structures”. In it they cover three types of data structures in particular:

• Stacks & Queues
• Linked Lists
• Trees (representations)

There is much to learn from these “elementary” data structures.

Here is a problem I came across recently.

Problem: Implement a queue using as many stacks as needed.

The question is deceptively simple. Two things need to be settled before approaching it:

1. Which stack APIs are exposed / available to us?
2. How do we detect an empty stack?

Some insights I had while implementing this solution are as follows:

• The implementation details need care.
• Terminating conditions, such as an empty stack, need to be considered carefully for the stack / queue class of problems.

Code:
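A minimal sketch of one common approach, using two stacks. It assumes the only stack operations available are push, pop, and an emptiness check, modelled here with a Python list (`append`, `pop`, and truthiness). The class name `StackQueue` and its method names are hypothetical, chosen for illustration.

```python
class StackQueue:
    """FIFO queue built from two LIFO stacks (plain Python lists)."""

    def __init__(self):
        self._inbox = []   # receives newly enqueued items
        self._outbox = []  # holds items in dequeue order

    def enqueue(self, item):
        # Push onto the inbox stack.
        self._inbox.append(item)

    def dequeue(self):
        if not self._outbox:
            # Refill only when the outbox is empty; refilling earlier
            # would put newer items on top of older ones.
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            # Both stacks are empty: this is the terminating condition.
            raise IndexError("dequeue from empty queue")
        return self._outbox.pop()

    def is_empty(self):
        return not self._inbox and not self._outbox
```

Each item is pushed and popped at most twice, so a sequence of operations costs amortized constant time per operation, even though a single `dequeue` may move many items.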

An interesting problem I came across recently:

Problem:

You are given two linked lists representing two non-negative numbers. The digits are stored in reverse order, and each node contains a single digit. Add the two numbers and return the sum as a linked list.

Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)
Output: 7 -> 0 -> 8

Reference:

Code:
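A minimal sketch of one way to solve it: walk both lists in step, adding digit by digit and carrying into the next node. The names `ListNode`, `add_two_numbers`, `from_digits`, and `to_digits` are assumptions made for illustration; the problem statement does not fix a node type.

```python
class ListNode:
    """Singly linked list node holding one decimal digit."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def add_two_numbers(a, b):
    """Add two numbers stored as reversed-digit linked lists."""
    dummy = ListNode(0)  # placeholder head; the result starts at dummy.next
    tail = dummy
    carry = 0
    # Continue while either list has digits left or a carry remains.
    while a or b or carry:
        total = carry
        if a:
            total += a.val
            a = a.next
        if b:
            total += b.val
            b = b.next
        carry, digit = divmod(total, 10)
        tail.next = ListNode(digit)
        tail = tail.next
    return dummy.next


def from_digits(digits):
    """Build a linked list from a Python list of digits (helper)."""
    head = None
    for d in reversed(digits):
        head = ListNode(d, head)
    return head


def to_digits(node):
    """Collect the digits of a linked list into a Python list (helper)."""
    out = []
    while node:
        out.append(node.val)
        node = node.next
    return out
```

With the input above, `to_digits(add_two_numbers(from_digits([2, 4, 3]), from_digits([5, 6, 4])))` gives `[7, 0, 8]`, since 342 + 465 = 807. Including `carry` in the loop condition handles sums that grow by a digit, such as 99 + 1.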


# Tail Command (Linux)

## Options:

• -f: output appended data as the file grows
• -n, --lines=K: output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth
• Note: the ‘+’ is significant.
• If there is no ‘+’, tail outputs the last K lines, counting from the bottom of the file
• If there is a ‘+’, tail outputs from line K onward, counting from the top of the file
• -q, --quiet, --silent: never output headers giving file names
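The difference between `-n K` and `-n +K` can be illustrated with a small Python model of the line selection. This is only a sketch of the semantics described above, not how tail is implemented; the function name `tail_lines` and the `from_top` flag are hypothetical.

```python
def tail_lines(lines, k, from_top=False):
    """Model `tail -n K` (from_top=False) or `tail -n +K` (from_top=True)."""
    if from_top:
        # -n +K: output starts at the Kth line, counting from 1 at the top.
        return lines[max(k - 1, 0):]
    # -n K: the last K lines. Guard k == 0, because lines[-0:] is the whole list.
    return lines[-k:] if k > 0 else []


lines = ["a", "b", "c", "d", "e"]
print(tail_lines(lines, 2))                 # last two lines: ['d', 'e']
print(tail_lines(lines, 2, from_top=True))  # from line 2 on: ['b', 'c', 'd', 'e']
```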

Tips:

# Less Command (Linux)

### Useful options:

• -M: Shows more detailed prompt, including file position.
• -N: Shows line numbers (useful for source code viewing).