k Nearest Neighbors

I continue to chip away at Data Science from Scratch. This time I tried out Chapter 12: K Nearest Neighbours.

Learnt a few things from this hack:

[1] How to do XML parsing in Python. (blogged about it as well)

[2] Visualization.

[3] Python Coding. Joel’s code is amazing.

 

Some cool visualizations are as follows:

Here’s how the data looks when plotted on the US map:

knnStatesLanguages.JPG

 

Check out how the plot changes as the value of K goes from K=1 to K=5.

K=1. This is an example of overfitting.

k_1.JPG

K=5

k_5.JPG
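To make the K=1 versus K=5 difference concrete, here is a minimal sketch of a k-nearest-neighbours classifier with majority voting. It is written in the spirit of the chapter but is not the book's code verbatim. The function names (`distance`, `majority_vote`, `knn_classify`) and the tiny data set are made up for illustration and are not the state/language data plotted above.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_vote(labels):
    """Pick the most common label; labels are ordered nearest to farthest.

    On a tie, drop the farthest label and vote again.
    """
    counts = Counter(labels)
    winner, winner_count = counts.most_common(1)[0]
    num_winners = sum(1 for c in counts.values() if c == winner_count)
    if num_winners == 1:
        return winner
    return majority_vote(labels[:-1])


def knn_classify(k, labeled_points, new_point):
    """labeled_points is a list of (point, label) pairs."""
    by_distance = sorted(labeled_points,
                         key=lambda pair: distance(pair[0], new_point))
    k_nearest_labels = [label for _, label in by_distance[:k]]
    return majority_vote(k_nearest_labels)


# Made-up data: three "Python" points near the origin, one stray "Java"
# point in the middle of them, and two "Java" points far away.
POINTS = [
    ((0.0, 0.0), "Python"),
    ((1.0, 0.0), "Python"),
    ((0.0, 1.0), "Python"),
    ((0.4, 0.4), "Java"),
    ((5.0, 5.0), "Java"),
    ((6.0, 5.0), "Java"),
]
```

With this toy data, `knn_classify(1, POINTS, (0.5, 0.5))` follows the single stray neighbour and answers "Java", while `knn_classify(5, POINTS, (0.5, 0.5))` lets the surrounding points outvote it and answers "Python". That is the same effect as in the K=1 and K=5 pictures: a small K chases individual points, a larger K smooths them out.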

 

Code:

Python Visualization

matplotlib has become my favourite visualization library in Python.

I was recently playing around with some of the simple visualizations possible using matplotlib.
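For a flavour of what a simple visualization looks like, here is a small sketch of a line chart. It is not the code from this post; the function name `make_demo_figure` and the plotted series (squares and cubes of 0 to 9) are made up for illustration. It assumes matplotlib is installed and uses the non-interactive "Agg" backend so it also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt


def make_demo_figure():
    """Build a two-line chart and return the Figure (illustrative only)."""
    xs = list(range(10))
    squares = [x ** 2 for x in xs]
    cubes = [x ** 3 for x in xs]

    fig, ax = plt.subplots()
    ax.plot(xs, squares, "g-", label="x squared")   # solid green line
    ax.plot(xs, cubes, "r--", label="x cubed")      # dashed red line
    ax.set_title("A simple line chart")
    ax.set_xlabel("x")
    ax.set_ylabel("value")
    ax.legend(loc="upper left")
    return fig
```

Calling `make_demo_figure().savefig("demo.png")` writes the chart to a file; in an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.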

Also, two very useful beginner-level tutorials for plotting with matplotlib are as follows:

Code:

References:

XML Parsing in Python

There are tons of libraries out there for processing XML in Python.

[1] minidom

[2] ElementTree
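As a rough comparison of the two, here is a sketch that reads the same small document with each library. Both modules ship with the Python standard library. The sample XML, its tag names (`catalog`, `book`, `title`) and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, only for this example.
SAMPLE_XML = """<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>"""


def titles_with_elementtree(xml_text):
    """Return (id, title) pairs using ElementTree."""
    root = ET.fromstring(xml_text)
    return [(book.get("id"), book.findtext("title"))
            for book in root.findall("book")]


def titles_with_minidom(xml_text):
    """Return (id, title) pairs using minidom."""
    dom = minidom.parseString(xml_text)
    result = []
    for book in dom.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # The title text lives in a child text node.
        result.append((book.getAttribute("id"), title.firstChild.data))
    return result
```

Both functions return the same pairs for the sample. ElementTree tends to need less code for this kind of lookup, while minidom exposes the W3C DOM style of nodes and child nodes.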

One of the pain points I found while using ElementTree is that the XML is written out on a single line instead of in a nicely indented format.

Apparently the best way to prettify it is using the minidom library.
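The usual pattern looks roughly like the sketch below: serialize the ElementTree element to a string, re-parse it with minidom, and ask minidom for an indented version. This is a generic illustration, not necessarily identical to the `prettify` method in my code; the element names (`catalog`, `book`, `title`) are made up.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def prettify(elem, indent="  "):
    """Return an indented XML string for an ElementTree element."""
    rough_string = ET.tostring(elem, encoding="unicode")  # single-line XML
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=indent)


def build_sample():
    """Build a small hypothetical tree to pretty-print."""
    root = ET.Element("catalog")
    book = ET.SubElement(root, "book", {"id": "b1"})
    title = ET.SubElement(book, "title")
    title.text = "First"
    return root
```

`print(prettify(build_sample()))` puts each element on its own line with two-space indentation. One caveat: if the tree was parsed from a file that was already indented, the whitespace text nodes are kept and the result can contain extra blank lines. On Python 3.9 and later, `xml.etree.ElementTree.indent` is an alternative that indents the tree in place without going through minidom.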

Code:

  • Check out the ‘prettify’ method in this code, and
  • how it’s being used in the editXML method. (relevant files: appt, updated)