Matplotlib. Histograms and line plots

Some cool visualizations:

Histograms:

[Image: histaml]

Line Plots

[Image: lineaml]

Code:

 

 

 


Azure Storage using Python

Lately I have had to use Azure Storage. Interestingly, I found there are Python SDKs for working with it.

Here’s a sample step-by-step procedure:

  1. Install the azure package.
pip install azure

You are using pip version 7.0.3, however version 8.0.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
Collecting azure
Downloading azure-1.0.3.zip
Collecting azure-mgmt==0.20.2 (from azure)
Downloading azure-mgmt-0.20.2.zip
Collecting azure-servicebus==0.20.1 (from azure)
Downloading azure_servicebus-0.20.1-py2.py3-none-any.whl
Collecting azure-storage==0.20.3 (from azure)
Downloading azure_storage-0.20.3-py2-none-any.whl (86kB)
100% |################################| 90kB 819kB/s
Collecting azure-servicemanagement-legacy==0.20.2 (from azure)
Downloading azure_servicemanagement_legacy-0.20.2-py2.py3-none-any.whl (78kB)
100% |################################| 81kB 975kB/s
Collecting azure-mgmt-compute==0.20.1 (from azure-mgmt==0.20.2->azure)
Downloading azure_mgmt_compute-0.20.1-py2.py3-none-any.whl (72kB)
100% |################################| 73kB 890kB/s
Collecting azure-mgmt-network==0.20.1 (from azure-mgmt==0.20.2->azure)
Downloading azure_mgmt_network-0.20.1-py2.py3-none-any.whl (77kB)
100% |################################| 77kB 835kB/s
Collecting azure-mgmt-resource==0.20.1 (from azure-mgmt==0.20.2->azure)
Downloading azure_mgmt_resource-0.20.1-py2.py3-none-any.whl
Collecting azure-mgmt-storage==0.20.0 (from azure-mgmt==0.20.2->azure)
Downloading azure_mgmt_storage-0.20.0-py2.py3-none-any.whl
Collecting azure-common (from azure-servicebus==0.20.1->azure)
Using cached azure_common-1.0.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): requests in c:\users\agoswami\appdata\local\continuum\anaconda\lib\site-packages (from azure-servicebus==0.20.1->azure)
Requirement already satisfied (use --upgrade to upgrade): futures in c:\users\agoswami\appdata\local\continuum\anaconda\lib\site-packages (from azure-storage==0.20.3->azure)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in c:\users\agoswami\appdata\local\continuum\anaconda\lib\site-packages (from azure-storage==0.20.3->azure)
Collecting azure-nspkg (from azure-storage==0.20.3->azure)
Using cached azure_nspkg-1.0.0-py2.py3-none-any.whl
Collecting azure-mgmt-common (from azure-mgmt-compute==0.20.1->azure-mgmt==0.20.2->azure)
Downloading azure_mgmt_common-0.20.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in c:\users\agoswami\appdata\local\continuum\anaconda\lib\site-packages (from python-dateutil->azure-storage==0.20.3->azure)
Collecting azure-mgmt-nspkg (from azure-mgmt-common->azure-mgmt-compute==0.20.1->azure-mgmt==0.20.2->azure)
Downloading azure_mgmt_nspkg-1.0.0-py2.py3-none-any.whl
Installing collected packages: azure-nspkg, azure-common, azure-mgmt-nspkg, azure-mgmt-common, azure-mgmt-compute, azure-mgmt-network, azure-mgmt-resource, azure-mgmt-storage, azure-mgmt, azure-servicebus, azure-storage, azure-servicemanagement-legacy, azure
Running setup.py install for azure-mgmt
Running setup.py install for azure
Successfully installed azure-1.0.3 azure-common-1.0.0 azure-mgmt-0.20.2 azure-mgmt-common-0.20.0 azure-mgmt-compute-0.20.1 azure-mgmt-network-0.20.1 azure-mgmt-nspkg-1.0.0 azure-mgmt-resource-0.20.1 azure-mgmt-storage-0.20.0 azure-nspkg-1.0.0 azure-servicebus-0.20.1 azure-servicemanagement-legacy-0.20.2 azure-storage-0.20.3

 

After the install, the azure packages that I can use show up under site-packages:

 

$ pwd

/cygdrive/c/Users/agoswami/AppData/Local/Continuum/Anaconda

$ find . -iname '*azure*' -type d
./Lib/site-packages/azure
./Lib/site-packages/azure-1.0.3-py2.7.egg-info
./Lib/site-packages/azure_common-1.0.0.dist-info
./Lib/site-packages/azure_mgmt-0.20.2-py2.7.egg-info
./Lib/site-packages/azure_mgmt_common-0.20.0.dist-info
./Lib/site-packages/azure_mgmt_compute-0.20.1.dist-info
./Lib/site-packages/azure_mgmt_network-0.20.1.dist-info
./Lib/site-packages/azure_mgmt_nspkg-1.0.0.dist-info
./Lib/site-packages/azure_mgmt_resource-0.20.1.dist-info
./Lib/site-packages/azure_mgmt_storage-0.20.0.dist-info
./Lib/site-packages/azure_nspkg-1.0.0.dist-info
./Lib/site-packages/azure_servicebus-0.20.1.dist-info
./Lib/site-packages/azure_servicemanagement_legacy-0.20.2.dist-info
./Lib/site-packages/azure_storage-0.20.3.dist-info
./myAzure

 

2. Import the libraries

from azure.storage.table import TableService

 

3. Use the APIs provided.

 

Code:

 

References:

[1]  How to use Table storage from Python : https://azure.microsoft.com/en-us/documentation/articles/storage-python-how-to-use-table-storage/

[2] Install the Python SDK for Azure : https://azure.microsoft.com/en-us/documentation/articles/python-how-to-install/

 

pip / conda / import

Revisiting a few things that I have sort of glossed over in my Python journey.

  1. conda / pip
    • If conda install somepackage fails, you can try pip install somepackage instead, which installs from PyPI instead of the Anaconda repository. Many scientific Anaconda packages are easier to install than the corresponding PyPI packages because they are pre-compiled for your platform. However, many packages are available on PyPI but not on Anaconda.
    • conda is a package management tool for installing scientific and analytical computing packages, which may be written in Python or other programming languages. conda also creates virtual environments, like python-virtualenv does. conda is the package manager of Anaconda, a free Python distribution provided by Continuum Analytics that includes over 195 of the most popular Python packages for science, math, engineering and data analysis.
    • pip is a general-purpose Python package installer. python-pip and python3-pip are in the default Ubuntu repositories. In most cases you would choose pip instead of conda if all you want is a Python package manager. To create a Python virtual environment without installing conda, you can install python-virtualenv from the Ubuntu Software Center.
    • pip and conda use different packaging formats, so they do not operate interchangeably, but you can use both tools side by side.
  2. import
    • imports work by searching the directories listed in sys.path.
    • Here’s what it looks like on my system:
import sys
print '\n'.join(sys.path)

C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\okcupyd-1.0.0a3-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\invoke-0.11.1-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\coloredlogs-5.0-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\wrapt-1.10.6-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\simplejson-3.8.1-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\humanfriendly-1.42-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\python27.zip
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\DLLs
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\plat-win
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\lib-tk
C:\Users\agoswami\AppData\Local\Continuum\Anaconda
c:\users\agoswami\appdata\local\continuum\anaconda\lib\site-packages\sphinx-1.3.1-py2.7.egg
c:\users\agoswami\appdata\local\continuum\anaconda\lib\site-packages\setuptools-17.1.1-py2.7.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\cryptography-0.9.1-py2.7-win-amd64.egg
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\win32
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\win32\lib
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\Pythonwin
C:\Users\agoswami\AppData\Local\Continuum\Anaconda\lib\site-packages\IPython\extensions
C:\Users\agoswami\.ipython

So Python will find any packages that have been installed to those locations.
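
To make the lookup concrete, here is a minimal Python 3 sketch (the post itself uses Python 2.7). It writes a tiny module into a temporary directory, adds that directory to sys.path, and imports it. The module name my_demo_module and the helper import_from_directory are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def import_from_directory(module_name, directory):
    """Import module_name after making `directory` visible on sys.path."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Create a throwaway directory holding a one-line module (hypothetical name).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "my_demo_module.py"), "w") as handle:
    handle.write("GREETING = 'hello from my_demo_module'\n")

module = import_from_directory("my_demo_module", demo_dir)
print(module.GREETING)
print(module.__file__)  # shows which sys.path entry the module was found in
```

Without the sys.path.insert call the import would fail with an ImportError, because the temporary directory is not one of the locations Python searches.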

3. Pyspark.

When I was doing Spark development, I wanted to use something like Spyder. This led me to dig into topics like PYTHONPATH.
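
As a rough illustration of what PYTHONPATH does, the Python 3 sketch below starts a child interpreter with PYTHONPATH pointing at an extra directory and prints the child's sys.path. The helper name child_sys_path and the temporary directory are assumptions for the example; a real Spark setup would point at the pyspark directories instead.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Return sys.path as seen by a child interpreter started with PYTHONPATH set."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Put the extra directory first, keeping any PYTHONPATH that was already set.
    env["PYTHONPATH"] = extra_dir if not existing else extra_dir + os.pathsep + existing
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
    )
    return output.decode().splitlines()


extra_dir = tempfile.mkdtemp()  # stand-in for a directory containing your packages
paths = child_sys_path(extra_dir)
for entry in paths:
    print(entry)
```

Directories listed in PYTHONPATH are added to sys.path at interpreter startup, which is why an IDE such as Spyder can import a package once its directory is included there.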

4. Install / update packages for specific versions of Python.

 

Verify for Python 2.7:


pd.__file__
Out[3]: '/home/abgoswam/.local/lib/python2.7/site-packages/pandas/__init__.pyc'

pd.__version__
Out[4]: u'0.19.1'
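
The same check can be wrapped in a small helper. This is a Python 3 sketch, not code from the original post; describe_module is a hypothetical name, and it uses the standard-library json module so that it runs without pandas installed.

```python
import importlib
import sys


def describe_module(name):
    """Report which interpreter is running and where a module was loaded from."""
    module = importlib.import_module(name)
    return {
        "interpreter": sys.executable,
        "name": name,
        # Built-in modules such as sys have no __file__, hence the default.
        "file": getattr(module, "__file__", None),
        # Not every module defines __version__, hence the default.
        "version": getattr(module, "__version__", None),
    }


info = describe_module("json")
for key, value in info.items():
    print("{0}: {1}".format(key, value))
```

Calling describe_module("pandas") from each interpreter in turn shows which site-packages directory, and therefore which Python version, a given install actually landed in.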

References:

  1. https://leifengtechblog.wordpress.com/category/programming-language/python/
  2. http://askubuntu.com/questions/574424/what-is-the-difference-between-pip-and-conda
  3. https://leemendelowitz.github.io/blog/how-does-python-find-packages.html
    • This is a useful one to understand packages
  4. https://python4astronomers.github.io/installation/packages.html
  5. https://azure.microsoft.com/en-us/documentation/articles/machine-learning-execute-python-scripts/
    • This is in fact another simple example of how import works  (look at the section ‘Importing existing Python script modules’)

Reading files in Python.

Let's say we have a file coffee.csv:

 

$ cat -n ../_resources/coffee.csv
1 "Coffee","Water","Milk","Icecream"
2 "Espresso","No","No","No"
3 "Long Black","Yes","No","No"
4 "Flat White","No","Yes","No"
5 "Cappuccino","No","Yes,Frothy","No"
6 "Affogato","No","No","Yes"
7
8
9 abcd
$ wc -l ../_resources/coffee.csv
8 ../_resources/coffee.csv

Note: cat -n shows 9 lines, but wc -l reports 8. This is because the last line of this file does not end in a newline, and wc -l only counts newline characters.

Note: the file has two empty lines (not really empty: each one contains just the newline character '\n'). Also, one of the fields contains a comma, so we need to be careful when splitting lines.
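
Both notes can be checked with a few lines of Python 3. This is a sketch that rebuilds the file contents as a string (SAMPLE is an assumed reconstruction of coffee.csv from the listing above) and compares newline counts, line counts, and a naive split on commas.

```python
# Assumed reconstruction of coffee.csv; the last line has no trailing newline.
SAMPLE = (
    '"Coffee","Water","Milk","Icecream"\n'
    '"Espresso","No","No","No"\n'
    '"Long Black","Yes","No","No"\n'
    '"Flat White","No","Yes","No"\n'
    '"Cappuccino","No","Yes,Frothy","No"\n'
    '"Affogato","No","No","Yes"\n'
    '\n'
    '\n'
    'abcd'
)

newline_count = SAMPLE.count("\n")       # what wc -l counts
line_count = len(SAMPLE.splitlines())    # what cat -n numbers

# Splitting on every comma breaks the quoted "Yes,Frothy" field in two.
naive_fields = SAMPLE.splitlines()[4].split(",")

print(newline_count)       # 8
print(line_count)          # 9
print(len(naive_fields))   # 5, although the row has only 4 columns
```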

So let's explore ways to read in this file:

[1] f.read()

The entire contents of the file are read and returned as a single string.

[2] f.readline() : 

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.

[3] f.readlines()

f.readlines() reads the entire file and returns its contents as a list of lines.

[4] Looping over the file object.
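
The four approaches above can be compared side by side. The Python 3 sketch below writes an assumed copy of coffee.csv to a temporary file (the path and the SAMPLE string are illustrative, not from the original post) and reads it back each way.

```python
import os
import tempfile

# Assumed reconstruction of coffee.csv; the last line has no trailing newline.
SAMPLE = (
    '"Coffee","Water","Milk","Icecream"\n'
    '"Espresso","No","No","No"\n'
    '"Long Black","Yes","No","No"\n'
    '"Flat White","No","Yes","No"\n'
    '"Cappuccino","No","Yes,Frothy","No"\n'
    '"Affogato","No","No","Yes"\n'
    '\n'
    '\n'
    'abcd'
)

# Write the sample to a temporary file; newline="" keeps the "\n" characters as-is.
path = os.path.join(tempfile.mkdtemp(), "coffee.csv")
with open(path, "w", newline="") as f:
    f.write(SAMPLE)

# [1] read(): the whole file as one string.
with open(path) as f:
    whole = f.read()

# [2] readline(): one line per call, trailing newline included.
with open(path) as f:
    first = f.readline()
    second = f.readline()

# [3] readlines(): a list with one string per line.
with open(path) as f:
    lines = f.readlines()

# [4] Looping over the file object: one line at a time, without loading the whole file.
with open(path) as f:
    stripped = [line.rstrip("\n") for line in f]

# At end of file, readline() returns an empty string.
with open(path) as f:
    f.read()
    at_eof = f.readline()

print(repr(first))
print(len(lines), repr(lines[6]), repr(lines[-1]))
print(repr(at_eof))
```

Looping over the file object is usually the better default for large files, since read() and readlines() both hold the entire file in memory.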

[5]  Using csv library.

This is especially important since CSV fields can contain commas themselves (e.g. "Yes,Frothy" above).

In this case we first create a csv reader using the file handle. Then we iterate over the csv reader. In each iteration we get a list.

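# CSV code (Python 2.7)

import csv

with open(filename, 'rb') as f:
    # read the header with a plain readline()
    header = f.readline().strip()
    print header

    # the file object is also an iterator: next() returns the next raw line
    line1 = f.next().strip()
    print "line1 : {0}".format(line1)

    # creating the csv reader; it continues from the current file position
    csvreader = csv.reader(f)

    line2 = csvreader.next()
    print "line2 : {0}".format(line2)

    # iterate over the csvreader; each row is a list of fields
    for line in csvreader:
        print line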

Output: 

"Coffee","Water","Milk","Icecream"
line1 : "Espresso","No","No","No"
line2 : ['Long Black', 'Yes', 'No', 'No']
['Flat White', 'No', 'Yes', 'No']
['Cappuccino', 'No', 'Yes,Frothy', 'No']
['Affogato', 'No', 'No', 'Yes']
[]
[]
['abcd']
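
For comparison, here is a Python 3 sketch of the same idea (the code above is Python 2.7). It parses an assumed in-memory copy of coffee.csv with csv.reader; read_rows and SAMPLE are illustrative names. With a real file in Python 3 you would open it in text mode with newline='' rather than 'rb'.

```python
import csv
import io

# Assumed reconstruction of coffee.csv; the last line has no trailing newline.
SAMPLE = (
    '"Coffee","Water","Milk","Icecream"\n'
    '"Espresso","No","No","No"\n'
    '"Long Black","Yes","No","No"\n'
    '"Flat White","No","Yes","No"\n'
    '"Cappuccino","No","Yes,Frothy","No"\n'
    '"Affogato","No","No","Yes"\n'
    '\n'
    '\n'
    'abcd'
)


def read_rows(text):
    """Parse CSV text into a list of rows, each row being a list of fields."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = read_rows(SAMPLE)
header, body = rows[0], rows[1:]

# Blank lines come back as empty lists, so they are easy to filter out.
non_blank = [row for row in body if row]

print(header)
print(rows[4])          # the quoted comma stays inside a single field
print(len(rows), len(non_blank))
```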

 

[6] Pandas.

(Note how pandas skips the empty lines and fills the missing columns of the last row with NaN.)

import pandas as pd
df = pd.read_csv(filename)

df
Out[114]:
       Coffee  Water        Milk  Icecream
0    Espresso     No          No        No
1  Long Black    Yes          No        No
2  Flat White     No         Yes        No
3  Cappuccino     No  Yes,Frothy        No
4    Affogato     No          No       Yes
5        abcd    NaN         NaN       NaN

 

Code:

 

 

 

 

Web Services : WebClient vs HttpClient vs HttpWebRequest

Using C# I have found multiple ways to retrieve HTML from a given URL.

  • Using HttpClient / GetByteArrayAsync / WebUtility.HtmlDecode
  • Using HttpClient / GetAsync / ReadAsStringAsync
  • Using (HttpWebRequest)WebRequest.Create / (HttpWebResponse)request.GetResponse

 

On top of that, there is also the HtmlAgilityPack library, which lets you load a web page just by using the '.Load(url)' method.

 

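For beginners it can be a bit perplexing to see so many different libraries. I realized this is a result of how the libraries were added over time: HttpWebRequest and WebClient date back to the early versions of the .NET Framework, while HttpClient was introduced later, in .NET Framework 4.5.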

References:

 

Code:

 

 

Sed Usage.

A few pretty useful sed commands:

[1] Show a particular line only

[2] Dealing with a range of lines

[3] Prepend text to a file at the command line

  • sed -i '1i id,gender,age' mldataset.csv  (inserts the line "id,gender,age" before line 1, editing the file in place)

[4] Remove blank lines from file.

  • sed '/^$/d' input.txt > output.txt
  • grep -v '^$' input.txt > output.txt

(Both grep and sed use the special pattern ^$, which matches blank lines. The grep -v option prints all lines except those that match, i.e. everything but the blank lines.)
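
For readers more at home in Python, here is a rough Python 3 equivalent of items [3] and [4]. It is a sketch only: the helper names remove_blank_lines and prepend_line are made up, and it works on strings rather than editing files in place.

```python
import re

# Same pattern as in the sed and grep commands: start of line followed by end of line.
BLANK = re.compile(r"^$")


def remove_blank_lines(text):
    """Drop lines that are completely empty, like sed '/^$/d'."""
    kept = [line for line in text.splitlines() if not BLANK.match(line)]
    return "".join(line + "\n" for line in kept)


def prepend_line(text, first_line):
    """Put first_line in front of the text, like sed '1i ...'."""
    return first_line + "\n" + text


cleaned = remove_blank_lines("a\n\nb\n \nc\n")
with_header = prepend_line("1,M,30\n", "id,gender,age")

print(repr(cleaned))
print(repr(with_header))
```

As with the sed and grep versions, a line that contains only spaces is not matched by ^$ and is therefore kept.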

References: