Web requests in Python

I recently tried making web requests in Python.

I used the urllib2 library to make 10 requests to the Azure Machine Learning web service.

Interestingly, I found that using urllib2 was incurring a lot of latency. I replaced urllib2 with the requests library and, boom, the latency improved tremendously.


  • It seems the requests library by default uses keep-alive, so it was not re-initiating the connection for each of the multiple requests. urllib2, on the other hand, was re-initiating the connection for every request.
  • Note: the requests library is still making synchronous calls.
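A minimal sketch of the requests-based version (the endpoint URL and API key below are placeholders, not the real service values); a single Session reuses the underlying TCP connection across all ten calls:

```python
import requests

# Placeholder endpoint and key -- substitute your own Azure ML web service values.
url = "https://example.azureml.net/score"
headers = {"Authorization": "Bearer <api-key>", "Content-Type": "application/json"}

# A Session keeps the connection alive between requests, avoiding repeated
# TCP/TLS setup. Its default headers already advertise keep-alive:
session = requests.Session()
print(session.headers.get("Connection"))  # keep-alive

# With a real endpoint, the 10 calls would share one connection:
# for _ in range(10):
#     response = session.post(url, headers=headers, json={"Inputs": {}})
```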

REST Calls in Python. JSON. Pandas.

I recently had to make REST calls in Python for sending data to Azure EventHub.

In this particular case I could not use the Python SDK to talk to EventHub. As I wrote the code to make the raw REST calls, I came across several gems, which I am listing below.


  • Use the Python ‘requests’ library.
    • I am yet to figure out how to make async calls. Can I use this library for async as well, or would I have to use something else?
  • Sending JSON is the way to go.
    • Don’t even try sending anything else.
  • Pandas has great functionality to convert Series/DataFrames to JSON.
    • The ‘to_json’ function has awesome functionality, including orienting by ‘records’, etc.
  • Python has an awesome library called ‘json’ to deal with JSON data.
    • To deserialize, use json.loads().
    • In particular, to convert a dict to JSON use json.dumps().
    • Note: if you want to preserve key order, use ‘collections.OrderedDict’, as in the example below.

Check this out:

import collections
import json

myj = '[{"reward":30,"actionname":"x","age":60,"gender":"M","weight":150,"Scored Labels":30.9928596354},{"reward":20,"actionname":"y","age":60,"gender":"M","weight":150,"Scored Labels":19.0217225957}]'

myj_l = json.loads(myj, object_pairs_hook=collections.OrderedDict)

#[OrderedDict([(u'reward', 30), (u'actionname', u'x'), (u'age', 60), (u'gender', u'M'), (u'weight', 150), (u'Scored Labels', 30.9928596354)]),
# OrderedDict([(u'reward', 20), (u'actionname', u'y'), (u'age', 60), (u'gender', u'M'), (u'weight', 150), (u'Scored Labels', 19.0217225957)])]

for item in myj_l:
    print json.dumps(item)

#{"reward": 30, "actionname": "x", "age": 60, "gender": "M", "weight": 150, "Scored Labels": 30.9928596354}
#{"reward": 20, "actionname": "y", "age": 60, "gender": "M", "weight": 150, "Scored Labels": 19.0217225957}
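On the pandas side, to_json with orient='records' produces exactly this kind of list-of-objects JSON. A small sketch with my own toy data (a cut-down version of the records above):

```python
import json

import pandas as pd

df = pd.DataFrame({"reward": [30, 20], "actionname": ["x", "y"]})

# orient='records' serializes the frame as a JSON list of row objects
records_json = df.to_json(orient="records")
print(records_json)

# and it round-trips cleanly through the json library
rows = json.loads(records_json)
print(rows[0]["reward"])  # 30
```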



K-means Clustering

One of my friends recently asked me about the K-means algorithm.

  • How does it work?
  • What are the typical applications, i.e. where/how is it used in the industry?

In the discussion that followed, we ended up playing around with several visualizations available that do an awesome job of explaining this technique.

We also hacked around with some code from Joel Grus’ book (Data Science from Scratch) to develop more intuition about the K-means algorithm.




We also got some very interesting insights from playing around with K-means to cluster an image containing different colors.
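To make the "how does it work" part concrete, here is a minimal sketch of Lloyd's algorithm, the classic K-means procedure. This is my own simplified version in the spirit of the from-scratch code in the book, not Grus' exact implementation; the points and starting centroids are made up:

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        # assignment step: bucket each point with its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                     for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # update step: move each centroid to its cluster mean
        # (keep the old centroid if a cluster ends up empty)
        centroids = [
            tuple(sum(vals) / float(len(vals)) for vals in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids

# two obvious blobs around (1, 1) and (10, 10)
points = [(1, 1), (1.5, 2), (1, 0), (10, 10), (10.5, 11), (9, 10)]
print(kmeans(points, centroids=[(0, 0), (5, 5)]))
```

The two centroids converge to roughly (1.17, 1.0) and (9.83, 10.33), i.e. the means of the two blobs. The same idea, applied to an image's pixels in RGB space, is what reduces the image to K dominant colors.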



Handling Exceptions.

Some simple examples for exception handling:



x = 10
y = 20
try:
    if x > y:
        print "abcd"
    # dividing by 0
    print 2 / 0
except Exception as e:
    print "hit an exception : {0} : {1}".format(e, e.message)

#hit an exception : integer division or modulo by zero : integer division or modulo by zero

In Python there are two keywords that commonly come up inside an except block:

  • pass
    • This is a no-op: the except block does nothing and execution carries on past it, so the exception is swallowed. (Note that it is not the same as ‘continue’, which jumps to the next loop iteration.)
  • raise
    • If we want to bubble up the exception, ‘raise’ it; a bare ‘raise’ inside an except block re-raises the original exception. Otherwise the exception is suppressed.
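A tiny sketch contrasting the two behaviours (function names are my own):

```python
def swallow():
    try:
        1 / 0
    except ZeroDivisionError:
        pass  # no-op: the exception is suppressed, execution continues
    return "survived"

def bubble_up():
    try:
        1 / 0
    except ZeroDivisionError:
        raise  # bare raise re-raises the original exception to the caller

print(swallow())  # survived
# bubble_up() would raise ZeroDivisionError in the caller
```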

‘map’ in Python

In some of my previous posts (Google while coding…, Functional tools in python.., Applying operations over dataframes) I have noted the use of the ‘map’ function.

‘map’ is used in Python in the following scenarios:

  • as a functional operator in Python
    • Return a list of the results of applying the function to the items of
      the argument sequence(s)
  • as an element-wise function on a Series
  • pyspark



#Problem: you've got two collections of values and you need to keep the largest (or smallest) from each. These could be metrics from two different systems, stock quotes from two different services, or just about anything.

a = [1, 2, 3, 4, 5]
b = [2, 2, 9, 0, 9]

#Approach 1.
maxval = []
for i in range(len(a)):
    if a[i] >= b[i]:
        maxval.append(a[i])
    else:
        maxval.append(b[i])
print maxval

#Approach 2.
print map(lambda pair: max(pair), zip(a, b))

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), columns=list('bde'), index=['Seattle', 'Utah', 'Ohio', 'Texas', 'Oregon'])

#1. Using DataFrame 'apply'
#applying a function on 1D arrays to each column or row.
f = lambda x: x.max() - x.min()
print df.apply(f)          # column-wise (the default, axis=0)
print df.apply(f, axis=1)  # row-wise

#2. Using DataFrame 'applymap'
#Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in the frame. You can do this with applymap.
formatf = lambda x: '%.2f' % x
print df.applymap(formatf)

#3. Using Series 'map'
formatf = lambda x: '%.2f' % x
df['f'] = df['e'].map(formatf)

#Summing up: apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
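Pulling the three together on a small deterministic frame (my own toy data, since the frame above is random; note that newer pandas versions rename applymap to DataFrame.map):

```python
import pandas as pd

df = pd.DataFrame({'b': [1.0, 4.0], 'd': [2.0, 8.0]}, index=['Seattle', 'Utah'])

# apply: one call per column (axis=0 is the default)
spans = df.apply(lambda col: col.max() - col.min())    # b -> 3.0, d -> 6.0

# applymap: one call per cell of the whole DataFrame
formatted = df.applymap(lambda v: '%.2f' % v)

# map: one call per element of a single Series
labels = df['b'].map(lambda v: 'high' if v > 2 else 'low')

print(spans['b'], formatted.loc['Utah', 'd'], labels['Seattle'])  # 3.0 8.00 low
```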