Web requests in Python

Recently tried making web requests in Python.

I used the urllib2 library for making 10 requests to the Azure Machine Learning web service.

Interestingly I found that using urllib2  was incurring a lot of latency.   I replaced urllib2 with the requests libarary and boom, the latency improved tremendously.

Tip:

  • it seems the requests library  by default uses KeepAlive.  As such, it was not re-initiating the connection each time for the multiple requests. urllib2 on the other hand was re-initiating the connection for each request.
  • Note:  the requests library is still making synchronous calls.


REST Calls in Python. JSON. Pandas.

I recently had to make REST calls in Python for sending data to Azure EventHub.

In this particular case I could not use the Python SDK to talk to EventHub. As I wrote down the code to make the raw REST calls, I came across several gems. Am listing them down below.

Tips:

  • Use the python ‘requests’ library.
    • i am yet to figure out how to make async calls. can i use this library for async as well or would I have to use something else
  • Sending JSON is way to go.
    • Don’t even try sending anything else
  • Pandas has great functionality to convert  Series/DataFrames to JSON.
    • the ‘to_json’ function has awesome functionality including orient by ‘records’ etc
  • Python has an awesome library called ‘json’ to deal with JSON data.
    • To deserialize ,use json.loads()
    • In particular,  to convert dict to JSON use  json.dumps().
    • Note: If you want to preserve the order, one would have to use ‘collections.OrderedDict’. Check this link

Check this out:


myj = '[{"reward":30,"actionname":"x","age":60,"gender":"M","weight":150,"Scored Labels":30.9928596354},{"reward":20,"actionname":"y","age":60,"gender":"M","weight":150,"Scored Labels":19.0217225957}]'

myj_l = json.loads(myj, object_pairs_hook=collections.OrderedDict)

myj_l
Out[177]:
[OrderedDict([(u'reward', 30), (u'actionname', u'x'), (u'age', 60), (u'gender', u'M'), (u'weight', 150), (u'Scored Labels', 30.9928596354)]),
 OrderedDict([(u'reward', 20), (u'actionname', u'y'), (u'age', 60), (u'gender', u'M'), (u'weight', 150), (u'Scored Labels', 19.0217225957)])]

for item in myj_l:
    print json.dumps(item)

{"reward": 30, "actionname": "x", "age": 60, "gender": "M", "weight": 150, "Scored Labels": 30.9928596354}
{"reward": 20, "actionname": "y", "age": 60, "gender": "M", "weight": 150, "Scored Labels": 19.0217225957}

References:

Code:

JSON DateTime

“DateTimes in JSON are hard.

The problem comes from the JSON spec itself: there is no literal syntax for dates in JSON. The spec has objects, arrays, strings, integers, and floats, but it defines no standard for what a date looks like.”

http://www.newtonsoft.com/json/help/html/datesinjson.htm

 

References:

 

Code:

DecisionController

 

Understanding the SynchronizationContext and Best Practices in Asynchronous Programming.

For doing asynchronous programming in C#, one needs to understand the concept of SynchronizationContext.

I came across this nice set of posts by Stephen Cleary, where he explains some of the fundamental design choices and best practices.

The primary reason for this line of exploration for me was because of a deadlock I hit when working on a web app using the async pattern. This problem is described here.

References:

 

Redis : Sorted Sets

I recently used Redis SortedSets quite heavily. I had a few pre-requisites:

  1. Should be extremely fast.
    • Think usage in the context of  a near real-time web API
  2. Should scale to billions of entities
  3. Should be able to give me the “rank” of an item.
    • Assume each item has a numeric value that is used to determine rank
  4. Should be able to give me the count of the total number of items in the set
  5. Should be able to give me the sum of the values in the set**
  6. Should be able to give me the cumulative sum of the values from the 0th item to kth item based on their ranks**

**I am yet to figure out how to do 5-6. Perhaps there is a way to do this in Redis as well.  Else, I am planning to do some form of Reservoir Sampling to get an approximation of the sums.

Observations:

  • Redis SortedSets support 1-4 out of the box!
  • In general I am amazed at the advanced internal data structures that are used in Redis.
  • Sorted Sets for example use SkipLists internally.
    • This serves as a great motivation to do a blog post on SkipLists actually.
    • Also, RangeQueries maybe. Sorted Sets support range operations which are quite handly.  Need to understand how those are supported internally.

Code:

 

References:

 

Redis : Usage Patterns

Redis Databases

  1. Use different Redis databases for different kinds of data. In Redis, databases are identified by an integer index, not by a database name. By default, a client is connected to database 0. With the SELECT command you can switch to a different database:
    • redis> select 3
      OK
  1. Each Redis database has its own keyspace. By using different databases for your ‘staging’ and ‘production’ data, for example, you don’t have to worry about key clashes between the two

 

References:

  1. http://www.rediscookbook.org/multiple_databases.html
  2. http://stackoverflow.com/questions/13386053/how-do-i-change-between-redis-database
  3. http://stackoverflow.com/questions/16221563/whats-the-point-of-multiple-redis-databases
  4. https://www.quora.com/What-are-5-mistakes-to-avoid-when-using-Redis
  5. http://lzone.de/cheat-sheet/Redis

 

StackExchange.Redis Basics

  1. The central object in StackExchange.Redis is the ConnectionMultiplexer class in the StackExchange.Redis namespace; this is the object that hides away the details of multiple servers. Because the ConnectionMultiplexer does a lot, it is designed to be shared and reused between callers. You should not create a ConnectionMultiplexer per operation. It is fully thread-safe and ready for this usage.
  2. Accessing a redis database is as simple as:
    • IDatabase db = redis.GetDatabase();

The object returned from GetDatabase is a cheap pass-thru object, and does not need to be stored

  1. RedisKey:
    • StackExchange.Redis represents keys by the RedisKey type. The good news, though, is that this has implicit conversions to and from both string and byte[], allowing both text and binary keys to be used without any complication
  1. RedisValue: 
    • Values can also need to represent typed primitive data – most commonly (in .NET terms) Int32Int64Double or Boolean. Because of this, RedisValue provides a lot more conversion support thanRedisKey
    • Note that while the conversions from primitives to RedisValue are implicit, many of the conversions from RedisValue to primitives are explicit: this is because it is very possible that these conversions will fail if the data does not have an appropriate value.
    • Note additionally that when treated numerically, redis treats a non-existent key as zero; for consistency with this, nil responses are treated as zero:

db.KeyDelete(“abc”);
int i = (int)db.StringGet(“abc”); // this is ZERO

  • If you need to detect the nil condition, then you can check for that:

db.KeyDelete(“abc”);
var value = db.StringGet(“abc”);
bool isNil = value.IsNull; // this is true

or perhaps more simply, just use the provided Nullable<T> support:

db.KeyDelete(“abc”);
var value = (int?)db.StringGet(“abc”); // behaves as you would expect

Documentation:

(colored ones just mean I have tried that documentation to some extent)

 

Code:

  1. FeaturizeBasic
  2. Redis Hashes Using StackExchange.Redis
  3. Redis Sets Using StackExchange.Redis

 

References:

  1. http://panuoksala.blogspot.com/2015/01/redis-hashes-and-net.html

Redis : Basics

Recently I have been playing around with Redis. In particular I have been trying how to integrate Redis with web services in Azure.

Redis is actually more like a data structures server, supporting very interesting data structures, and operations on them.

I am listing down some data structures, and corresponding when using these data structures.

  1. Strings
  2. Lists
  3. Sets
  4. Sorted Sets
    • Useful when fast access to the middle of a large collection of elements is important
  5. Hashes
    • Hashes are maps between string fields and string values, so they are the perfect data type to represent objects.
  6. Bit arrays
  7. Hyper Log Log

 

DECR, DECRBY, DEL, EXISTS, EXPIRE, GET, GETSET, HDEL, HEXISTS, HGET, HGETALL, HINCRBY, HKEYS, HLEN, HMGET, HMSET, HSET, HVALS, INCR, INCRBY, KEYS, LINDEX, LLEN, LPOP, LPUSH,LRANGE, LREM, LSET, LTRIM, MGET, MSET, MSETNX, MULTI, PEXPIRE, RENAME, RENAMENX, RPOP, RPOPLPUSH, RPUSH, SADD, SCARD, SDIFF, SDIFFSTORE, SET, SETEX, SETNX, SINTER, SINTERSTORE, SISMEMBER, SMEMBERS, SMOVE, SORT, SPOP, SRANDMEMBER, SREM, SUNION, SUNIONSTORE, TTL, TYPE, ZADD,ZCARD, ZCOUNT, ZINCRBY, ZRANGE, ZRANGEBYSCORE, ZRANK, ZREM, ZREMRANGEBYSCORE, ZREVRANGE, ZSCORE

 

Code on GitHub:

References: