Scaling real-time processing jobs in Azure

I recently had to work out how to scale real-time processing jobs in Azure.

I finally managed to do it with partitions. Using partitions in Event Hubs together with Azure Stream Analytics got the job done for me.
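
To make the idea concrete, here is a minimal conceptual sketch in Python of how a partition key can route events to partitions, so that each partition can be handled independently. The hash used here (MD5 modulo the partition count), the `device_id` field and the partition count of 4 are assumptions for illustration only; Event Hubs applies its own internal hashing.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 4  # hypothetical value, chosen only for this sketch


def partition_for(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a partition key to a partition index with a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def split_by_partition(events, partition_count: int = PARTITION_COUNT):
    """Group events by partition; events with the same key stay together, in order."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event["device_id"], partition_count)].append(event)
    return dict(partitions)
```

Because every event with the same key lands in the same partition, each partition can be processed by its own worker without coordinating with the others.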

Windowing Operations in Azure Stream Analytics

Windowing, that is, grouping events into time intervals and aggregating over each interval, is a very common operation in stream analytics.

Beneath the surface, a fair amount of complex data structuring goes on to support windowing operations. I would love to dig deeper into it someday.

Example:

Here is an example of a query I wrote recently using windowing operators in Azure Stream Analytics. It shows three interesting things:
1. Windowing
2. CTEs
3. Aggregation over string columns (using TopOne)

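-- Per event ID and hopping window: keep the earliest record and the highest reward.
WITH ContextReward AS (
    SELECT
        eventid,
        -- TopOne returns the whole top-ranked record, so its string columns stay available.
        TopOne() OVER (ORDER BY [EventEnqueuedUtcTime] ASC) AS CR,
        MAX(reward) AS reward
    FROM Input
    -- 2-hour windows that hop forward every hour, so consecutive windows overlap.
    GROUP BY eventid, HoppingWindow(Duration(hour, 2), Hop(hour, 1))
)

SELECT
    reward,
    eventid,
    CR.actionname AS actionname,
    CR.age AS age,
    CR.gender AS gender,
    CR.weight AS weight,
    CR.actionprobability AS actionprobability
INTO OutputWindow
FROM ContextReward

-- Pass the raw input through to two further outputs.
SELECT * INTO Output FROM Input
SELECT * INTO OutputCSV FROM Input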
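
To see what the hopping window in the query does to a single event, here is a small Python sketch that lists every window an event falls into. It assumes that window boundaries are aligned to multiples of the hop and that a window excludes its start and includes its end, which is my understanding of how Stream Analytics defines windows; times are plain integer seconds, and the function name is made up for this example.

```python
from typing import List, Tuple

HOUR = 3600


def hopping_windows(event_time: int, duration: int, hop: int) -> List[Tuple[int, int]]:
    """Return every (start, end] window that contains event_time (all in seconds)."""
    # First window end at or after the event, aligned to a multiple of the hop.
    end = -(-event_time // hop) * hop
    windows = []
    # Keep moving forward while the window still starts before the event.
    while end - duration < event_time:
        windows.append((end - duration, end))
        end += hop
    return windows


# An event at 01:30 belongs to two overlapping 2-hour windows.
print(hopping_windows(int(1.5 * HOUR), duration=2 * HOUR, hop=1 * HOUR))
# → [(0, 7200), (3600, 10800)]
```

With a 2-hour duration and a 1-hour hop, every event contributes to two windows, so the same eventid can appear in two consecutive output rows.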

 


Making REST calls to send data to an Azure EventHub

I recently encountered a situation where I had to send data to an Azure Event Hub using plain REST calls.

Tips:

  • If you are used to client libraries (C#, Python), you will find that they do a lot behind the scenes. It’s not trivial to go from using a library to making plain REST calls.
  • My first approach (using Fiddler to capture the library’s traffic and repurpose those calls) failed.
    • I am not sure why the calls did not show up in Fiddler. I tried a few things, such as enabling HTTPS decryption, but I wasn’t able to get the outgoing traffic to show up.
  • The references below give a good idea of how I made some progress.

REST Call to send data:

I finally got it to work with something like this:

POST https://simplexagpmeh.servicebus.windows.net/simplexagpmeh/messages?timeout=60&api-version=2014-01 HTTP/1.1
User-Agent: Fiddler
Authorization: SharedAccessSignature sr=http%3a%2f%2fsimplexagpmeh.servicebus.windows.net%2f&sig=RxvSkhotfGEwERdiaA8oLr7X9u5XLeDI8TCK5DhDPP8%3d&se=1476214239&skn=RootManageSharedAccessKey
Content-Type: application/atom+xml;type=entry;charset=utf-8
Host: simplexagpmeh.servicebus.windows.net
Content-Length: 153
Expect: 100-continue

{ "DeviceId" : "ArduinoYun",
  "SensorData" : [ { "SensorId" : "awk",
        "SensorType" : "temperature",
        "SensorValue" : 24.5
      } ]
}
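
The hardest part of the request above to produce by hand is the Authorization header. The Python sketch below shows one way to build a SharedAccessSignature token and prepare the same kind of POST request using only the standard library. The function names and the `ttl_seconds` parameter are hypothetical, and the token here is scoped to the hub URI, whereas the captured request above uses the namespace root; treat it as a starting point under those assumptions rather than a definitive implementation.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request


def make_sas_token(resource_uri: str, key_name: str, key: str, expiry: int) -> str:
    """Build a SharedAccessSignature header value for the given resource URI."""
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # The string to sign is the URL-encoded URI, a newline, and the expiry (Unix seconds).
    to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={key_name}"


def build_send_request(namespace, hub, key_name, key, payload, ttl_seconds=3600):
    """Prepare (but do not send) the POST request for a single JSON event."""
    base = f"https://{namespace}.servicebus.windows.net/{hub}"
    token = make_sas_token(base, key_name, key, int(time.time()) + ttl_seconds)
    return urllib.request.Request(
        url=f"{base}/messages?timeout=60&api-version=2014-01",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": token,
            "Content-Type": "application/atom+xml;type=entry;charset=utf-8",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the event; keeping the two steps separate makes it easy to inspect the headers before anything goes over the wire.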

References:


Redis : Sorted Sets

I recently used Redis Sorted Sets quite heavily. I had a few requirements:

  1. Should be extremely fast.
    • Think of usage in the context of a near real-time web API
  2. Should scale to billions of entities
  3. Should be able to give me the “rank” of an item.
    • Assume each item has a numeric value that is used to determine rank
  4. Should be able to give me the count of the total number of items in the set
  5. Should be able to give me the sum of the values in the set**
  6. Should be able to give me the cumulative sum of the values from the 0th item to kth item based on their ranks**

**I have yet to figure out how to do 5 and 6. Perhaps there is a way to do this in Redis as well. Otherwise, I am planning to use some form of reservoir sampling to approximate the sums.

Observations:

  • Redis SortedSets support 1-4 out of the box!
  • In general, I am amazed at the advanced internal data structures used in Redis.
  • Sorted Sets, for example, use skip lists internally.
    • This is great motivation to write a blog post on skip lists.
    • Range queries too, perhaps. Sorted Sets support range operations, which are quite handy. I need to understand how those are supported internally.
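
As a rough illustration of requirements 1-4, the Python sketch below wraps the relevant sorted set commands. It assumes `client` is a redis-py client (for example `redis.Redis(host="localhost", port=6379)`); the helper names are made up for this example. The last helper is only a naive take on requirement 6: it pulls the first k+1 items to the client and sums their scores, which is unlikely to be acceptable at the scale of billions of entities.

```python
def add_item(client, key, item, value):
    """ZADD: insert an item, or update its numeric score if it already exists."""
    client.zadd(key, {item: value})


def rank_of(client, key, item):
    """ZRANK: 0-based rank in ascending score order, or None if the item is absent."""
    return client.zrank(key, item)


def item_count(client, key):
    """ZCARD: total number of items in the sorted set."""
    return client.zcard(key)


def cumulative_sum(client, key, k):
    """Naive sum of the scores at ranks 0..k; transfers k+1 items to the client."""
    return sum(score for _, score in client.zrange(key, 0, k, withscores=True))
```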


Redis : Usage Patterns

Redis Databases

  1. Use different Redis databases for different kinds of data. In Redis, databases are identified by an integer index, not by a name. By default, a client is connected to database 0. The SELECT command switches to a different database:
    • redis> select 3
      OK
  2. Each Redis database has its own keyspace. By using different databases for your ‘staging’ and ‘production’ data, for example, you don’t have to worry about key clashes between the two.
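
From application code, the database index is usually chosen when the connection is created rather than with an explicit SELECT. The Python sketch below builds a `redis://` URL whose path carries the database index; the mapping of environments to indexes and the function name are hypothetical.

```python
# Hypothetical assignment of environments to database indexes.
DB_INDEX = {"production": 0, "staging": 1}


def redis_url(environment: str, host: str = "localhost", port: int = 6379) -> str:
    """Return a redis:// URL whose path selects the environment's database."""
    return f"redis://{host}:{port}/{DB_INDEX[environment]}"
```

With redis-py, `redis.Redis.from_url(redis_url("staging"))` would then give a client whose commands all run against database 1, so a key written there cannot clash with a key of the same name in database 0.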

 

References:

  1. http://www.rediscookbook.org/multiple_databases.html
  2. http://stackoverflow.com/questions/13386053/how-do-i-change-between-redis-database
  3. http://stackoverflow.com/questions/16221563/whats-the-point-of-multiple-redis-databases
  4. https://www.quora.com/What-are-5-mistakes-to-avoid-when-using-Redis
  5. http://lzone.de/cheat-sheet/Redis

 

StackExchange.Redis Basics

  1. The central object in StackExchange.Redis is the ConnectionMultiplexer class in the StackExchange.Redis namespace; this is the object that hides away the details of multiple servers. Because the ConnectionMultiplexer does a lot, it is designed to be shared and reused between callers. You should not create a ConnectionMultiplexer per operation. It is fully thread-safe and ready for this usage.
  2. Accessing a redis database is as simple as:
    • IDatabase db = redis.GetDatabase();

The object returned from GetDatabase is a cheap pass-through object and does not need to be stored.

  3. RedisKey:
    • StackExchange.Redis represents keys with the RedisKey type. It has implicit conversions to and from both string and byte[], so both text and binary keys can be used without any complication.
  4. RedisValue:
    • Values often need to represent typed primitive data, most commonly (in .NET terms) Int32, Int64, Double or Boolean. Because of this, RedisValue provides a lot more conversion support than RedisKey.
    • Note that while the conversions from primitives to RedisValue are implicit, many of the conversions from RedisValue to primitives are explicit, because these conversions can fail if the data does not have an appropriate value.
    • Note additionally that, when treated numerically, Redis treats a non-existent key as zero; for consistency with this, nil responses are treated as zero:

db.KeyDelete("abc");
int i = (int)db.StringGet("abc"); // this is ZERO

  • If you need to detect the nil condition, you can check for it:

db.KeyDelete("abc");
var value = db.StringGet("abc");
bool isNil = value.IsNull; // this is true

or, perhaps more simply, just use the provided Nullable<T> support:

db.KeyDelete("abc");
var value = (int?)db.StringGet("abc"); // behaves as you would expect


Code:

  1. FeaturizeBasic
  2. Redis Hashes Using StackExchange.Redis
  3. Redis Sets Using StackExchange.Redis

 

References:

  1. http://panuoksala.blogspot.com/2015/01/redis-hashes-and-net.html

Redis : Basics

Recently I have been playing around with Redis. In particular, I have been exploring how to integrate Redis with web services in Azure.

Redis is really more of a data structure server: it supports a number of very interesting data structures, and operations on them.

Below are some of those data structures, with notes on when to use them.

  1. Strings
  2. Lists
  3. Sets
  4. Sorted Sets
    • Useful when fast access to the middle of a large collection of elements is important
  5. Hashes
    • Hashes are maps between string fields and string values, so they are the perfect data type to represent objects.
  6. Bit arrays
  7. HyperLogLogs
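
As an example of the point about hashes, the Python sketch below stores an object as a hash with one field per attribute and reads it back. It assumes `client` is a redis-py client; the `user:<id>` key scheme and the helper names are made up for this example.

```python
def save_user(client, user_id, fields):
    """HSET: store each attribute of the object as a field of one hash."""
    client.hset(f"user:{user_id}", mapping=fields)


def load_user(client, user_id):
    """HGETALL: read the whole object back as a dict of field -> value."""
    return client.hgetall(f"user:{user_id}")
```

Note that redis-py returns field names and values as bytes unless the client was created with `decode_responses=True`.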

 

DECR, DECRBY, DEL, EXISTS, EXPIRE, GET, GETSET, HDEL, HEXISTS, HGET, HGETALL, HINCRBY, HKEYS, HLEN, HMGET, HMSET, HSET, HVALS, INCR, INCRBY, KEYS, LINDEX, LLEN, LPOP, LPUSH, LRANGE, LREM, LSET, LTRIM, MGET, MSET, MSETNX, MULTI, PEXPIRE, RENAME, RENAMENX, RPOP, RPOPLPUSH, RPUSH, SADD, SCARD, SDIFF, SDIFFSTORE, SET, SETEX, SETNX, SINTER, SINTERSTORE, SISMEMBER, SMEMBERS, SMOVE, SORT, SPOP, SRANDMEMBER, SREM, SUNION, SUNIONSTORE, TTL, TYPE, ZADD, ZCARD, ZCOUNT, ZINCRBY, ZRANGE, ZRANGEBYSCORE, ZRANK, ZREM, ZREMRANGEBYSCORE, ZREVRANGE, ZSCORE

 


Web Services : Data flow between the UI view and the backend APIs

These articles helped me a lot in understanding how to pass data between the UI view and the backend APIs.

References:

  1. https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Forms/Sending_and_retrieving_form_data
  2. http://www.asp.net/web-api/overview/advanced/sending-html-form-data-part-1
  3. http://carehart.org/blog/client/index.cfm/2007/1/2/form_self_post