Toe Dipping Into Apache Mahout

I went to a talk titled ‘Apache Mahout – What’s next?’ by Trevor Grant.

A few things struck me after attending the talk:

  • Apache Mahout seems to be a pretty interesting framework for distributed matrix operations.
    • The ppt can be found here.
  • Trevor’s blog post has great pointers for getting started with many of the technologies on the fringes, such as Flink and Mahout.

Kafka-Spark Integration

I am trying to get to the point where I have Kafka + Spark Streaming running locally on my machine.

There are several things to figure out along the way:

  • Kafka
  • Scala
  • Spark
  • Spark Streaming



Distributed Locks

One reason why Redis has custom locking, instead of using operating system–level locks, language-level locks, and so forth, is a matter of scope. Clients want exclusive access to data stored on Redis, so they need a lock defined in a scope that all clients can see: Redis itself.

Redis does have a basic sort of lock already available as part of the command set (SETNX), which we use, but it’s not full-featured and doesn’t offer advanced functionality that users would expect of a distributed lock.

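In fact, two patterns have emerged for locking in Redis:

  1. Locking with SETNX
    • sets a key only if it does not already exist; simple, but not full-featured and without the advanced functionality described above
  2. Redlock
    • a distributed locking algorithm that acquires the lock across several independent Redis instances.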
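The first pattern can be sketched roughly as follows. This is a minimal illustration, not a full-featured lock: it assumes a client object that exposes `set(name, value, nx=..., px=...)` and `eval(script, numkeys, *keys_and_args)` the way the redis-py client does, and the function names `acquire_lock` and `release_lock`, the key name and the timeouts are all made up for the example.

```python
import time
import uuid

# Lua script so the "check owner, then delete" step runs atomically on the server.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


def acquire_lock(client, name, ttl_ms=10_000, wait_s=1.0):
    """Try to take the lock `name`; return an owner token, or None on timeout.

    `client` is assumed to behave like a redis-py client (hypothetical wiring).
    """
    token = uuid.uuid4().hex  # unique per holder, so only the owner can release
    deadline = time.monotonic() + wait_s
    while True:
        # SET name token NX PX ttl_ms: succeeds only if the key does not exist,
        # and the expiry frees the lock if the holder crashes.
        if client.set(name, token, nx=True, px=ttl_ms):
            return token
        if time.monotonic() >= deadline:
            return None
        time.sleep(0.01)


def release_lock(client, name, token):
    """Release the lock only if `token` still owns it; return True on success."""
    return client.eval(RELEASE_SCRIPT, 1, name, token) == 1
```

With a real server this would be used along the lines of `token = acquire_lock(redis.Redis(), "lock:report")`, followed by `release_lock(...)` in a `finally` block. It covers a single Redis instance only, which is the gap Redlock is meant to address.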



Redis Internals



Data Types:

On Persistence:

Pub / Sub:


Redis Cluster Design and Specification



Scaling real time processing jobs in Azure

I was recently faced with the problem of how to scale real-time processing jobs in Azure.

I finally managed to do it by using the concept of partitions. Using partitions in Event Hubs along with Azure Stream Analytics got the job done for me.
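To show why partitions help with scaling, here is a small, generic sketch of key-based partitioning. It is not the hashing that Event Hubs actually uses, and the names `partition_for` and `route`, the device keys and the partition count are all hypothetical. The point is only that events with the same key always land in the same partition, so each partition can be processed by its own worker in parallel.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (illustrative hash, an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count):
    """Group (key, payload) events by partition, keeping arrival order per partition."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions


# Hypothetical sensor readings keyed by device id.
events = [
    ("device-1", 20.5),
    ("device-2", 19.0),
    ("device-1", 21.0),
    ("device-3", 18.2),
]
routed = route(events, partition_count=4)
for index, batch in sorted(routed.items()):
    print(index, batch)
```

Because all readings for one device stay in one partition, a per-device aggregate never needs data from another partition, which is what lets the work be split across parallel readers.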