Toe Dipping Into Apache Mahout

I went to a talk titled ‘Apache Mahout – What’s next?’ by Trevor Grant.

A few things struck me after attending the talk:

  • Apache Mahout seems to be a pretty interesting framework for distributed matrix operations.
    • The slides can be found here
  • Trevor’s blog post has great pointers for getting started with a lot of the technologies on the fringes, like Flink, Mahout etc.

Kafka-Spark integration

I am trying to get to a point where I have Kafka + Spark Streaming running locally on my machine.

There are several things to figure out along the way:

  • Kafka
  • Scala
  • Spark
  • Spark Streaming



Distributed Locks

One reason why Redis has custom locking, instead of using operating system–level locks, language-level locks, and so forth, is a matter of scope. Clients want to have exclusive access to data stored on Redis, so clients need to have access to a lock defined in a scope that all clients can see—Redis.

Redis does have a basic sort of lock already available as part of the command set (SETNX), which we use, but it’s not full-featured and doesn’t offer advanced functionality that users would expect of a distributed lock.

In fact, two patterns have emerged for locking in Redis.

  1. Locking with SETNX
    • a lock held on a single Redis instance; it’s not full-featured and doesn’t offer advanced functionality
  2. Redlock
    • a distributed locking algorithm that acquires the lock across several independent Redis instances.
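To make the first pattern concrete for myself, here is a minimal Python sketch of a SETNX-style lock. It assumes a client whose set call accepts nx and ex arguments (redis-py's set does; note that redis-py returns bytes from get unless decode_responses=True is set). InMemoryClient, acquire_lock and release_lock are hypothetical names of mine, and the in-memory client only exists so the sketch runs without a Redis server.

```python
import time
import uuid


class InMemoryClient:
    """Hypothetical stand-in for a Redis client, so this runs without a server.

    It mimics only the three calls used below: set(nx=..., ex=...), get, delete.
    """

    def __init__(self):
        self._data = {}  # key -> (value, expiry deadline or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= time.monotonic():
            del self._data[key]  # the key has expired
            return None
        return entry

    def set(self, key, value, nx=False, ex=None):
        if nx and self._live(key) is not None:
            return None  # key already exists, so SET ... NX fails
        deadline = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, deadline)
        return True

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else entry[0]

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def acquire_lock(client, name, ttl_seconds=10):
    """Try once to take the lock; return a token on success, None otherwise."""
    token = str(uuid.uuid4())  # unique per holder, so only the owner can release
    if client.set("lock:" + name, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, name, token):
    """Release only if we still hold the lock.

    The get followed by delete is NOT atomic in this sketch; against a real
    server the check and delete would need to happen in one atomic step.
    """
    key = "lock:" + name
    if client.get(key) == token:
        client.delete(key)
        return True
    return False
```

The expiry (ex) is there so a crashed client cannot hold the lock forever, and the random token is there so one client cannot release a lock that has since passed to another.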



Redis Internals



Data Types:

On Persistence:

Pub / Sub:


Redis Cluster Design and Specification



Scaling real time processing jobs in Azure

I recently faced the problem of how to scale real-time processing jobs in Azure.

I finally managed to do it by using the concept of partitions. Partitioning the EventHub and letting Azure Stream Analytics work on those partitions got the job done for me.
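The idea behind partitions can be sketched in a few lines of Python. This is only an illustration of the concept, not how EventHub assigns partitions internally: the CRC32 hash, the DeviceId key and the function names partition_for and group_by_partition are all my own assumptions.

```python
import zlib
from collections import defaultdict


def partition_for(partition_key, partition_count):
    """Map a key to a partition index with a stable hash (CRC32 here)."""
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count


def group_by_partition(events, partition_count):
    """Split events so that each partition can be handled by its own worker."""
    partitions = defaultdict(list)
    for event in events:
        index = partition_for(event["DeviceId"], partition_count)
        partitions[index].append(event)
    return dict(partitions)
```

Because the same key always lands in the same partition, events from one device stay in order within their partition, while different partitions can be processed in parallel.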


802.3 vs. 802.11

This gives a nice overview of the differences between Ethernet and Wi-Fi at a protocol level.

  • The crux of the problem is this: “The CSMA/CD protocol is not used in a wireless environment due to the user has no capability to sense/listen to the channel for collision while sending the packet [12].”
  • This necessitates Collision Avoidance techniques for Wi-Fi, which in turn limit how fast you can transmit packets in a given frequency band, leading to slower speeds.
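One Collision Avoidance technique is random backoff: before sending, a station waits a random number of time slots, and the range it draws from (the contention window) doubles after every failed attempt. A rough Python sketch, where the default window bounds of 15 and 1023 are assumptions on my part (the actual values depend on the 802.11 variant) and the function names are hypothetical:

```python
import random


def contention_window(retries, cw_min=15, cw_max=1023):
    """Window size after `retries` failed attempts, doubling each time up to cw_max."""
    return min((cw_min + 1) * (2 ** retries) - 1, cw_max)


def backoff_slots(retries, rng=random):
    """Pick how many idle slots to wait before the next transmission attempt."""
    return rng.randint(0, contention_window(retries))
```

Every slot spent waiting is airtime in which nothing is sent, which is one way to see where the overhead comes from.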


Making REST calls to send data to an Azure EventHub

I recently encountered a situation where I had to use pure REST Calls to send data to an Azure Event Hub.


  • If you are used to using libraries (C#, Python), you will find that the libraries are doing a lot behind the scenes. It’s not trivial to go from using the library to making pure REST calls.
  • The first approach – using Fiddler to capture the traffic and re-purpose those calls – failed.
    • I am not sure why the calls fail to show up in Fiddler. I tried a few things, like enabling HTTPS decryption, but I wasn’t able to get the outgoing traffic to show up in Fiddler.
  • The references below give a good idea of how I made some progress.

REST Call to send data:

I finally got it to work with something like this:

User-Agent: Fiddler
Authorization: SharedAccessSignature
Content-Type: application/atom+xml;type=entry;charset=utf-8
Content-Length: 153
Expect: 100-continue

{ "DeviceId" : "ArduinoYun",
  "SensorData" : [ { "SensorId" : "awk",
        "SensorType" : "temperature",
        "SensorValue" : 24.5
      } ]
}
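For reference, here is a Python sketch of how such a request could be put together, including the Shared Access Signature that goes into the Authorization header. It is a sketch under assumptions, not the exact call I captured: the namespace, hub name, key name and key passed in are placeholders, the function names make_sas_token and build_send_request are mine, and the query parameters that the service documentation lists (such as api-version) are left out. The request is only built here, not sent.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request


def make_sas_token(resource_uri, key_name, key, ttl_seconds=3600, now=None):
    """Build a 'SharedAccessSignature sr=...&sig=...&se=...&skn=...' token."""
    start = time.time() if now is None else now
    expiry = str(int(start + ttl_seconds))
    encoded_uri = urllib.parse.quote(resource_uri, safe="")
    # The signature is an HMAC-SHA256 over "<url-encoded uri>\n<expiry>".
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest), safe="")
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, signature, expiry, key_name
    )


def build_send_request(namespace, hub, key_name, key, payload):
    """Build (but do not send) the POST request carrying one JSON event."""
    resource_uri = "https://{}.servicebus.windows.net/{}".format(namespace, hub)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        resource_uri + "/messages",
        data=body,
        method="POST",
        headers={
            "Authorization": make_sas_token(resource_uri, key_name, key),
            "Content-Type": "application/atom+xml;type=entry;charset=utf-8",
        },
    )
```

Sending it would then be a matter of passing the result to urllib.request.urlopen, which also fills in Content-Length on its own.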




Getting Started with Apache Kafka

I am doing some toe-dipping into Apache Kafka.



Commands from the Apache quick start documentation:

  • This gave me a good overview of what the system is doing.
> bin/ config/
> bin/ config/
> bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/ --list --zookeeper localhost:2181
> bin/ --broker-list localhost:9092 --topic test
> bin/ --zookeeper localhost:2181 --topic test --from-beginning
> cp config/ config/
> cp config/ config/
[Now edit these new files and set the following properties:]
> bin/ config/ &
> bin/ config/ &
> bin/ --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
> bin/ --describe --zookeeper localhost:2181 --topic my-replicated-topic
> bin/ --describe --zookeeper localhost:2181 --topic test
> bin/ --broker-list localhost:9092 --topic my-replicated-topic
> bin/ --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
> ps | grep
> kill -9 7564
[Leadership has switched to one of the slaves and node 1 is no longer in the in-sync replica set:]
> bin/ --describe --zookeeper localhost:2181 --topic my-replicated-topic
> bin/ --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
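What helped me make sense of the consumer commands, and in particular the --from-beginning flag, was to think of a topic as an append-only log in which every message gets an offset. The toy Python model below is only my mental picture, not Kafka code: TopicLog and its methods are hypothetical names, and it ignores partitions, replication and brokers entirely.

```python
class TopicLog:
    """Toy model of a single topic: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Producer side: add a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def end_offset(self):
        """Offset that the next appended message will receive."""
        return len(self._messages)

    def read(self, offset):
        """Consumer side: return (messages from `offset` onwards, next offset to read)."""
        return self._messages[offset:], len(self._messages)


log = TopicLog()
log.append("first message")
log.append("second message")

# A consumer starting from the beginning reads from offset 0 and sees everything.
from_beginning, _ = log.read(0)

# A consumer starting at the end only sees what arrives after it joined.
late_offset = log.end_offset()
log.append("third message")
only_new, _ = log.read(late_offset)
```

In this picture the consumer, not the log, decides where to start reading, so the same messages can be read again by starting at an earlier offset.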