Kafka-Spark Integration

I'm trying to get to a point where I have Kafka + Spark Streaming running locally on my machine.

There are several things to figure out along the way:

  • Kafka
  • Scala
  • Spark
  • Spark Streaming
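Once those pieces are in place, the end goal is a streaming job consuming from Kafka. To sanity-check the transformation logic before wiring up a cluster, the classic word count that a Spark Streaming job applies to each micro-batch can be sketched on a plain Java collection (the names here are just for illustration; on a real DStream this would be flatMap + reduceByKey):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: the word-count logic a Spark Streaming job would apply to each
// micro-batch of lines pulled from a Kafka topic. Here it runs on a plain
// List so it can be tried without Kafka or Spark installed.
public class WordCountSketch {
    public static Map<String, Long> countWords(List<String> batch) {
        return batch.stream()
                // flatMap: split each line into words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(w -> !w.isEmpty())
                // groupingBy + counting: the reduceByKey equivalent
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> batch = List.of("spark streaming with kafka", "kafka and spark");
        System.out.println(countWords(batch));
    }
}
```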




Distributed Locks

One reason why Redis has custom locking, instead of using operating system–level locks, language-level locks, and so forth, is a matter of scope. Clients want to have exclusive access to data stored on Redis, so clients need to have access to a lock defined in a scope that all clients can see—Redis.

Redis does have a basic sort of lock already available as part of the command set (SETNX), which we use, but it’s not full-featured and doesn’t offer advanced functionality that users would expect of a distributed lock.

In fact, two patterns have emerged for locking in Redis.

  1. Locking with SETNX
    • simple, but lacks the advanced functionality users expect of a distributed lock.
  2. Redlock
    • a distributed locking algorithm that acquires the lock on a majority of independent Redis instances.
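The SETNX pattern is easy to get subtly wrong: a client must only release a lock it still holds. A minimal sketch of the acquire/release logic, with a ConcurrentHashMap standing in for Redis (putIfAbsent has the same "set only if not already set" semantics as SETNX), might look like this:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the SETNX locking pattern. The ConcurrentHashMap stands in
// for Redis: putIfAbsent behaves like SETNX. A real implementation would
// also set a TTL on the lock key so a crashed client can't hold it forever.
public class SetnxLockSketch {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Try to acquire the lock; returns a token on success, null on failure.
    public String acquire(String lockName) {
        String token = UUID.randomUUID().toString();
        // SETNX semantics: only succeeds if no one else holds the lock.
        return store.putIfAbsent(lockName, token) == null ? token : null;
    }

    // Release only if the token matches, so a client never deletes a
    // lock that another client has since acquired.
    public boolean release(String lockName, String token) {
        return store.remove(lockName, token);
    }
}
```

The token check on release is the part SETNX alone doesn't give you; in real Redis it's usually done atomically with a small Lua script.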



Redis Internals



Data Types:

On Persistence:

Pub / Sub:


Redis Cluster Design and Specification




I was playing around with IntelliJ.

Specifically, I was investigating how to do the following in IntelliJ:

  • Basic Java development
  • Basic Scala development
  • SBT for Scala
  • Maven for Java
  • Scala Tests

Basic Java development:

Maven for Java:



One of the things I realized is that it's critical to understand the basics. Specifically:


Scaling Real-Time Processing Jobs in Azure

I was recently faced with the problem of how to scale real-time processing jobs in Azure.

I finally managed to do it by using the concept of partitions. Using partitions in EventHub along with Azure Stream Analytics got the job done for me.
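The idea that makes partitions scale is that events carrying the same partition key always land on the same partition, so partitions can be consumed in parallel while per-key ordering is preserved. A hedged sketch of that routing (a simple modulo hash over a hypothetical device ID, illustrating the concept rather than Event Hubs' exact internal hashing):

```java
// Sketch of partition-key routing: events with the same key always map
// to the same partition, so each partition can be processed independently.
// The modulo-hash scheme here illustrates the concept; it is not the
// exact hash EventHub uses internally.
public class PartitionSketch {
    public static int partitionFor(String partitionKey, int partitionCount) {
        // Math.floorMod keeps the result non-negative even when hashCode is negative.
        return Math.floorMod(partitionKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4;
        // All events from the same device land on one partition:
        System.out.println(partitionFor("device-42", partitions));
    }
}
```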