Kafka-Spark Integration

I am trying to get to the point where I have Kafka and Spark Streaming running locally on my machine.

There are several things to figure out along the way:

  • Kafka
  • Scala
  • Spark
  • Spark Streaming




Polyglot Persistence

I came across the term "polyglot persistence" while reading a post about ACID in the context of NoSQL databases.

The post gave me insight into the world of NoSQL databases and how to think about ACID guarantees in NoSQL systems.

It has some great links and pointers, in particular some interesting pointers to Eric Brewer's posts on highscalability.com.

Check it out!

Distributed Locks

One reason why Redis has custom locking, instead of using operating system–level locks, language-level locks, and so forth, is a matter of scope. Clients want to have exclusive access to data stored on Redis, so clients need to have access to a lock defined in a scope that all clients can see—Redis.

Redis does have a basic sort of lock already available as part of the command set (SETNX), which we use, but it’s not full-featured and doesn’t offer advanced functionality that users would expect of a distributed lock.

In fact, two patterns have emerged for locking in Redis.

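  1. Locking with SETNX
    • simple, but not full-featured: it doesn’t offer the advanced functionality users would expect of a distributed lock
  2. Redlock
    • a distributed locking algorithm that acquires the lock across multiple independent Redis instances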
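
To make the first pattern concrete, here is a minimal sketch of SETNX-style locking. It assumes a client exposing `set(key, value, nx=True, px=...)`, `get` and `delete`, which is the shape of the redis-py client. `FakeRedis`, `acquire_lock`, `release_lock` and the `lock:` key prefix are hypothetical names for illustration, and the in-memory stand-in is only there so the sketch runs without a server.

```python
import time
import uuid


class FakeRedis:
    """In-memory stand-in for a Redis client (hypothetical, illustration only).

    Mimics just the calls used below: set(..., nx=True, px=...), get, delete.
    """

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def _purge(self, key):
        # Drop the key if its time-to-live has passed.
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= time.monotonic():
            del self._data[key]

    def set(self, key, value, nx=False, px=None):
        self._purge(key)
        if nx and key in self._data:
            return None  # key already exists, so a SET with NX fails
        expiry = time.monotonic() + px / 1000.0 if px is not None else None
        self._data[key] = (value, expiry)
        return True

    def get(self, key):
        self._purge(key)
        entry = self._data.get(key)
        return entry[0] if entry else None

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def acquire_lock(client, name, ttl_ms=10_000):
    """Try once to take the lock; return an owner token, or None if it is held."""
    token = str(uuid.uuid4())  # unique per holder, so only the owner can release
    if client.set("lock:" + name, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, name, token):
    """Release the lock only if we still own it.

    Assumption: get-then-delete is not atomic. Against a real server this
    check is usually done in a Lua script or a WATCH/MULTI transaction.
    """
    key = "lock:" + name
    if client.get(key) == token:
        client.delete(key)
        return True
    return False
```

The expiry (`px`) is what keeps a crashed client from holding the lock forever, and the random token is what stops one client from releasing a lock that has since passed to another. With a real redis-py client, `get` returns bytes unless the client is created with `decode_responses=True`, so the comparison in `release_lock` would need adjusting.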



Redis Internals



Data Types:

On Persistence:

Pub / Sub:


Redis Cluster Design and Specification




I was playing around with IntelliJ.

Specifically, I was investigating how to do the following in IntelliJ:

  • Basic Java development
  • Basic Scala development
  • SBT for Scala
  • Maven for Java
  • Scala Tests

Basic Java development:

Maven for Java:



One of the things I realized is that it's critical to understand the basics. Specifically:


Scaling real-time processing jobs in Azure

I recently faced the problem of how to scale real-time processing jobs in Azure.

I finally managed to do it using partitions. Partitioning the stream in Event Hubs and processing it with Azure Stream Analytics got the job done for me.
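
The idea behind partitions can be sketched in a few lines: events that share a partition key always land in the same partition, so each partition can be handled by a separate worker in parallel. This is only an illustration of the concept. The hash used here is an assumption and not the actual Event Hubs hashing, and `partition_for` and `split_by_partition` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index with a stable hash.

    Illustrative only: Event Hubs uses its own internal hashing.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


def split_by_partition(events, partition_count):
    """Group (key, payload) events by partition index.

    Each resulting group can be processed independently of the others.
    """
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return dict(partitions)
```

For example, `split_by_partition([("device-1", 20.5), ("device-2", 19.0), ("device-1", 21.0)], 4)` keeps both `device-1` readings together in one group, which is what lets per-key processing scale out without cross-partition coordination.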