Techniques for Approximate Counting

In many scenarios,  we can live with approximations in counting.

There are a few techniques in particular which are increasingly being used in online scenarios. Example, services like RedShift, Redis and Spark have in built support for these data structures.

  1. Hyper Log Log.
  2. Count Min Sketch.
  3. Bloom Filters.
  4. Locality Sensitive Hashing.

References:

Code:

Advertisements

An Activity Selection Problem

There is a very interesting problem I recently came across as part of Google’s foobar challenge. The problem is called ‘zombie antidote’.

I wanted to write down some thoughts as I made my way towards solving this problem.

Thoughts:

  • The problem smells like Dynamic Programming.
  • However, there are some challenges in coming up with the DP solution
    • How to build up the optimal substructure (aka. recurrence relation) for the problem ?
    • How to show overlapping subproblem property ?
  • It is useful to look at problems from different perspectives.
    • E.g. Can the activity selection problem be formulated as a Graph theory problem ?

Tips:

  • Refresh basics of Dynamic Programming Algos.
    • Especially useful here was understanding the cut-rod problem from Cormen’s algo book
  • Understanding the relationship between the following 3 fundamental algorithmic approaches.
    • Dynamic Programming
    • Greedy Algorithms
    • Graph Theory
  • As it turns out, the activity selection problem can indeed be formulated in all of the 3 algo approaches listed above.

Code:

References:

  1. http://www.radford.edu/~nokie/classes/360/dp-rod-cutting.html
  2. http://www.geeksforgeeks.org/dynamic-programming-set-13-cutting-a-rod/
  3. http://www.davebrundage.com/scheduling-problem-zombit-antidote/
  4. https://getpocket.com/a/read/808088510

Multi-dimensional (axial) data handling in Python

Recently I was playing around with multi-dimensional data structures in Python.

Some interesting observations:

  1. Multi-dimensional lists and multi-dimensional arrays are fundamentally handled differently.
  2. Slicing of multi-dimensional arrays (numpy) need to be carefully considered in regards to shallow copy etc

 

Some references below for further examination:

References:

  1. http://ilan.schnell-web.net/prog/slicing/
  2. https://docs.python.org/2/library/copy.html
  3. http://stackoverflow.com/questions/509211/explain-pythons-slice-notation
  4. http://cs231n.github.io/python-numpy-tutorial/
  5. http://www.physics.nyu.edu/pine/pymanual/html/chap3/chap3_arrays.html
  6. http://www.astro.ufl.edu/~warner/prog/python.html