Bentley’s Treatment of the Maximum Sum Subsequence Problem

I thoroughly enjoyed the treatment given to this problem by Jon Bentley.

In particular, the section on “Principles” was such an exciting read – it shows some of the tricks used in algorithm design :

  • Save state to avoid recomputation
  • Preprocess state into data structures 
  • Divide and conquer algorithms
  • Scanning algorithms
  • Cumulatives 
  • Lower bounds
  • Sorting (my humble addition to the list)


K-means Clustering

One of my friends recently asked me about the K-means algorithm.

  • how does it work ?
  • what are the typical applications i.e. where/how is it used in the industry ?

In the discussion that followed, we ended up playing around with several visualizations available that do an awesome job of explaining this technique.

we also hacked around with some code from joel grus’  book (data science from scratch) to develop more intuition on the K-means algorithm.




There are some very interesting insights which we got playing around with trying to use K-means to cluster an image containing different colors:



Balanced Ternary

I was solving this math problem which had to do with representing every Natural number as a summation/subtraction of distinct power of 3

Interestingly this led me to this branch of mathematics called ‘Balanced Ternary’. Check it out!

Exploration of this problem gave me interesting insights about base representation of a number, something that I have been keeping in the backburner for a long while now. Finally got a chance to follow up on this.





Bandit Problems: an ‘Experiment Strategy’ or a ‘ML Algorithm’ ?

Do a simple search on Google –  ‘how do bandit algorithms work’ ?

Do the results look confusing ?  Some links (here1, here2) say they are better than A/B.  Then there are other links which say otherwise (here3, here4).

In fact, when one hears about Bandit Problems, there are couple of questions to think about:


1.Is it an ‘Experiment Strategy’ ?

  • MAB gets compared with A/B tests. So is it an ‘experiment strategy’ like A/B testing ?

2. Is it an ‘ML Algorithm’ ?

  • Bandit algorithms select the most optimal ‘action’. So is it fundamentally an ML Algorithm ?
  • If yes, whats the relation between these ‘bandit problems’ v/s supervised ML algos like Logistic Regression and Decision Trees.

3. Where do the algorithms like epsilon-greedy, UCB etc fit into ?



  • The correct way of looking at bandit problems is to think of it as an optimization problem for online interactive systems.
    • The goal of bandit algorithms is to select the best policy that will maximize rewards. The space of policies is extremely large (or infinite)
  • In literature, people have treated bandit problems in different settings:
    • Multi Armed Bandit setting
    • Contextual Bandit
  • Multi Armed Bandit setting.
    • In the MAB setting,  there are a few known approaches for selecting the best policy.
      • Naive
      • Epsilon-Greedy
      • Upper Confidence Bounds.
  • Contextual Bandit.
    • In one of my previous posts I  noted the ML reduction stack in VW for the contextual bandits problem. In a separate post, I have also noted some thoughts on the use of the IPS score for conterfactual evaluation.
    • In the Full Information Setting, the task of selecting the best policy is mapped to a cost-sensitive classification problem where:
      • context <-> example
      • action <-> label/class
      • policy <-> classifier
      • reward <-> gain / (negative) cost
    • Thereby, we can use known supervised techniques like Decision Trees, Logistic Regression etc. to solve the cost-sensitive classification problem.
      • This was an interesting insight for me, and helped me answer the question #2 above
    • In the Partial Information aka. Bandit setting, there would be two more issues we would like to handle
      • Filling in missing data.
      • Overcoming Bias.
  • The Partial Information aka. Bandit setting can further be looked into in 2 different ways:
    • Online.
      • In the online setting the problem has been solved in different ways
      • Epsilon-Greedy / Epoch Greedy [Langford & Zhang].
      • “Monster” Algorithm [Dudik, Hsu, Kale, Langford]
      • They mostly vary in how they optimize regret. And/Or computational efficiency.
    • Offline.
      • This is where Counterfactual evaluation and Learning comes in..
  • Bandit algorithms are not just an alternate ‘experiment strategy’ that is  ‘better’ or ‘worse’ than A/B tests.
    • The objectives behind doing an A/B test are different from the objectives of using a bandit system (which is to do continuous optimization).
  • Typic issues to consider for bandit problems:
    • Explore-Exploit
      • exploit what has been learned
      • explore to learn which behaviour might give best results.
    • Context
      • In the contextual setting (‘contextual bandit’) there are many more choice available. unlikely to see the same context twice.
    • Selection bias
      • the exploit introduces bias that must be accounted for
    • Efficiency.