Consistent Hashing

August 22, 2018August 22, 2018 abgoswam algorithms, bigdata

References:

https://stackoverflow.com/questions/144360/simple-basic-explanation-of-a-distributed-hash-table-dht
http://michaelnielsen.org/blog/consistent-hashing/
http://michaelnielsen.org/blog/lecture-course-the-google-technology-stack/
- this is wonderful reading on Google’s technology stack!

Video Lectures on Discrete Mathematics and Algorithms

June 3, 2018June 14, 2018 abgoswam algorithms, math

Stony Brook [my Alma Mater 🙂 ]

IMSc (Indian Institute of Mathematical Sciences):

https://www.youtube.com/watch?v=Lv3o4OOHG_w
https://www.youtube.com/watch?v=fH0V-sQrBqE
- Check out the channel! Some very cool videos

MIT

Finding two largest elements in an array

May 29, 2018June 9, 2018 abgoswam algorithms

This is a very interesting problem indeed.

The problem comes in multiple flavors:

Find two largest elements in an array in 1.5n comparisons.
Find two largest elements in an array in n+log(n)−2 comparisons.
Prove that no algorithm for finding two largest elements in an array can do this in less than n+ log (n)−2 comparisons.
What is the fastest algorithm for finding three largest elements?

References:

Code:

max_pairwise_product.py

Alias method

February 28, 2017August 6, 2018 abgoswam algorithms, math

“You are given an n-sided die where side i has probability p_i of being rolled. What is the most efficient data structure for simulating rolls of the die?”

A very similar question was posted to me recently :

The approaches above are very cool, and illustrate the use of augmented search trees.

However, it seems there is a better method for this problem – and it has been out there for a while now. This was a fascinating read :

Darts, Dice, and Coins: Sampling from a Discrete Distribution

Additional pointers for the alias method:

Bentley’s Treatment of the Maximum Sum Subsequence Problem

January 17, 2017 abgoswam algorithms

I thoroughly enjoyed the treatment given to this problem by Jon Bentley.

In particular, the section on “Principles” was such an exciting read – it shows some of the tricks used in algorithm design :

Save state to avoid recomputation
Preprocess state into data structures
Divide and conquer algorithms
Scanning algorithms
Cumulatives
Lower bounds
Sorting (my humble addition to the list)

References:

http://wordaligned.org/articles/the-maximum-subsequence-problem
Programming Pearls E-Book
Programming pearls: algorithm design techniques by Bentley
Programming pearls: abstract data types by Bentley
Fast Algorithms for Sorting and Searching Strings by Bentley

Assignment Problem – Hungarian Algorithm

December 25, 2016 abgoswam algorithms

References:

Bandits for Online Recommendations

October 17, 2016 abgoswam algorithms, machinelearning, math, statistics

I came across this interesting set of blog posts by Sergei Feldman on the use of bandit approaches in online recommendation.

In particular, the one I really enjoyed reading was the comparison of the approaches needed to solve the multi armed bandit problem. Need to play around with his code someday

References:

http://www.data-cowboys.com/

K-means Clustering

October 8, 2016October 8, 2016 abgoswam algorithms, machinelearning, python

One of my friends recently asked me about the K-means algorithm.

how does it work ?
what are the typical applications i.e. where/how is it used in the industry ?

In the discussion that followed, we ended up playing around with several visualizations available that do an awesome job of explaining this technique.

we also hacked around with some code from joel grus’ book (data science from scratch) to develop more intuition on the K-means algorithm.

Visualizations:

Code:

There are some very interesting insights which we got playing around with trying to use K-means to cluster an image containing different colors:

sample5

Balanced Ternary

October 4, 2016October 7, 2016 abgoswam algorithms, math, statistics

I was solving this math problem which had to do with representing every Natural number as a summation/subtraction of distinct power of 3

Interestingly this led me to this branch of mathematics called ‘Balanced Ternary’. Check it out!

Exploration of this problem gave me interesting insights about base representation of a number, something that I have been keeping in the backburner for a long while now. Finally got a chance to follow up on this.

References:

Problem:

peculiarbalance

Code:

balancedternary

Bandit Problems: an ‘Experiment Strategy’ or a ‘ML Algorithm’ ?

September 28, 2016October 17, 2016 abgoswam algorithms, machinelearning, math, statistics, systems, vowpalwabbit

Do a simple search on Google – ‘how do bandit algorithms work’ ?

Do the results look confusing ? Some links (here1, here2) say they are better than A/B. Then there are other links which say otherwise (here3, here4).

In fact, when one hears about Bandit Problems, there are couple of questions to think about:

Questions:

1.Is it an ‘Experiment Strategy’ ?

MAB gets compared with A/B tests. So is it an ‘experiment strategy’ like A/B testing ?

2. Is it an ‘ML Algorithm’ ?

Bandit algorithms select the most optimal ‘action’. So is it fundamentally an ML Algorithm ?
If yes, whats the relation between these ‘bandit problems’ v/s supervised ML algos like Logistic Regression and Decision Trees.

3. Where do the algorithms like epsilon-greedy, UCB etc fit into ?

Thoughts:

The correct way of looking at bandit problems is to think of it as an optimization problem for online interactive systems.
- The goal of bandit algorithms is to select the best policy that will maximize rewards. The space of policies is extremely large (or infinite)
In literature, people have treated bandit problems in different settings:
- Multi Armed Bandit setting
- Contextual Bandit
Multi Armed Bandit setting.
- In the MAB setting, there are a few known approaches for selecting the best policy.
  - Naive
  - Epsilon-Greedy
  - Upper Confidence Bounds.
Contextual Bandit.
- In one of my previous posts I noted the ML reduction stack in VW for the contextual bandits problem. In a separate post, I have also noted some thoughts on the use of the IPS score for conterfactual evaluation.
- In the Full Information Setting, the task of selecting the best policy is mapped to a cost-sensitive classification problem where:
  - context <-> example
  - action <-> label/class
  - policy <-> classifier
  - reward <-> gain / (negative) cost
- Thereby, we can use known supervised techniques like Decision Trees, Logistic Regression etc. to solve the cost-sensitive classification problem.
  - This was an interesting insight for me, and helped me answer the question #2 above
- In the Partial Information aka. Bandit setting, there would be two more issues we would like to handle
  - Filling in missing data.
  - Overcoming Bias.
The Partial Information aka. Bandit setting can further be looked into in 2 different ways:
- Online.
  - In the online setting the problem has been solved in different ways
  - Epsilon-Greedy / Epoch Greedy [Langford & Zhang].
  - “Monster” Algorithm [Dudik, Hsu, Kale, Langford]
  - They mostly vary in how they optimize regret. And/Or computational efficiency.
- Offline.
  - This is where Counterfactual evaluation and Learning comes in..
Bandit algorithms are not just an alternate ‘experiment strategy’ that is ‘better’ or ‘worse’ than A/B tests.
- The objectives behind doing an A/B test are different from the objectives of using a bandit system (which is to do continuous optimization).
Typic issues to consider for bandit problems:
- Explore-Exploit
  - exploit what has been learned
  - explore to learn which behaviour might give best results.
- Context
  - In the contextual setting (‘contextual bandit’) there are many more choice available. unlikely to see the same context twice.
- Selection bias
  - the exploit introduces bias that must be accounted for
- Efficiency.

References:

https://support.google.com/analytics/answer/2844870?hl=en&ref_topic=1745207
- This post gives a nice overview of how Google’s Analytics Content Experiment platform uses the multi-armed bandit approach for managing online experiments
https://www.youtube.com/watch?v=gzxRDw3lXv8
- Rob Schapire explains the fundamentals of the bandits problem. Very Cool!
http://www.cs.cornell.edu/~adith/CfactSIGIR2016/
- Tutorial by Thorsten Joachims on the bandits problem
http://engineering.richrelevance.com/bandits-recommendation-systems/
- Nice read!
http://stevehanov.ca/blog/index.php?id=132
https://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html
https://vwo.com/blog/multi-armed-bandit-algorithm/https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html

abgoswam's tech blog

Data Science, Machine Learning, CS Theory, Systems & Web

algorithms

Consistent Hashing

Video Lectures on Discrete Mathematics and Algorithms

Finding two largest elements in an array

Alias method

Darts, Dice, and Coins: Sampling from a Discrete Distribution

Bentley’s Treatment of the Maximum Sum Subsequence Problem

Assignment Problem – Hungarian Algorithm

Bandits for Online Recommendations

K-means Clustering

Balanced Ternary

Bandit Problems: an ‘Experiment Strategy’ or a ‘ML Algorithm’ ?