# K-means Clustering

• how does it work ?
• what are the typical applications i.e. where/how is it used in the industry ?

In the discussion that followed, we ended up playing around with several visualizations available that do an awesome job of explaining this technique.

we also hacked around with some code from joel grus’  book (data science from scratch) to develop more intuition on the K-means algorithm.

Visualizations:

Code:

There are some very interesting insights which we got playing around with trying to use K-means to cluster an image containing different colors:

# Balanced Ternary

I was solving this math problem which had to do with representing every Natural number as a summation/subtraction of distinct power of 3

Interestingly this led me to this branch of mathematics called ‘Balanced Ternary’. Check it out!

Exploration of this problem gave me interesting insights about base representation of a number, something that I have been keeping in the backburner for a long while now. Finally got a chance to follow up on this.

References:

Problem:

Code:

# Bandit Problems: an ‘Experiment Strategy’ or a ‘ML Algorithm’ ?

Do a simple search on Google –  ‘how do bandit algorithms work’ ?

Do the results look confusing ?  Some links (here1, here2) say they are better than A/B.  Then there are other links which say otherwise (here3, here4).

In fact, when one hears about Bandit Problems, there are couple of questions to think about:

Questions:

1.Is it an ‘Experiment Strategy’ ?

• MAB gets compared with A/B tests. So is it an ‘experiment strategy’ like A/B testing ?

2. Is it an ‘ML Algorithm’ ?

• Bandit algorithms select the most optimal ‘action’. So is it fundamentally an ML Algorithm ?
• If yes, whats the relation between these ‘bandit problems’ v/s supervised ML algos like Logistic Regression and Decision Trees.

3. Where do the algorithms like epsilon-greedy, UCB etc fit into ?

Thoughts:

• The correct way of looking at bandit problems is to think of it as an optimization problem for online interactive systems.
• The goal of bandit algorithms is to select the best policy that will maximize rewards. The space of policies is extremely large (or infinite)
• In literature, people have treated bandit problems in different settings:
• Multi Armed Bandit setting
• Contextual Bandit
• Multi Armed Bandit setting.
• In the MAB setting,  there are a few known approaches for selecting the best policy.
• Naive
• Epsilon-Greedy
• Upper Confidence Bounds.
• Contextual Bandit.
• In one of my previous posts I  noted the ML reduction stack in VW for the contextual bandits problem. In a separate post, I have also noted some thoughts on the use of the IPS score for conterfactual evaluation.
• In the Full Information Setting, the task of selecting the best policy is mapped to a cost-sensitive classification problem where:
• context <-> example
• action <-> label/class
• policy <-> classifier
• reward <-> gain / (negative) cost
• Thereby, we can use known supervised techniques like Decision Trees, Logistic Regression etc. to solve the cost-sensitive classification problem.
• This was an interesting insight for me, and helped me answer the question #2 above
• In the Partial Information aka. Bandit setting, there would be two more issues we would like to handle
• Filling in missing data.
• Overcoming Bias.
• The Partial Information aka. Bandit setting can further be looked into in 2 different ways:
• Online.
• In the online setting the problem has been solved in different ways
• Epsilon-Greedy / Epoch Greedy [Langford & Zhang].
• “Monster” Algorithm [Dudik, Hsu, Kale, Langford]
• They mostly vary in how they optimize regret. And/Or computational efficiency.
• Offline.
• This is where Counterfactual evaluation and Learning comes in..
• Bandit algorithms are not just an alternate ‘experiment strategy’ that is  ‘better’ or ‘worse’ than A/B tests.
• The objectives behind doing an A/B test are different from the objectives of using a bandit system (which is to do continuous optimization).
• Typic issues to consider for bandit problems:
• Explore-Exploit
• exploit what has been learned
• explore to learn which behaviour might give best results.
• Context
• In the contextual setting (‘contextual bandit’) there are many more choice available. unlikely to see the same context twice.
• Selection bias
• the exploit introduces bias that must be accounted for
• Efficiency.

References:

# Recursion: Tail v/s Linear

Merits of Tail Recursion v/s Linear Recursion.

[TBD]

References:

Code:

# Confidence Intervals and Significance Levels

In a previous post , I mentioned about  expected value and variance of different distributions.

Taking the same statistical concepts further, we now want to compute confidence intervals for our estimate.

Note:

• While thinking about Confidence Intervals, it is a good exercise to identify what distribution is representative of your estimate.
• The reason this is needed is because the confidence interval  is dependent on standard deviation.  As such, it would be necessary to know how you are computing your standard deviation.
• An alternative would be if we compute the variance from base principles.

• Here are a couple of very interesting post that explains the relationship between confidence intervals, statistical levels and P-values in a very simple way.

• CTR: Here are some interesting discussions around computing confidence intervals for a metric like CTR

References:

Videos:

Code:

# Expected Value and Variances

There are 2 fundamental quantities of probability distributions: expected value and variance.

Expected value:

• The simplest and most useful summary of the distribution of a random variable is the “average” of the values it takes on.
• (Please see references for equation)

Variance :

• The variance is a measure of how broadly distributed the r.v. tends to be.
• It’s defined as the expectation of the squared deviation from the mean:
• Var(X) = E[(X − E(X))2 ]
• In general terms, it is the expected squared distance of a value from the mean.

Looking at different distributions presents an interesting take on these two quantities:

1. Bernoulli Distribution
2. Uniform Distribution
3. Geometric Distribution
4. Binomial Distribution
5. Normal Distribution
6. Hypergeometric Distribution
7. Poisson Distribution

References:

Video:

Code: