# Bandits for Online Recommendations

I came across this interesting set of blog posts by Sergei Feldman on the use of bandit approaches in online recommendation.

In particular, the one I enjoyed most compares the approaches used to solve the multi-armed bandit problem. I need to play around with his code someday.

References:


# Balanced Ternary

I was solving a math problem about representing every natural number as a sum or difference of distinct powers of 3.

Interestingly, this led me to a numeral system called ‘Balanced Ternary’. Check it out!

Exploring this problem gave me interesting insights into the base representation of a number, something I have had on the back burner for a long while now. I finally got a chance to follow up on it.
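As a rough illustration of the idea, here is a minimal sketch that converts an integer to balanced ternary, where every digit is -1, 0 or 1. The function names are my own, made up for this example. The trick is that an ordinary ternary digit 2 can be rewritten as 3 - 1, i.e. a digit of -1 plus a carry into the next power of 3.

```python
def to_balanced_ternary(n):
    """Return the balanced-ternary digits of n, least significant first.

    Each digit is -1, 0 or 1 (hypothetical helper, for illustration only).
    """
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3          # in Python this is always 0, 1 or 2
        if r == 2:
            r = -1         # 2 = 3 - 1: use digit -1 and carry one upward
        digits.append(r)
        n = (n - r) // 3   # exact division, because n - r is a multiple of 3
    return digits


def as_signed_powers(n):
    """Write n as a list of distinct signed powers of 3 that sum to n."""
    return [d * 3 ** i for i, d in enumerate(to_balanced_ternary(n)) if d != 0]


if __name__ == "__main__":
    for n in (5, 8, 20):
        print(n, "=", as_signed_powers(n))
```

For example, 5 comes out as 9 - 3 - 1. Each power of 3 appears at most once, which is exactly the "distinct powers" requirement of the problem.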

References:

Problem:

Code:

# Bandit Problems: an ‘Experiment Strategy’ or an ‘ML Algorithm’?

Do a simple Google search for ‘how do bandit algorithms work’.

Do the results look confusing? Some links (here1, here2) say bandits are better than A/B tests. Other links say otherwise (here3, here4).

In fact, when one hears about bandit problems, there are a couple of questions to think about:

Questions:

1. Is it an ‘Experiment Strategy’?

• MAB gets compared with A/B tests. So is it an ‘experiment strategy’ like A/B testing?

2. Is it an ‘ML Algorithm’?

• Bandit algorithms select the optimal ‘action’. So is it fundamentally an ML algorithm?
• If yes, what is the relation between these ‘bandit problems’ and supervised ML algorithms like Logistic Regression and Decision Trees?

3. Where do algorithms like epsilon-greedy, UCB etc. fit in?

Thoughts:

• The correct way to look at bandit problems is as an optimization problem for online interactive systems.
• The goal of bandit algorithms is to select the policy that maximizes rewards. The space of policies is extremely large (or infinite).
• In the literature, bandit problems have been treated in different settings:
    • Multi-Armed Bandit (MAB) setting
    • Contextual Bandit setting
• Multi-Armed Bandit setting.
    • In the MAB setting, there are a few known approaches for selecting the best policy:
        • Naive
        • Epsilon-Greedy
        • Upper Confidence Bounds (UCB)
• Contextual Bandit.
    • In one of my previous posts I noted the ML reduction stack in VW for the contextual bandit problem. In a separate post, I also noted some thoughts on the use of the IPS score for counterfactual evaluation.
    • In the Full Information setting, the task of selecting the best policy is mapped to a cost-sensitive classification problem where:
        • context <-> example
        • action <-> label/class
        • policy <-> classifier
        • reward <-> gain / (negative) cost
    • Thereby, we can use known supervised techniques like Decision Trees, Logistic Regression etc. to solve the cost-sensitive classification problem.
        • This was an interesting insight for me, and it helped me answer question #2 above.
    • In the Partial Information (aka Bandit) setting, there are two more issues we would like to handle:
        • Filling in missing data.
        • Overcoming bias.
    • The Partial Information (aka Bandit) setting can be looked at in 2 different ways:
        • Online.
            • In the online setting the problem has been solved in different ways:
                • Epsilon-Greedy / Epoch-Greedy [Langford & Zhang].
                • “Monster” algorithm [Dudik, Hsu, Kale, Langford].
            • They mostly vary in how they optimize regret and/or computational efficiency.
        • Offline.
            • This is where counterfactual evaluation and learning come in.
• Bandit algorithms are not just an alternative ‘experiment strategy’ that is ‘better’ or ‘worse’ than A/B tests.
    • The objectives behind an A/B test are different from the objectives of a bandit system (which is continuous optimization).
• Typical issues to consider for bandit problems:
    • Explore-Exploit
        • exploit what has been learned
        • explore to learn which behaviour might give the best results
    • Context
        • In the contextual setting (‘contextual bandit’) there are many more choices available, and you are unlikely to see the same context twice.
    • Selection bias
        • the exploit step introduces bias that must be accounted for
    • Efficiency.
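To make the explore-exploit trade-off in the MAB setting concrete, here is a minimal simulation sketch of the Epsilon-Greedy and UCB approaches listed above. Several things here are assumptions of mine rather than anything from the posts I linked: rewards are Bernoulli (click / no click), the arm click rates are made up, the UCB variant is the classic UCB1 rule, and all function names are hypothetical.

```python
import math
import random


def epsilon_greedy(counts, values, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: values[a])


def ucb1(counts, values, t):
    """UCB1: play each arm once, then pick the arm with the highest upper confidence bound."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def run(policy, true_rates, steps=5000, epsilon=0.1, seed=0):
    """Simulate one policy against Bernoulli arms; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)      # number of pulls per arm
    values = [0.0] * len(true_rates)    # running mean reward per arm
    total = 0
    for t in range(1, steps + 1):
        if policy == "epsilon_greedy":
            arm = epsilon_greedy(counts, values, epsilon, rng)
        else:
            arm = ucb1(counts, values, t)
        reward = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total, counts


if __name__ == "__main__":
    rates = [0.1, 0.3, 0.7]  # made-up click rates, unknown to the policies
    for policy in ("epsilon_greedy", "ucb1"):
        print(policy, run(policy, rates))
```

Both policies end up spending most of their pulls on the best arm. The difference is in how they explore: Epsilon-Greedy keeps exploring at a fixed rate forever, while UCB1 explores an arm less and less as its estimate becomes more certain.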

References:

# Debugging Standard Deviation

In one of my previous posts, I had noted my thoughts around statistical measures like standard deviation and confidence intervals.

The fun part is of course when one has to debug these measures.

To that end, I developed some insights by visualizing the data and plotting different kinds of charts using matplotlib.

• The code below also acts as a reference for one of my pet peeves: plotting data from a Python dataframe.
• Use the code below as a reference going forward. Also, sometimes you have to debug plots when they make no sense at all. Like this one below:

• The first plot didn’t make sense to me initially. But once I started debugging, it made total sense.
• Check the 2nd plot below, which is what I get when I ‘sort’ the data.

Code:
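A minimal sketch of this kind of debugging plot, not the original code: it assumes the data is a plain list of numbers (a dataframe column would need converting first), and the function names and output file name are made up. It draws the values in their original order and again sorted, with the mean and a one-standard-deviation band overlaid on both panels.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.fmean(values), statistics.stdev(values)


def plot_raw_and_sorted(values, path="std_debug.png"):
    """Plot values in original and sorted order with a mean +/- 1 std band."""
    # Imported here so that summarize() works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    mean, std = summarize(values)
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    panels = ((values, "original order"), (sorted(values), "sorted"))
    for ax, (ys, title) in zip(axes, panels):
        ax.plot(range(len(ys)), ys, marker=".")
        ax.axhline(mean, linestyle="--")                # the mean
        ax.axhspan(mean - std, mean + std, alpha=0.2)   # one std either side
        ax.set_title(title)
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    data = [9, 2, 5, 4, 7, 4, 5, 4]
    print(summarize(data))
    plot_raw_and_sorted(data)
```

Sorting does not change the mean or the standard deviation, only how easy it is to see how many points fall inside the band.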

# Confidence Intervals and Significance Levels

In a previous post, I discussed the expected value and variance of different distributions.

Taking the same statistical concepts further, we now want to compute confidence intervals for our estimate.

Note:

• While thinking about confidence intervals, it is a good exercise to identify which distribution is representative of your estimate.
• This is needed because the confidence interval depends on the standard deviation, so you need to know how you are computing your standard deviation.
• An alternative is to compute the variance from first principles.
• Here are a couple of very interesting posts that explain the relationship between confidence intervals, significance levels and P-values in a very simple way.

• CTR: Here are some interesting discussions around computing confidence intervals for a metric like CTR.
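As a concrete case, here is a minimal sketch of a confidence interval for CTR. It assumes each impression is an independent Bernoulli trial, so the click count is binomial and the standard deviation of the estimated CTR is sqrt(p(1-p)/n). It computes two common intervals: the normal-approximation (Wald) interval and the Wilson score interval. The function name and the sample numbers are my own.

```python
import math
from statistics import NormalDist


def ctr_confidence_intervals(clicks, impressions, confidence=0.95):
    """Return (ctr, wald_interval, wilson_interval) for a binomial click count."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = impressions
    p = clicks / n

    # Wald: estimate +/- z * standard error of a binomial proportion.
    se = math.sqrt(p * (1 - p) / n)
    wald = (p - z * se, p + z * se)

    # Wilson score interval: better behaved for small n or p near 0 or 1.
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    wilson = (centre - half, centre + half)

    return p, wald, wilson


if __name__ == "__main__":
    print(ctr_confidence_intervals(clicks=30, impressions=1000))
    print(ctr_confidence_intervals(clicks=0, impressions=50))
```

The second call shows why the choice of interval matters: with zero clicks the Wald interval collapses to a single point at 0, while the Wilson interval still gives a non-trivial upper bound.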

References:

Videos:

Code:

# Expected Value and Variances

There are 2 fundamental quantities of probability distributions: expected value and variance.

Expected value:

• The simplest and most useful summary of the distribution of a random variable is the “average” of the values it takes on.
• (Please see references for equation)

Variance :

• The variance is a measure of how broadly distributed the r.v. tends to be.
• It’s defined as the expectation of the squared deviation from the mean:
• $\mathrm{Var}(X) = E\left[(X - E[X])^2\right]$
• In general terms, it is the expected squared distance of a value from the mean.
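For reference, these are the standard textbook definitions written out, with $p(x)$ denoting the probability mass function of a discrete random variable $X$ (the symbol $p$ is my notation here). Expanding the square in the definition of variance gives the second, often more convenient, form:

$$
E[X] = \sum_{x} x \, p(x)
$$

$$
\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - \left(E[X]\right)^2
$$

For a continuous random variable with density $f(x)$, the sum is replaced by an integral: $E[X] = \int x \, f(x) \, dx$.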

Looking at different distributions presents an interesting take on these two quantities:

1. Bernoulli Distribution
2. Uniform Distribution
3. Geometric Distribution
4. Binomial Distribution
5. Normal Distribution
6. Hypergeometric Distribution
7. Poisson Distribution
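As a quick sanity check on three of these, here is a minimal sketch that computes the expected value and variance directly from a probability mass function and compares them with the well-known closed forms: Bernoulli (p, p(1-p)), Binomial (np, np(1-p)) and Poisson (λ, λ). The helper names are made up, and the Poisson support is truncated at an arbitrary cutoff, so that case is only approximate.

```python
import math


def mean_and_variance(pmf):
    """Compute E[X] and Var(X) from a dict mapping value -> probability."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var


def bernoulli_pmf(p):
    return {0: 1 - p, 1: p}


def binomial_pmf(n, p):
    return {k: math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def poisson_pmf(lam, cutoff=100):
    # Infinite support truncated at `cutoff`; the tail is negligible for small lam.
    return {k: math.exp(-lam) * lam ** k / math.factorial(k) for k in range(cutoff)}


if __name__ == "__main__":
    p, n, lam = 0.3, 10, 4.0
    print("Bernoulli:", mean_and_variance(bernoulli_pmf(p)), "expected", (p, p * (1 - p)))
    print("Binomial: ", mean_and_variance(binomial_pmf(n, p)), "expected", (n * p, n * p * (1 - p)))
    print("Poisson:  ", mean_and_variance(poisson_pmf(lam)), "expected", (lam, lam))
```

The same `mean_and_variance` helper works for any discrete distribution with finite (or safely truncated) support, which makes it handy for checking a formula before trusting it.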

References:

Video:

Code: