Bandits for Online Recommendations

October 17, 2016 abgoswam algorithms, machinelearning, math, statistics

I came across this interesting set of blog posts by Sergei Feldman on the use of bandit approaches in online recommendation.

In particular, the one I really enjoyed reading was the comparison of approaches for solving the multi-armed bandit problem. I need to play around with his code someday.

References:

http://www.data-cowboys.com/

Balanced Ternary

October 4, 2016 / October 7, 2016 abgoswam algorithms, math, statistics

I was solving a math problem that had to do with representing every natural number as a sum and/or difference of distinct powers of 3.

Interestingly, this led me to a number system called ‘Balanced Ternary’. Check it out!

Exploring this problem gave me interesting insights about the base representation of a number, something I have been keeping on the back burner for a long while now. I finally got a chance to follow up on it.
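To make the idea concrete, here is a minimal sketch of converting an integer to balanced ternary, where every digit is -1, 0 or 1. The function names are my own for illustration and are not taken from the linked code. The trick is that a remainder of 2 is rewritten as 3 - 1, i.e. a digit of -1 plus a carry into the next power of 3.

```python
def to_balanced_ternary(n):
    """Return the balanced-ternary digits of n, least significant first.

    Each digit is -1, 0 or 1, so n equals sum(d * 3**i) over the digits.
    """
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3            # Python's % gives 0, 1 or 2, even for negative n
        if r == 2:
            r = -1           # write 2 as 3 - 1: digit -1, carry handled below
        digits.append(r)
        n = (n - r) // 3     # exact division, since n - r is a multiple of 3
    return digits


def from_balanced_ternary(digits):
    """Inverse of to_balanced_ternary: add or subtract distinct powers of 3."""
    return sum(d * 3 ** i for i, d in enumerate(digits))


# 8 = 9 - 1, so the digits (least significant first) are [-1, 0, 1]
print(to_balanced_ternary(8))
```

Reading the digits as "add this power of 3" (1), "subtract it" (-1) or "skip it" (0) gives exactly the sum/difference representation from the problem.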

References:

Problem:

peculiarbalance

Code:

balancedternary

Bandit Problems: an ‘Experiment Strategy’ or a ‘ML Algorithm’ ?

September 28, 2016 / October 17, 2016 abgoswam algorithms, machinelearning, math, statistics, systems, vowpalwabbit

Do a simple Google search for ‘how do bandit algorithms work’.

Do the results look confusing? Some links (here1, here2) say bandits are better than A/B tests. Other links say otherwise (here3, here4).

In fact, when one hears about bandit problems, there are a couple of questions to think about:

Questions:

1. Is it an ‘Experiment Strategy’?

MAB gets compared with A/B tests. So is it an ‘experiment strategy’ like A/B testing?

2. Is it an ‘ML Algorithm’?

Bandit algorithms select the optimal ‘action’. So is it fundamentally an ML algorithm?
If yes, what is the relation between ‘bandit problems’ and supervised ML algorithms like Logistic Regression and Decision Trees?

3. Where do algorithms like epsilon-greedy, UCB, etc. fit in?

Thoughts:

The correct way of looking at bandit problems is to think of them as an optimization problem for online interactive systems.
- The goal of bandit algorithms is to select the best policy, i.e. the one that maximizes rewards. The space of policies is extremely large (or infinite).
In the literature, people have treated bandit problems in different settings:
- Multi Armed Bandit setting
- Contextual Bandit
Multi Armed Bandit setting.
- In the MAB setting, there are a few known approaches for selecting the best policy.
  - Naive
  - Epsilon-Greedy
  - Upper Confidence Bounds.
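As a rough illustration of two of the approaches listed above, here is a small simulation sketch of epsilon-greedy and UCB1 on Bernoulli arms. Everything here is an assumption for illustration: the arm click-through rates, the `simulate` helper and the default `epsilon=0.1` are made up, and UCB1 is only one member of the upper-confidence-bound family.

```python
import math
import random


def epsilon_greedy(counts, values, rng, epsilon=0.1):
    """With probability epsilon explore a random arm, else exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: values[a])


def ucb1(counts, values, rng):
    """Play every arm once, then pick the arm with the highest mean + bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )


def simulate(select, true_ctrs, rounds, seed=0):
    """Run one bandit strategy against arms with hypothetical click rates."""
    rng = random.Random(seed)
    counts = [0] * len(true_ctrs)    # number of pulls per arm
    values = [0.0] * len(true_ctrs)  # running mean reward per arm
    total_reward = 0
    for _ in range(rounds):
        arm = select(counts, values, rng)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return counts, total_reward


for name, strategy in [("epsilon-greedy", epsilon_greedy), ("UCB1", ucb1)]:
    pulls, reward = simulate(strategy, [0.1, 0.5, 0.9], rounds=5000)
    print(name, pulls, reward)
```

The two strategies differ only in how they decide when to explore: epsilon-greedy explores at a fixed rate forever, while UCB1 explores an arm less and less as its estimate becomes more certain.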
Contextual Bandit.
- In one of my previous posts, I noted the ML reduction stack in VW for the contextual bandits problem. In a separate post, I have also noted some thoughts on the use of the IPS score for counterfactual evaluation.
- In the Full Information Setting, the task of selecting the best policy is mapped to a cost-sensitive classification problem where:
  - context <-> example
  - action <-> label/class
  - policy <-> classifier
  - reward <-> gain / (negative) cost
- Thereby, we can use known supervised techniques like Decision Trees, Logistic Regression etc. to solve the cost-sensitive classification problem.
  - This was an interesting insight for me, and it helped me answer question #2 above.
- In the Partial Information (a.k.a. Bandit) setting, there are two more issues to handle:
  - Filling in missing data.
  - Overcoming bias.
The Partial Information (a.k.a. Bandit) setting can further be approached in two different ways:
- Online.
  - In the online setting, the problem has been solved in different ways:
  - Epsilon-Greedy / Epoch Greedy [Langford & Zhang].
  - “Monster” Algorithm [Dudik, Hsu, Kale, Langford].
  - They mostly differ in how they optimize regret and/or computational efficiency.
- Offline.
  - This is where counterfactual evaluation and learning come in.
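For the offline case, a minimal sketch of the inverse propensity score (IPS) estimator may help. It estimates the average reward a new policy would have earned, using only logged data from a different (randomized) logging policy. The log format and the function name below are my assumptions for illustration; this is not the VW implementation or the `evaluator_ips` code referenced elsewhere on this blog.

```python
def ips_estimate(logged, policy):
    """Estimate the average reward of `policy` from logged bandit data.

    `logged` is a list of (context, action, reward, prob) tuples, where
    `prob` is the probability with which the logging policy chose `action`.
    Only rounds where the new policy agrees with the logged action count,
    and each is re-weighted by 1 / prob to correct for selection bias.
    """
    if not logged:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, prob in logged:
        if policy(context) == action:
            total += reward / prob
    return total / len(logged)


# Hypothetical log: two contexts, two actions, uniform random logging policy.
log = [
    ("a", 0, 1.0, 0.5),
    ("a", 1, 0.0, 0.5),
    ("b", 1, 1.0, 0.5),
    ("b", 0, 0.0, 0.5),
]
print(ips_estimate(log, lambda ctx: 0 if ctx == "a" else 1))  # context-aware policy
print(ips_estimate(log, lambda ctx: 0))                       # always action 0
```

The re-weighting is what "fills in" the rewards we never observed: rounds where the logged action was unlikely under the logging policy count for more, which removes the bias on average, as long as every action had a non-zero probability of being logged.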
Bandit algorithms are not just an alternate ‘experiment strategy’ that is ‘better’ or ‘worse’ than A/B tests.
- The objectives behind doing an A/B test are different from the objectives of using a bandit system (which is to do continuous optimization).
Typical issues to consider for bandit problems:
- Explore-Exploit
  - Exploit what has been learned.
  - Explore to learn which behaviour might give the best results.
- Context
  - In the contextual setting (‘contextual bandit’) there are many more choices available, and we are unlikely to see the same context twice.
- Selection bias
  - Exploitation introduces bias that must be accounted for.
- Efficiency.

References:

https://support.google.com/analytics/answer/2844870?hl=en&ref_topic=1745207
- This post gives a nice overview of how Google’s Analytics Content Experiment platform uses the multi-armed bandit approach for managing online experiments
https://www.youtube.com/watch?v=gzxRDw3lXv8
- Rob Schapire explains the fundamentals of the bandits problem. Very Cool!
http://www.cs.cornell.edu/~adith/CfactSIGIR2016/
- Tutorial by Thorsten Joachims on the bandits problem
http://engineering.richrelevance.com/bandits-recommendation-systems/
- Nice read!
http://stevehanov.ca/blog/index.php?id=132
https://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html
https://vwo.com/blog/multi-armed-bandit-algorithm/
https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html

Debugging Standard Deviation

September 27, 2016 abgoswam pandas, statistics, visualization

In one of my previous posts, I had noted my thoughts around statistical measures like standard deviation and confidence intervals.

The fun part is of course when one has to debug these measures.

To that end, I developed some insights by visualizing the data and plotting different kinds of charts using matplotlib.

The code below also serves as a reference for one of my pet peeves: plotting data from a pandas DataFrame.
Use it as a reference going forward.

Also, sometimes you have to debug plots when they make no sense at all. Like this one below:

The first plot didn’t make sense to me initially. But once I started debugging, it made total sense.
Check the second plot below, which is what I get when I ‘sort’ the data.

Code:

Confidence Intervals and Significance Levels

September 2, 2016 / September 6, 2016 abgoswam algorithms, math, statistics

In a previous post, I discussed the expected value and variance of different distributions.

Taking the same statistical concepts further, we now want to compute confidence intervals for our estimate.

Note:

When thinking about confidence intervals, it is a good exercise to identify which distribution is representative of your estimate.
- This is needed because the confidence interval depends on the standard deviation, so you need to know how you are computing your standard deviation.
- An alternative is to compute the variance from first principles.
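As a sketch of what "from first principles" can look like, the snippet below computes the variance directly from its definition and cross-checks it against Python's standard `statistics` module. The helper name and the sample data are made up for illustration. The main thing to watch is whether you divide by n (population) or n - 1 (sample), since that changes the standard deviation and therefore the width of the interval.

```python
import math
import statistics


def variance_from_definition(xs, sample=False):
    """Mean squared deviation from the mean; divide by n - 1 for a sample."""
    n = len(xs)
    mean = sum(xs) / n
    squared_devs = sum((x - mean) ** 2 for x in xs)
    return squared_devs / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

pop_var = variance_from_definition(data)
sample_var = variance_from_definition(data, sample=True)

# Cross-check against the standard library implementations.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))

print(math.sqrt(pop_var), math.sqrt(sample_var))
```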

(Image from https://en.wikipedia.org/wiki/Standard_deviation)

Here are a couple of very interesting posts that explain the relationship between confidence intervals, significance levels and p-values in a very simple way.

CTR: Here are some interesting discussions around computing confidence intervals for a metric like CTR.
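For a metric like CTR, one common starting point is to treat each impression as a Bernoulli trial and use the normal approximation (the Wald interval). The sketch below assumes that model. The function name and the numbers are hypothetical, and this is not necessarily the method used in the discussions mentioned above. The value z = 1.96 corresponds to a 95% confidence level.

```python
import math


def ctr_confidence_interval(clicks, impressions, z=1.96):
    """Normal-approximation (Wald) interval for a click-through rate.

    Treats each impression as a Bernoulli trial with success probability p,
    so the standard error of the estimated CTR is sqrt(p * (1 - p) / n).
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    p = clicks / impressions
    standard_error = math.sqrt(p * (1 - p) / impressions)
    return p - z * standard_error, p + z * standard_error


low, high = ctr_confidence_interval(clicks=50, impressions=1000)
print(f"CTR = 0.05, 95% CI = ({low:.4f}, {high:.4f})")
```

The approximation gets poor when the number of clicks is very small or the CTR is close to 0 or 1. In those cases an alternative such as the Wilson interval is usually a safer choice.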

References:

Videos:

https://www.youtube.com/watch?v=tFWsuO9f74o
- 2 videos here are helpful in understanding the basic concept

Code:

evaluator_ips

Coupon collector’s problem

August 8, 2016 abgoswam math, statistics

References:

https://en.wikipedia.org/wiki/Coupon_collector%27s_problem

Expected Value and Variances

August 8, 2016 / September 6, 2016 abgoswam algorithms, math, statistics

There are 2 fundamental quantities of probability distributions: expected value and variance.

Expected value:

The simplest and most useful summary of the distribution of a random variable is the “average” of the values it takes on.
(Please see references for equation)

Variance :

The variance is a measure of how broadly distributed the r.v. tends to be.
It’s defined as the expectation of the squared deviation from the mean:
- Var(X) = E[(X − E(X))^2]
In general terms, it is the expected squared distance of a value from the mean.
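For a discrete random variable, both quantities can be computed directly from the probability mass function: the expected value is the probability-weighted average of the values, and the variance is the probability-weighted average of the squared deviations. A small sketch, with helper names of my own choosing, checked against the known Bernoulli results E(X) = p and Var(X) = p(1 − p):

```python
import math


def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E(X))^2], computed directly from the definition."""
    mu = expected_value(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


def bernoulli_pmf(p):
    """P(X = 1) = p and P(X = 0) = 1 - p."""
    return {0: 1 - p, 1: p}


p = 0.3
pmf = bernoulli_pmf(p)
assert math.isclose(expected_value(pmf), p)
assert math.isclose(variance(pmf), p * (1 - p))

# A fair six-sided die, i.e. a discrete uniform distribution on 1..6.
die = {face: 1 / 6 for face in range(1, 7)}
print(expected_value(die), variance(die))
```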

Looking at different distributions presents an interesting take on these two quantities:

Bernoulli Distribution
Uniform Distribution
Geometric Distribution
Binomial Distribution
Normal Distribution
Hypergeometric Distribution
Poisson Distribution

References:

http://idiom.ucsd.edu/~rlevy/teaching/fall2008/lign251/lectures/lecture_3.pdf
http://www.dma.unifi.it/~modica/2009-10/an1/appendici-Cormen.pdf
http://terras-altas.net.br/MA-2013/statistics/probability%20distribution%20functions/Examples%20of%20Bernouilli%20distribution.pdf
- This link has nice examples of several distributions
http://people.umass.edu/biep540w/pdf/bernoulli.pdf
http://www.math.uah.edu/stat/interval/Bernoulli.html

Video:

https://www.khanacademy.org/math/statistics-probability/sampling-distributions-library/sample-proportions/v/mean-and-variance-of-bernoulli-distribution-example
- The 2 videos in this link discuss the mean and variance of a Bernoulli distribution

Code:

evaluator_ips

abgoswam's tech blog

Data Science, Machine Learning, CS Theory, Systems & Web
