Comparing ML Algorithms: Multi-Armed Bandit, Contextual Bandit, Logistic Regression, Online Learning

We have a system running a Multi-Armed Bandit.

So when it came time to select the next generation of ML algorithm to try out, we had a few choices:

  1. Multi-Armed Bandit (this is what we already had running)
    • This entails ranking the items by their observed conversion rates up to that point in time.
  2. Contextual Bandit
    • We use Vowpal Wabbit for this.
    • Internally Vowpal Wabbit treats contextual bandit in 2 distinct ways:
      • Without Action Dependent Features (non-ADF)
      • With Action Dependent Features (ADF)
    • Interestingly, the two modes differ in how the model is built:
      • In non-ADF mode, VW creates multiple models (i.e. one model for each class).
      • In ADF mode, VW creates a single model.
  3. Logistic Regression
    • This entails reducing the problem to a binary classification problem.
    • The model is then used to score the items, and the items are ranked by model score.
  4. Online ML
    • Again we treat this as a binary classification problem, except that this time the model is updated in an online fashion.
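
To make options 1, 3 and 4 more concrete, here is a minimal sketch in plain Python. It is not the code behind the comparison in this post; the names rank_by_conversion_rate, OnlineLogisticRegression and rank_by_model_score, the learning rate, and the toy data are all assumptions made for illustration. The first function ranks items by observed conversion rate (the bandit-style ranking described in option 1), and the class is a logistic regression updated one example at a time with stochastic gradient descent (options 3 and 4), whose scores are then used for ranking.

```python
import math


def rank_by_conversion_rate(stats):
    """Rank items by observed conversion rate, highest first.

    stats maps item -> (conversions, impressions); this layout is hypothetical.
    Items with no impressions are treated as having a rate of 0.
    """
    def rate(item):
        conversions, impressions = stats[item]
        return conversions / impressions if impressions else 0.0

    return sorted(stats, key=rate, reverse=True)


class OnlineLogisticRegression:
    """Binary logistic regression trained one example at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log loss for a single example (y is 0 or 1)."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def rank_by_model_score(model, item_features):
    """Rank items by predicted conversion probability, highest first."""
    return sorted(
        item_features,
        key=lambda item: model.predict_proba(item_features[item]),
        reverse=True,
    )


# Toy usage with made-up numbers.
stats = {"a": (5, 100), "b": (20, 100), "c": (0, 0)}
print(rank_by_conversion_rate(stats))  # ['b', 'a', 'c']

model = OnlineLogisticRegression(n_features=1)
for _ in range(200):
    model.update([1.0], 1)   # this feature value converted
    model.update([-1.0], 0)  # this feature value did not convert
print(rank_by_model_score(model, {"bad": [-1.0], "good": [1.0]}))
```

The same class covers both the batch and the online variant: fitting it once over a fixed dataset gives the logistic regression setup, while continuing to call update as new impressions arrive gives the online one.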

 

Interestingly, on the dataset I was using, I didn't see much difference in performance across the four algorithms above.

[Figure: algo_compare]

 

Code:

_trials_compare3

 
