We have a system running a Multi-Armed Bandit.
So when it came time to select the next generation of ML algorithm to try out, we had a few choices:
- Multi-Armed Bandit (we had this running)
- This entails ranking the items based on their respective conversion rates up to that point in time.
- Contextual Bandit
- We use Vowpal Wabbit for this.
- Internally, Vowpal Wabbit handles contextual bandits in two distinct ways:
- Without Action Dependent Features (non-ADF)
- With Action Dependent Features (ADF)
- Interestingly, the two modes differ in how models are maintained:
- In non-ADF mode, VW creates multiple models (i.e. one model per action/class).
- In ADF mode, VW creates a single model that scores each action from its features.
- Logistic Regression
- This entails reducing the problem to a binary classification problem (converted vs. not converted).
- Then using the trained model to score the items, and finally ranking the items by model score.
- Online ML
- Again treating this as a binary classification problem, except this time we update the model in an online fashion, one example at a time, rather than retraining in batch.
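For the contextual-bandit option, the non-ADF and ADF modes also expect different input formats. A rough sketch of each, with invented feature names. In non-ADF mode (e.g. `vw --cb 3`), the action set is fixed up front and each example is one line, labeled `action:cost:probability`:

```
2:0.0:0.5 | user_age_25 device_mobile
```

In ADF mode (`vw --cb_adf`), each example carries its own candidate actions: a `shared` line for the context, then one line per action, with the label on the chosen action's line:

```
shared | user_age_25 device_mobile
0:0.0:0.5 | item_a_color_red item_a_price_low
| item_b_color_blue item_b_price_high
| item_c_color_green item_c_price_low
```

This is why ADF fits ranking problems with a changing item catalog: actions are described by features rather than by a fixed index.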
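To make the options above concrete, here is a minimal sketch of the first one: ranking items by empirical conversion rate. The item names and counts are invented for illustration, and the additive smoothing is my own addition to handle items with no impressions yet.

```python
# Rank items by empirical conversion rate (conversions / impressions).
# A small additive smoothing term avoids division by zero for new items.
def rank_by_conversion_rate(stats, alpha=1.0, beta=1.0):
    """stats: dict of item -> (conversions, impressions)."""
    def rate(item):
        conversions, impressions = stats[item]
        # Smoothed estimate, equivalent to a Beta(alpha, beta) prior.
        return (conversions + alpha) / (impressions + alpha + beta)
    return sorted(stats, key=rate, reverse=True)

stats = {
    "item_a": (30, 100),  # 30% observed conversion rate
    "item_b": (5, 10),    # 50% observed, but few impressions
    "item_c": (0, 0),     # brand-new item, no data yet
}
print(rank_by_conversion_rate(stats))  # → ['item_b', 'item_c', 'item_a']
```

Note that this estimator ignores uncertainty; a real bandit would also need an exploration policy (e.g. epsilon-greedy or Thompson sampling) on top of the ranking.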
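The logistic-regression option reduces to: fit a binary classifier on (features, converted?) pairs, then rank items by predicted probability. A self-contained sketch with a toy one-feature dataset (the data and item names are made up; a real system would use a library implementation rather than this hand-rolled gradient descent):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent for unregularized logistic regression."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: one feature per item; conversions correlate with the feature.
X = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7]]
y = [0, 1, 0, 1, 0, 1]
w, b = train_logistic(X, y)

# Rank candidate items by model score.
items = {"item_a": [0.25], "item_b": [0.85], "item_c": [0.5]}
ranking = sorted(items, key=lambda it: score(w, b, items[it]), reverse=True)
print(ranking)  # → ['item_b', 'item_c', 'item_a']
```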
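The online variant is the same binary classifier, but updated one event at a time with a single SGD step instead of periodic batch retraining. Again a sketch on invented data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OnlineLogistic:
    """Logistic regression updated one (features, label) event at a time."""
    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def update(self, x, label):
        # Single SGD step on the log-loss for this one event.
        err = self.predict(x) - label
        self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, x)]
        self.b -= self.lr * err

model = OnlineLogistic(n_features=1)
# Stream of (features, converted?) events arriving one at a time.
stream = [([0.9], 1), ([0.1], 0), ([0.8], 1), ([0.2], 0)] * 50
for x, label in stream:
    model.update(x, label)

# Rank items by the current model's score, just as in the batch variant.
items = {"item_a": [0.2], "item_b": [0.9]}
print(sorted(items, key=lambda it: model.predict(items[it]), reverse=True))
```

The practical difference from the batch version is operational: the model is always current with the latest traffic, at the cost of being sensitive to the learning rate and the order of events.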
Interestingly, on the dataset I was using, I didn’t see much difference in performance across the four algorithms above.