Using Vowpal Wabbit : Tips and Tricks

As I play around more with the machine learning toolkit Vowpal Wabbit, I keep running into subtle flags and bits of functionality.

Here is an effort to aggregate my learnings from using this toolkit.

Tips:

  • -q [ --quadratic ] arg
    • Create and use quadratic features.
    • -q is a very powerful option. It takes as an argument a pair of letters, and its effect is to create interactions between the features of two namespaces. Suppose each example has a namespace user and a namespace document; then specifying -q ud will create an interaction feature for every pair of features (x, y) where x is a feature from the user namespace and y is a feature from the document namespace. If a letter matches more than one namespace, then all the matching namespaces are used. In our example, if there is another namespace url, then interactions between url and document will also be modeled. The letter : is a wildcard that interacts with all namespaces: -q a: (or -q :a) will create an interaction feature for every pair (x, y) where x is a feature from a namespace starting with a and y is a feature from any namespace. -q :: would interact every pair of features across all namespaces.
  • --print
    • Use this to understand how VW constructs the reported number of features.
    • When using contextual bandit mode, you will notice it gets added automatically per action.
  • Feature ‘116060’
    • This is a constant feature with value 1 that essentially captures the intercept term in a linear model.
    • You may come across this feature if you look closely at the VW output.
  • Output Feature Ranking and Feature Weights Using VW
    • Is it possible to output the feature rankings after every update?
      • Try --audit.
    • Is it possible to output the feature rankings at the end of training?
      • Use a combination of --readable_model foo and --l1 1e-3. Any feature surviving a high level of L1 regularization must be important according to the gradient.
    • Is it possible to output the feature weights after every update?
      • Possible, but expensive. In between examples you can insert a special example with a tag of the form save_<filename>, which tells VW to save the model at that point.
    • Is it possible to output the feature weights at the end of training?
      • That’s what -f does. If you want the weights in a readable format, use --readable_model instead.
  • The learning rate schedule is influenced by 4 parameters: -l, --initial_t, --power_t, and --decay_learning_rate.
    • Look at the code below for how I do parameter sweeps over these 4 parameters.
  • More to come…
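To make the -q and --readable_model tips above concrete, here is a minimal sketch. The file names, namespace names, and features are illustrative, and vw is assumed to be on the PATH:

```shell
# train.vw: each example carries a `user` and a `document` namespace
# (label first, then |namespace followed by its features).
cat > train.vw <<'EOF'
1 |user age_25 country_us |document topic_sports length_long
-1 |user age_40 country_uk |document topic_politics length_short
EOF

# -q ud crosses every feature in namespaces starting with `u` with every
# feature in namespaces starting with `d` (e.g. age_25 x topic_sports).
# --audit prints each constructed feature, interactions included.
vw -d train.vw -q ud --audit

# At the end of training, --readable_model dumps hash->weight pairs;
# adding strong L1 regularization keeps only the important features.
vw -d train.vw -q ud --l1 1e-3 --readable_model model.txt
```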


References:

  1. https://github.com/datasciencedojo/meetup/tree/master/getting_started_with_vowpal_wabbit
  2. http://mlwave.com/tutorial-titanic-machine-learning-from-distaster/
  3. http://zinkov.com/posts/2013-08-13-vowpal-tutorial/
  4. http://stackoverflow.com/questions/24822288/correctness-of-logistic-regression-in-vowpal-wabbit


Code:
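A minimal sketch of a sweep over the four learning-rate parameters mentioned above. This assumes vw is installed and a training file train.vw exists; the file names and the value grids are illustrative, not a recommendation:

```shell
# Grid search over the four parameters that shape VW's learning rate
# schedule: -l (base rate), --initial_t, --power_t, --decay_learning_rate.
# Compare the average loss printed at the end of each run.
for l in 0.1 0.5 1.0; do
  for power_t in 0.5 1.0; do
    for initial_t in 0 1; do
      for decay in 0.9 1.0; do
        echo "l=$l power_t=$power_t initial_t=$initial_t decay=$decay"
        vw -d train.vw -c --passes 5 \
           -l "$l" --power_t "$power_t" \
           --initial_t "$initial_t" --decay_learning_rate "$decay" \
           -f "model_l${l}_p${power_t}_i${initial_t}_d${decay}.vw"
      done
    done
  done
done
```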
