Prototypical Networks

Some nice links:

  1. N-Shot Learning: Learning More with Less Data
  2. Meta-Learning: Learning to Learn Fast
  3. Papers:
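
For quick reference, the core idea (from the Snell et al., 2017 paper): embed support examples with a learned encoder, average each class's embeddings into a per-class prototype, and classify a query by its distance to the nearest prototype. A minimal numpy sketch, with random vectors standing in for the learned embeddings:

    import numpy as np

    # Shapes for a 3-way, 5-shot episode; the embeddings here are random
    # stand-ins for the output of a trained encoder network.
    rng = np.random.default_rng(0)
    n_way, k_shot, dim = 3, 5, 16
    support = rng.normal(size=(n_way, k_shot, dim))  # [class, shot, dim]
    query = rng.normal(size=(dim,))                  # a single query embedding

    # Class prototypes: the mean of each class's support embeddings
    prototypes = support.mean(axis=1)                # [n_way, dim]

    # Classify the query via softmax over negative squared distances
    d2 = ((prototypes - query) ** 2).sum(axis=1)     # [n_way]
    logits = -d2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    print("predicted class:", int(np.argmax(probs)), "probs:", probs)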

 

 

TensorFlow – save, restore, freeze models

It can be quite tricky getting around to deploying TF models. In fact, there are multiple ways to save/load TF models, each serving a slightly different purpose / use-case (a minimal save/restore/freeze sketch follows the list):

  • simple_save
  • saver
  • estimator
  • keras
  • tflite
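
A minimal sketch of the saver and simple_save routes, plus freezing, using TF 1.x-style APIs; the toy graph and file paths are just illustrative:

    import tensorflow as tf  # TF 1.x-style APIs

    # A toy graph: y = x @ W
    x = tf.placeholder(tf.float32, shape=[None, 1], name="x")
    W = tf.Variable([[2.0]], name="W")
    y = tf.matmul(x, W, name="y")

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # 1. Checkpoint: variables only; the graph is rebuilt from code on restore
        saver.save(sess, "/tmp/model.ckpt")

        # 2. SavedModel via simple_save: graph + variables, ready for serving
        tf.saved_model.simple_save(sess, "/tmp/saved_model",
                                   inputs={"x": x}, outputs={"y": y})

        # 3. Freeze: bake variables into constants in a single GraphDef
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), ["y"])
        with tf.gfile.GFile("/tmp/frozen.pb", "wb") as f:
            f.write(frozen.SerializeToString())

    # Restoring the checkpoint later (the same graph-building code must run first):
    with tf.Session() as sess:
        saver.restore(sess, "/tmp/model.ckpt")

The keras and tflite routes have their own entry points (model.save() and the TFLite converter, respectively) and are separate from the above.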

 

References:

Code:

 

ML Algos in R

I have been trying to understand how the ML algos in base R, RevoScaleR, and MicrosoftML (MML) fit together / compare with each other.

#   R                                    RevoScaleR   MML                    Comment
1   lm                                   rxLinMod     —                      Linear models
2   glm                                  rxGlm        —                      Generalized linear models
3   glm w/ binomial family, logit link   rxLogit      rxLogisticRegression   Logistic regression
4   rpart                                rxDTree      —                      Decision tree implementations
5   gbm                                  rxBTrees     rxFastTrees            Boosted decision tree implementations
6   —                                    rxDForest    rxFastForest           Random forest implementations

 

References:

#   Title                                                   Link
1   Fitting Logistic Regression Models                      https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-logistic-regression
2   Generalized Linear Models                               https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-generalized-linear-mode
3   rxDTree(): a new type of tree algorithm for big data    http://blog.revolutionanalytics.com/2013/07/rxdtree-a-new-type-of-tree-algorithm.html
4   A first look at rxBTrees                                http://blog.revolutionanalytics.com/2015/03/a-first-look-at-rxbtrees.html
5   A First Look at rxDForest()                             http://blog.revolutionanalytics.com/2014/01/a-first-look-at-rxdforest.html

 

Feature Scaling in SGD

SGD is the perfect algorithm for online learning, except it has one major drawback: it is sensitive to feature scaling.

In some of my trials with the SGD learner in scikit-learn, I have seen terrible performance if I don't do feature scaling, as in the sketch below.
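
A minimal sketch of what I mean, on synthetic data (the blown-up feature scale is contrived to make the effect visible):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic data, with one feature blown up to a much larger scale
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X[:, 0] *= 1e4
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Without scaling, the gradient updates are dominated by the large feature
    raw = SGDClassifier(random_state=0).fit(X_tr, y_tr)
    print("unscaled:", raw.score(X_te, y_te))

    # With standardization in front, the same learner behaves much better
    piped = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
    piped.fit(X_tr, y_tr)
    print("scaled:  ", piped.score(X_te, y_te))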

Which raises the question: how does VW do feature scaling? After all, VW does online learning.

Tip:

It seems VW uses a variant of SGD with normalized, adaptive updates that is scale invariant, so it can get away without explicit feature scaling.

References:

Code:

 

L1 / L2 loss functions and regularization

There was a discussion that came up the other day about L1 vs. L2, Lasso vs. Ridge, etc.

In particular (a summary in equations follows the list):

  • What's the difference between the L1 and L2 loss functions?
  • What's the difference between the L1 and L2 regularizers?
  • What's the difference between Lasso and Ridge?
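
My working summary, in standard notation (residuals y_i - ŷ_i, weights w, penalty strength λ):

    % L1 vs. L2 loss: measured on the residuals of the predictions
    L_1 = \sum_i \lvert y_i - \hat{y}_i \rvert
    L_2 = \sum_i (y_i - \hat{y}_i)^2

    % L1 vs. L2 regularization: a penalty on the weights w, added to the loss
    J_{L1}(w) = \mathrm{Loss}(w) + \lambda \lVert w \rVert_1
    J_{L2}(w) = \mathrm{Loss}(w) + \lambda \lVert w \rVert_2^2

    % Lasso = squared loss + L1 penalty;  Ridge = squared loss + L2 penalty

So "L1 vs. L2" can refer either to the loss on the residuals or to the penalty on the weights; Lasso and Ridge are just linear regression (squared loss) with the L1 and L2 penalties respectively, which is why Lasso produces sparse models (weights hit exactly zero) while Ridge only shrinks them.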

 

References: