Prototypical Networks

Some nice links:

  1. N-Shot Learning: Learning More with Less Data
  2. Meta-Learning: Learning to Learn Fast
  3. Papers:



TensorFlow – save, restore, freeze models

Getting around to deploying TF models can be quite tricky.

In fact, there are multiple ways to save/load TF models, each serving a slightly different purpose / use case:

  • simple_save
  • saver
  • estimator
  • keras
  • tflite
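
As a quick illustration of the first two options, here is a minimal TF 1.x-style sketch (the toy graph, tensor names, and export paths are made up for the example):

```python
import tensorflow as tf  # TF 1.x-style APIs

# A trivial graph: y = x * w
x = tf.placeholder(tf.float32, shape=[None, 1], name="x")
w = tf.Variable([[2.0]], name="w")
y = tf.matmul(x, w, name="y")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # simple_save: writes a full SavedModel (graph + weights), ready for serving
    tf.saved_model.simple_save(
        sess, "export/simple", inputs={"x": x}, outputs={"y": y})

    # Saver: writes a checkpoint of the variables only; the graph must be
    # rebuilt (or reloaded from a MetaGraph) before restoring
    saver = tf.train.Saver()
    saver.save(sess, "export/ckpt/model")
```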





ML Algos in R

I have been trying to understand how ML algos in R fit together / compare with each other.

# | R                                      | RevoScaleR | MML (MicrosoftML)    | Comment
1 | lm                                     | rxLinMod   | —                    | Linear models
2 | glm                                    | rxGlm      | —                    | Generalized linear models
3 | glm w/ binomial family, logit link fn  | rxLogit    | rxLogisticRegression | Logistic regression
4 | rpart                                  | rxDTree    | —                    | Decision tree implementations
5 | gbm                                    | rxBTrees   | rxFastTrees          | Boosted decision tree implementations
6 | —                                      | rxDForest  | rxFastForest         | Random (decision) forest implementations



References:

  1. Fitting Logistic Regression Models
  2. Generalized Linear Models
  3. rxDTree(): a new type of tree algorithm for big data
  4. A first look at rxBTrees
  5. A First Look at rxDForest()


Feature Scaling in SGD

SGD is the perfect algorithm for online learning, except it has one major drawback: it is sensitive to feature scaling.

In some of my trials with the SGD learner in scikit-learn, I have seen terrible performance if I don’t do feature scaling.
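
To make that concrete, here is a minimal sketch (synthetic data, illustrative only) contrasting a bare SGDClassifier against the same learner behind a StandardScaler:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X[:, 0] *= 1e4  # blow up one feature's scale to mimic unscaled inputs

unscaled = SGDClassifier(random_state=0)
scaled = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))

# The scaled pipeline typically scores noticeably better
print("unscaled:", cross_val_score(unscaled, X, y, cv=5).mean())
print("scaled:  ", cross_val_score(scaled, X, y, cv=5).mean())
```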

Which raises the question: how does VW (Vowpal Wabbit) do feature scaling? After all, VW does online learning.


It seems VW uses a kind of SGD that is scale-invariant: its default update normalizes per-feature learning rates online ("Normalized Online Learning", Ross, Mineiro & Langford).
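
I have not dug through VW's source, but the core per-feature normalization idea can be sketched roughly like this (squared loss, heavily simplified; the real NAG update also tracks a global normalizer that is omitted here):

```python
# Simplified sketch of per-feature normalized SGD, NOT VW's actual code.
import numpy as np

def normalized_sgd_step(w, s, x, y, lr=0.1):
    """One squared-loss step; s holds the running max |x_i| per feature."""
    s = np.maximum(s, np.abs(x))          # update per-feature scales
    grad = (w @ x - y) * x                # plain squared-loss gradient
    safe = np.where(s > 0.0, s, 1.0)      # avoid division by zero
    w = w - lr * grad / safe ** 2         # scale-normalized step size
    return w, s
```

Dividing each coordinate's step by its running scale squared means rescaling a feature by a constant leaves the effective update unchanged, which is what makes the learner robust to unscaled inputs.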




L1 / L2 loss functions and regularization

There was a discussion that came up the other day about L1 vs. L2, Lasso vs. Ridge, etc.

In particular,

  • What's the difference between the L1 and L2 loss functions?
  • What's the difference between L1 and L2 regularizers?
  • What's the difference between Lasso and Ridge?
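
For reference, a quick summary of the standard definitions (my notation: $\hat{y}_i$ are predictions, $w_j$ the model weights, $J(w)$ the unpenalized training loss):

```latex
% L1 vs. L2 as loss functions on the residuals:
L_1 = \sum_i \lvert y_i - \hat{y}_i \rvert
\qquad\text{vs.}\qquad
L_2 = \sum_i (y_i - \hat{y}_i)^2

% L1 vs. L2 as regularizers added to a training loss J(w):
J(w) + \lambda \sum_j \lvert w_j \rvert \quad\text{(L1)}
\qquad\text{vs.}\qquad
J(w) + \lambda \sum_j w_j^2 \quad\text{(L2)}

% Lasso = least squares with the L1 penalty (tends to zero out weights);
% Ridge = least squares with the L2 penalty (shrinks weights smoothly).
```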