# abgoswam's tech blog

## Data Science, Machine Learning, CS Theory, Systems & Web



# Prototypical Networks

Some nice links:

- N-Shot Learning: Learning More with Less Data

- Meta-Learning: Learning to Learn Fast

- Papers:

# BERT Trials

A very simple and up-to-date explanation of BERT

# TensorFlow – save, restore, freeze models

It can be quite tricky to get around to deploying TF models. In fact, there are multiple ways to save/load TF models, each serving a slightly different purpose / use case:

- simple_save
- saver
- estimator
- keras
- tflite

**References:**

- https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc
- https://gist.github.com/omimo/5d393ed5b64d2ca0c591e4da04af6009

**Code:**
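A minimal sketch of the checkpoint (`tf.train.Saver`) route: save a variable, rebuild the graph, and restore it. This is written against the TF1-style API via `tf.compat.v1` (an assumption on my part, so it also runs under TensorFlow 2), not the exact code from the gists above:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Build a tiny graph with one variable and save a checkpoint.
tf1.reset_default_graph()
w = tf1.get_variable("w", initializer=tf.constant(3.0))
saver = tf1.train.Saver()
with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    ckpt_path = saver.save(sess, "/tmp/demo_model.ckpt")

# Rebuild the same graph and restore the variable from the checkpoint.
tf1.reset_default_graph()
w2 = tf1.get_variable("w", shape=[], dtype=tf.float32)
saver2 = tf1.train.Saver()
with tf1.Session() as sess:
    saver2.restore(sess, ckpt_path)
    restored = sess.run(w2)

print(restored)  # 3.0
```

Note the restore matches variables by name ("w"), which is why the second graph must be rebuilt with the same variable names.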

# ML Algos in R

I have been trying to understand how the ML algos in R fit together / compare with each other.

| # | R | RevoScaleR | MML | Comment |
|---|---|------------|-----|---------|
| 1 | lm | rxLinMod | — | Linear models |
| 2 | glm | rxGlm | — | Generalized linear models |
| 3 | glm with binomial family and logit link | rxLogit | rxLogisticRegression | Logistic regression |
| 4 | rpart | rxDTree | — | Decision tree implementations |
| 5 | gbm | rxBTrees | rxFastTrees | Boosted decision tree implementations |
| 6 | — | rxDForest | rxFastForest | Random forest implementations |

## References:

| # | Title | Link |
|---|-------|------|
| 1 | Fitting Logistic Regression Models | https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-logistic-regression |
| 2 | Generalized Linear Models | https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-generalized-linear-mode |
| 3 | rxDTree(): a new type of tree algorithm for big data | http://blog.revolutionanalytics.com/2013/07/rxdtree-a-new-type-of-tree-algorithm.html |
| 4 | A first look at rxBTrees | http://blog.revolutionanalytics.com/2015/03/a-first-look-at-rxbtrees.html |
| 5 | A First Look at rxDForest() | http://blog.revolutionanalytics.com/2014/01/a-first-look-at-rxdforest.html |

# Feature Scaling in SGD

SGD is the perfect algorithm for use in online learning. Except it has one major drawback – it is sensitive to feature scaling.

In some of my trials with the SGD learner in scikit-learn, I have seen terrible performance if I don’t do feature scaling.
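This scale sensitivity is easy to reproduce with a toy experiment. The sketch below (plain Python and synthetic data of my own making, not the scikit-learn trials themselves) fits a linear model by SGD twice with the same learning rate: once on raw features where one feature is roughly 1000× larger than the other, and once after standardizing:

```python
import math
import random

random.seed(0)
# Synthetic regression data: feature 0 lives in [0, 1], feature 1 in [0, 1000]
X = [[random.uniform(0, 1), random.uniform(0, 1000)] for _ in range(200)]
y = [2.0 * x0 + 0.005 * x1 + 1.0 for x0, x1 in X]

def sgd_fit(X, y, lr=0.01, epochs=50):
    # Plain SGD on squared error, one sweep over the data per epoch
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (x0, x1), yi in zip(X, y):
            err = w0 * x0 + w1 * x1 + b - yi
            w0 -= lr * err * x0
            w1 -= lr * err * x1
            b -= lr * err
    return w0, w1, b

def mse(X, y, w0, w1, b):
    total = 0.0
    for (x0, x1), yi in zip(X, y):
        r = w0 * x0 + w1 * x1 + b - yi
        total += r * r  # r * r rather than r ** 2: no OverflowError if the fit diverged
    return total / len(X)

def standardize(X):
    # Rescale every column to zero mean / unit variance
    cols, n = list(zip(*X)), len(X)
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) * (v - m) for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

Xs = standardize(X)
mse_scaled = mse(Xs, y, *sgd_fit(Xs, y))
mse_raw = mse(X, y, *sgd_fit(X, y))
print(mse_scaled)  # tiny: SGD converges on standardized features
print(mse_raw)     # nan/inf or huge: the same learning rate diverges on raw features
```

The update `lr * err * x` scales with the feature magnitude, so a single fixed learning rate cannot suit both a [0, 1] feature and a [0, 1000] feature – which is exactly why scaling (or a scale-invariant update rule) matters for SGD.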

Which begs the question – how does VW do feature scaling? After all, VW does online learning.

**Tip:**

It seems VW uses a normalized variant of SGD that is scale-invariant:

**References:**

**Code:**

# Software Packages for Graphical Models

This is a nice link which lists the packages out there for Graphical Models.

# L1 / L2 loss functions and regularization

There was a discussion that came up the other day about L1 vs. L2, Lasso vs. Ridge, etc.

In particular,

- What's the difference between the L1 and L2 loss functions?
- What's the difference between L1 and L2 regularizers?
- What's the difference between Lasso and Ridge?
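To make the distinctions concrete, here is a small illustrative sketch (plain Python, toy numbers of my own choosing). As a loss, L2 squares residuals, so a single outlier dominates it, while L1 grows only linearly; as regularizers, the 1-D closed-form solutions show why L1 (Lasso) produces exact zeros while L2 (Ridge) only shrinks:

```python
# As loss functions: one outlier dominates the L2 (squared) loss far more than L1.
residuals = [0.1, -0.2, 0.15, 5.0]  # last one is an outlier
l1_loss = sum(abs(r) for r in residuals)
l2_loss = sum(r * r for r in residuals)
outlier_share_l1 = abs(residuals[-1]) / l1_loss   # ~0.92 of the L1 loss
outlier_share_l2 = residuals[-1] ** 2 / l2_loss   # ~0.997 of the L2 loss

# As regularizers: closed forms of the 1-D penalized least-squares problem
#   min_w 0.5 * (w - a)**2 + penalty(w)
def lasso_1d(a, lam):
    # L1 penalty lam * |w|  ->  soft-thresholding: small coefficients become exactly 0
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def ridge_1d(a, lam):
    # L2 penalty 0.5 * lam * w**2  ->  proportional shrinkage: never exactly 0
    return a / (1.0 + lam)

print(lasso_1d(0.3, 0.5))  # 0.0 (zeroed out -> sparsity, feature selection)
print(ridge_1d(0.3, 0.5))  # 0.2 (shrunk toward zero, but nonzero)
```

The same contrast carries over to Lasso vs. Ridge regression: Lasso's L1 penalty drives some coefficients to exactly zero (implicit feature selection), while Ridge's L2 penalty shrinks all coefficients smoothly.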

**References:**

- [Differences between L1 and L2 as Loss Function and Regularization](http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/) – a good read
- https://msdn.microsoft.com/en-us/magazine/dn904675.aspx – also a good read

- https://discuss.analyticsvidhya.com/t/difference-between-ridge-regression-and-lasso-and-its-effect/3000
- http://stats.stackexchange.com/questions/866/when-should-i-use-lasso-vs-ridge
- http://statweb.stanford.edu/~tibs/sta305files/Rudyregularization.pdf
- http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf