# abgoswam's tech blog

## Data Science, Machine Learning, CS Theory, Systems & Web



# Prototypical Networks

Some nice links:

- N-Shot Learning: Learning More with Less Data

- Meta-Learning: Learning to Learn Fast

- Papers:

# BERT Trials

A very simple and up-to-date explanation of BERT

# TensorFlow – save, restore, freeze models

It can be quite tricky to get around to deploying TF models. In fact, there are multiple ways to save/load TF models, each serving a slightly different purpose / use case:

- simple_save
- saver
- estimator
- keras
- tflite

**References:**

- https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc
- https://gist.github.com/omimo/5d393ed5b64d2ca0c591e4da04af6009

**Code:**
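A minimal sketch of the checkpoint (`tf.train.Saver`) route: save a variable, rebuild the graph, and restore it. This is written against the TF1-style API via `tf.compat.v1` (an assumption on my part, so it also runs under TensorFlow 2), not the exact code from the gists above:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Build a tiny graph with one variable and save a checkpoint.
tf1.reset_default_graph()
w = tf1.get_variable("w", initializer=tf.constant(3.0))
saver = tf1.train.Saver()
with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    ckpt_path = saver.save(sess, "/tmp/demo_model.ckpt")

# Rebuild the same graph and restore the variable from the checkpoint.
tf1.reset_default_graph()
w2 = tf1.get_variable("w", shape=[], dtype=tf.float32)
saver2 = tf1.train.Saver()
with tf1.Session() as sess:
    saver2.restore(sess, ckpt_path)
    restored = sess.run(w2)

print(restored)  # 3.0
```

Note the restore matches variables by name ("w"), which is why the second graph must be rebuilt with the same variable names.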

# ML Algos in R

I have been trying to understand how the ML algos in R fit together / compare with each other.

| # | R | RevoScaleR | MML | Comment |
|---|---|------------|-----|---------|
| 1 | lm | rxLinMod | — | Linear models |
| 2 | glm | rxGlm | — | Generalized linear models |
| 3 | glm with binomial family and logit link | rxLogit | rxLogisticRegression | Logistic regression |
| 4 | rpart | rxDTree | — | Decision tree implementations |
| 5 | gbm | rxBTrees | rxFastTrees | Boosted decision tree implementations |
| 6 | — | rxDForest | rxFastForest | Random forest implementations |

## References:

| # | Title | Link |
|---|-------|------|
| 1 | Fitting Logistic Regression Models | https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-logistic-regression |
| 2 | Generalized Linear Models | https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-generalized-linear-mode |
| 3 | rxDTree(): a new type of tree algorithm for big data | http://blog.revolutionanalytics.com/2013/07/rxdtree-a-new-type-of-tree-algorithm.html |
| 4 | A first look at rxBTrees | http://blog.revolutionanalytics.com/2015/03/a-first-look-at-rxbtrees.html |
| 5 | A First Look at rxDForest() | http://blog.revolutionanalytics.com/2014/01/a-first-look-at-rxdforest.html |

# Feature Scaling in SGD

SGD is the perfect algorithm for use in online learning. Except it has one major drawback – it is sensitive to feature scaling.

In some of my trials with the SGD learner in scikit-learn, I have seen terrible performance if I don’t do feature scaling.
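This scale sensitivity is easy to reproduce with a toy experiment. The sketch below (plain Python and synthetic data of my own making, not the scikit-learn trials themselves) fits a linear model by SGD twice with the same learning rate: once on raw features where one feature is roughly 1000× larger than the other, and once after standardizing:

```python
import math
import random

random.seed(0)
# Synthetic regression data: feature 0 lives in [0, 1], feature 1 in [0, 1000]
X = [[random.uniform(0, 1), random.uniform(0, 1000)] for _ in range(200)]
y = [2.0 * x0 + 0.005 * x1 + 1.0 for x0, x1 in X]

def sgd_fit(X, y, lr=0.01, epochs=50):
    # Plain SGD on squared error, one sweep over the data per epoch
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (x0, x1), yi in zip(X, y):
            err = w0 * x0 + w1 * x1 + b - yi
            w0 -= lr * err * x0
            w1 -= lr * err * x1
            b -= lr * err
    return w0, w1, b

def mse(X, y, w0, w1, b):
    total = 0.0
    for (x0, x1), yi in zip(X, y):
        r = w0 * x0 + w1 * x1 + b - yi
        total += r * r  # r * r rather than r ** 2: no OverflowError if the fit diverged
    return total / len(X)

def standardize(X):
    # Rescale every column to zero mean / unit variance
    cols, n = list(zip(*X)), len(X)
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) * (v - m) for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

Xs = standardize(X)
mse_scaled = mse(Xs, y, *sgd_fit(Xs, y))
mse_raw = mse(X, y, *sgd_fit(X, y))
print(mse_scaled)  # tiny: SGD converges on standardized features
print(mse_raw)     # nan/inf or huge: the same learning rate diverges on raw features
```

The update `lr * err * x` scales with the feature magnitude, so a single fixed learning rate cannot suit both a [0, 1] feature and a [0, 1000] feature – which is exactly why scaling (or a scale-invariant update rule) matters for SGD.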

Which begs the question – how does VW do feature scaling? After all, VW does online learning.

**Tip:**

It seems VW uses a normalized variant of SGD that is scale-invariant:

**References:**

**Code:**

# Software Packages for Graphical Models

This is a nice link which lists the packages out there for Graphical Models.

# L1 / L2 loss functions and regularization

There was a discussion that came up the other day about L1 vs. L2, Lasso vs. Ridge, etc.

In particular,

- What's the difference between the L1 and L2 loss functions?
- What's the difference between L1 and L2 regularizers?
- What's the difference between Lasso and Ridge?
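To make the distinctions concrete, here is a small illustrative sketch (plain Python, toy numbers of my own choosing). As a loss, L2 squares residuals, so a single outlier dominates it, while L1 grows only linearly; as regularizers, the 1-D closed-form solutions show why L1 (Lasso) produces exact zeros while L2 (Ridge) only shrinks:

```python
# As loss functions: one outlier dominates the L2 (squared) loss far more than L1.
residuals = [0.1, -0.2, 0.15, 5.0]  # last one is an outlier
l1_loss = sum(abs(r) for r in residuals)
l2_loss = sum(r * r for r in residuals)
outlier_share_l1 = abs(residuals[-1]) / l1_loss   # ~0.92 of the L1 loss
outlier_share_l2 = residuals[-1] ** 2 / l2_loss   # ~0.997 of the L2 loss

# As regularizers: closed forms of the 1-D penalized least-squares problem
#   min_w 0.5 * (w - a)**2 + penalty(w)
def lasso_1d(a, lam):
    # L1 penalty lam * |w|  ->  soft-thresholding: small coefficients become exactly 0
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def ridge_1d(a, lam):
    # L2 penalty 0.5 * lam * w**2  ->  proportional shrinkage: never exactly 0
    return a / (1.0 + lam)

print(lasso_1d(0.3, 0.5))  # 0.0 (zeroed out -> sparsity, feature selection)
print(ridge_1d(0.3, 0.5))  # 0.2 (shrunk toward zero, but nonzero)
```

The same contrast carries over to Lasso vs. Ridge regression: Lasso's L1 penalty drives some coefficients to exactly zero (implicit feature selection), while Ridge's L2 penalty shrinks all coefficients smoothly.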

**References:**

- [Differences between L1 and L2 as Loss Function and Regularization](http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/) – a good read
- https://msdn.microsoft.com/en-us/magazine/dn904675.aspx – also a good read

- https://discuss.analyticsvidhya.com/t/difference-between-ridge-regression-and-lasso-and-its-effect/3000
- http://stats.stackexchange.com/questions/866/when-should-i-use-lasso-vs-ridge
- http://statweb.stanford.edu/~tibs/sta305files/Rudyregularization.pdf
- http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf