# machinelearning

Contents:

- Prototypical Networks
- BERT Trials
- TensorFlow – save, restore, freeze models
- ML Algos in R
- Feature Scaling in SGD
- Software Packages for Graphical Models
- L1 / L2 loss functions and regularization
- Bandits for Online Recommendations
- K-means Clustering
- Deep Learning Resources

# Prototypical Networks

Some nice links:

- N-Shot Learning: Learning More with Less Data

- Meta-Learning: Learning to Learn Fast

- Papers:
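The core idea behind Prototypical Networks is simple enough to sketch: embed the support examples, take each class's prototype to be the mean of its embedded support points, and classify a query by its nearest prototype. Below is my own minimal toy sketch of that classification rule: a real model learns the embedding function, while here the raw 2-D points stand in for embeddings.

```python
# Toy sketch of the Prototypical Networks classification rule.
# Assumption: points below stand in for learned embeddings.

def prototype(points):
    """Class prototype = mean of the embedded support examples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, support):
    """Assign the query to the class with the nearest prototype."""
    protos = {label: prototype(pts) for label, pts in support.items()}
    return min(protos, key=lambda label: sq_dist(query, protos[label]))

support = {
    "cat": [[0.0, 0.0], [1.0, 0.0]],   # 2-shot support set for "cat"
    "dog": [[5.0, 5.0], [6.0, 5.0]],   # 2-shot support set for "dog"
}
print(classify([0.4, 0.2], support))   # -> cat (nearest prototype wins)
```

This is the N-shot setting from the links above: each class is represented by only a handful of support examples, and classification reduces to a nearest-prototype lookup.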

# BERT Trials

A very simple and up-to-date explanation of BERT

# TensorFlow – save, restore, freeze models

It can be quite tricky to get around to deploying TF models.

In fact, there are multiple ways to save/load TF models, each serving a slightly different purpose / use case:

- simple_save
- saver
- estimator
- keras
- tflite

**References:**

- https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc
- https://gist.github.com/omimo/5d393ed5b64d2ca0c591e4da04af6009

**Code:**

# ML Algos in R

I have been trying to understand how the ML algorithms in R fit together and how they compare with each other.

| # | R | RevoScaleR | MML | Comment |
|---|---|---|---|---|
| 1 | lm | rxLinMod | — | Linear models |
| 2 | glm | rxGlm | — | Generalized linear models |
| 3 | glm with binomial family and the logit link function | rxLogit | rxLogisticRegression | Logistic regression |
| 4 | rpart | rxDTree | — | Decision tree implementations |
| 5 | gbm | rxBTrees | rxFastTrees | Boosted decision tree implementations |
| 6 | — | rxDForest | rxFastForest | Random forest implementations |

## References:

| # | Title | Link |
|---|---|---|
| 1 | Fitting Logistic Regression Models | https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-logistic-regression |
| 2 | Generalized Linear Models | https://msdn.microsoft.com/en-us/microsoft-r/scaler-user-guide-generalized-linear-mode |
| 3 | rxDTree(): a new type of tree algorithm for big data | http://blog.revolutionanalytics.com/2013/07/rxdtree-a-new-type-of-tree-algorithm.html |
| 4 | A first look at rxBTrees | http://blog.revolutionanalytics.com/2015/03/a-first-look-at-rxbtrees.html |
| 5 | A First Look at rxDForest() | http://blog.revolutionanalytics.com/2014/01/a-first-look-at-rxdforest.html |

# Feature Scaling in SGD

SGD is the perfect algorithm for use in online learning, except that it has one major drawback: it is sensitive to feature scaling.

In some of my trials with the SGD learner in scikit-learn, I have seen terrible performance if I don’t do feature scaling.

Which raises the question: how does VW (Vowpal Wabbit) do feature scaling? After all, VW does online learning.

**Tip:**

It seems VW uses a normalized variant of SGD that is scale-invariant:

**References:**

**Code:**
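A minimal sketch of the usual fix: standardizing each feature to zero mean and unit variance before handing the data to an SGD learner (such as scikit-learn's SGD classes). This is my own pure-Python toy example, not scikit-learn's implementation:

```python
# Toy sketch: standardize each feature to zero mean / unit variance,
# the usual preprocessing step before SGD-style learners.
import math

def standardize(rows):
    """Column-wise (x - mean) / std; rows is a list of feature vectors."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]  # guard: constant column -> std 1.0
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]

# Two features on wildly different scales (e.g. metres vs. milliseconds):
X = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
for row in standardize(X):
    print(row)   # both features now live on comparable scales
```

Without this step, the feature with the larger scale dominates the gradient updates, which is exactly the terrible behaviour seen in the scikit-learn trials above.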

# Software Packages for Graphical Models

This is a nice link which lists the packages out there for Graphical Models.

# L1 / L2 loss functions and regularization

There was a discussion that came up the other day about L1 vs. L2, Lasso vs. Ridge, etc.

In particular,

- What's the difference between the L1 and L2 loss functions?
- What's the difference between the L1 and L2 regularizers?
- What's the difference between Lasso and Ridge?
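A toy worked example (my own numbers) makes the distinctions concrete: L1 sums absolute values (of residuals as a loss, of weights as the Lasso penalty), while L2 sums squares (so outliers dominate the loss, and Ridge shrinks but rarely zeroes weights):

```python
# Toy illustration of L1 vs L2, both as loss functions and as regularizers.

def l1_loss(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred))

def l2_loss(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def l1_penalty(weights, lam):   # Lasso-style penalty
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):   # Ridge-style penalty
    return lam * sum(w * w for w in weights)

y_true = [1.0, 2.0, 10.0]       # last point is an outlier
y_pred = [1.5, 2.0, 3.0]        # residuals: 0.5, 0.0, 7.0
print(l1_loss(y_true, y_pred))  # 7.5   -> grows linearly with the outlier
print(l2_loss(y_true, y_pred))  # 49.25 -> the squared outlier dominates
```

The same asymmetry explains the regularizers: the L1 penalty's constant gradient pushes small weights all the way to zero (sparse Lasso solutions), while the L2 penalty's gradient shrinks towards zero without ever quite reaching it (Ridge).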

**References:**

- [Differences between L1 and L2 as Loss Function and Regularization](http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/) – good read
- https://msdn.microsoft.com/en-us/magazine/dn904675.aspx – good read

- https://discuss.analyticsvidhya.com/t/difference-between-ridge-regression-and-lasso-and-its-effect/3000
- http://stats.stackexchange.com/questions/866/when-should-i-use-lasso-vs-ridge
- http://statweb.stanford.edu/~tibs/sta305files/Rudyregularization.pdf
- http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf

# Bandits for Online Recommendations

I came across this interesting set of blog posts by Sergei Feldman on the use of bandit approaches in online recommendation.

In particular, the one I really enjoyed reading was the comparison of approaches for solving the multi-armed bandit problem. I need to play around with his code someday.
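As a refresher, the simplest of the approaches usually compared, epsilon-greedy, fits in a few lines. This is my own toy simulation, not Feldman's code:

```python
# Toy epsilon-greedy multi-armed bandit: explore a random arm with
# probability eps, otherwise pull the arm with the best observed
# average reward so far.
import random

def run_bandit(true_probs, steps=10000, eps=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # pulls per arm
    values = [0.0] * n_arms      # running average reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
    return counts, values

counts, values = run_bandit([0.2, 0.5, 0.8])
print(counts)   # the 0.8 arm should receive the bulk of the pulls
```

In a recommendation setting the "arms" are items to show and the reward is a click; the appeal over A/B testing is that traffic shifts towards the better item while the experiment is still running.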

**References:**

# K-means Clustering

One of my friends recently asked me about the K-means algorithm.

- How does it work?
- What are the typical applications, i.e. where/how is it used in industry?

In the discussion that followed, we ended up playing around with several visualizations available that do an awesome job of explaining this technique.

We also hacked around with some code from Joel Grus' book (*Data Science from Scratch*) to develop more intuition about the K-means algorithm.

**Visualizations:**

- https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
- http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
- http://tech.nitoyon.com/en/blog/2013/11/07/k-means/

**Code:**
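For reference, here is a from-scratch K-means sketch in the spirit of the book's version (my own minimal rewrite, not Grus' actual code): assign every point to its nearest mean, recompute each mean, and repeat until the assignments stop changing.

```python
# Minimal k-means from scratch: alternate between assigning points to
# the nearest mean and recomputing each mean, until assignments settle.
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)              # init: k distinct data points
    assignment = None
    while True:
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, means[j]))
                          for p in points]
        if new_assignment == assignment:       # converged
            return means, assignment
        assignment = new_assignment
        for j in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:                        # guard against empty clusters
                means[j] = mean(cluster)

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
means, assignment = kmeans(points, k=2)
print(sorted(assignment))   # two clear clusters of three points each
```

The image-color experiment below is the same algorithm with pixels as the points: each pixel is an (r, g, b) vector, and the k final means become the image's k representative colors.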

We got some very interesting insights from playing around with K-means to cluster an image containing different colors:

# Deep Learning Resources

- Awesome – Most Cited Deep Learning Papers by Terry Um
- Revo R 3-part series on Deep Learning by Anusha Trivedi
- Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks – Medium
- http://neuralnetworksanddeeplearning.com/ – online book by Michael Nielsen
- Brandon Rohrer
- http://www.deeplearningweekly.com/ – a nice weekly digest of developments in Deep Learning

- *Deep Learning: Methods and Applications*, L. Deng and D. Yu. Now Publishers, 2014.
- *Deep Learning*, Goodfellow, Bengio, Courville. MIT Press, 2016 – http://www.deeplearningbook.org
- Good reviews from a few folks