Git branch cleanups

  • git remote prune origin --dry-run
  • (Local) git branch -d <localbranch>
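
The two commands above can be wrapped into a small cleanup routine. This is only a sketch: it assumes a remote named origin and a base branch named main, and the helper function names are made up for illustration.

```shell
# Preview which stale remote-tracking branches would be pruned (deletes nothing).
preview_stale_remote_branches() {
  local remote="${1:-origin}"   # assumption: the remote is called "origin"
  git remote prune --dry-run "$remote"
}

# List local branches already merged into the base branch, skipping the
# base branch itself and whichever branch is currently checked out.
list_merged_local_branches() {
  local base="${1:-main}"       # assumption: the base branch is called "main"
  local current
  current=$(git symbolic-ref --short -q HEAD || true)
  git branch --merged "$base" --format='%(refname:short)' \
    | grep -v -x -F -e "$base" -e "$current" || true
}

# Delete one local branch. -d refuses to delete a branch with unmerged commits.
delete_local_branch() {
  git branch -d "$1"
}
```

A typical run would be to call preview_stale_remote_branches first, review the output of list_merged_local_branches, and only then call delete_local_branch on each name you no longer need.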




Tuning Spark Jobs

I recently got into a discussion about how to tune Spark jobs.

This led me to read up on dynamic allocation and related settings.

Some interesting links:


ML Algos in R

I have been trying to understand how ML algos in R fit together / compare with each other.

#  R                                  RevoScaleR  MML                   Comment
1  lm                                 rxLinMod    -                     Linear models
2  glm                                rxGlm       -                     Linear models
3  glm (binomial family, logit link)  rxLogit     rxLogisticRegression  Logistic regression
4  rpart                              rxDTree     -                     Decision tree implementations
5  gbm                                rxBTrees    rxFastTrees           Boosted decision tree implementations
6  -                                  rxDForest   rxFastForest



# Title Links
1 Fitting Logistic Regression Models
2 Generalized Linear Models
3 rxDTree(): a new type of tree algorithm for big data
4 A first look at rxBTrees
5 A First Look at rxDForest()


Using SSH Keys on Cloud Platforms


  • openssl.exe req -x509 -nodes -days 365 -newkey rsa:2048 -keyout myPrivateKey.key -out myCert.pem
    • We will mostly use the .key file
    • The .pem file is only needed for Classic deployments. Typically we won't use this.


  • Look up the use of req:
    • The req command primarily creates and processes certificate requests. With -x509 it outputs a self-signed certificate instead of a request, which is why the output of req is a certificate (myCert.pem).
    • But we are interested in the private key (myPrivateKey.key), hence the -keyout flag.




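  • In AWS, the private key is saved in a .pem file. You just use the .pem file to connect to the instances.
    • PEM is a text encoding that can hold either certificates or keys, so the .pem extension alone does not say which one a file contains.
    • This was one of my confusions, because the openssl command above uses .pem for the certificate while AWS saves the key in a .pem file.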
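
To check what a given .pem file actually holds, one option is to read its first "-----BEGIN ...-----" line. A minimal sketch, assuming the file is PEM-encoded text; pem_kind is a hypothetical helper name, not part of OpenSSL or any AWS tooling.

```shell
# Report what a PEM file contains, based on its first BEGIN label.
# pem_kind is an illustrative helper, not a standard command.
pem_kind() {
  local label
  # Print the label of the first BEGIN line, then stop reading the file.
  label=$(sed -n -e '/^-----BEGIN .*-----/{s/^-----BEGIN \(.*\)-----.*$/\1/p;q;}' "$1")
  case "$label" in
    CERTIFICATE)     echo "certificate" ;;
    *"PRIVATE KEY")  echo "private key" ;;   # e.g. PRIVATE KEY, RSA PRIVATE KEY
    *"PUBLIC KEY")   echo "public key" ;;
    "")              echo "not PEM" ;;
    *)               echo "other: $label" ;;
  esac
}
```

Running pem_kind on myCert.pem and on a key file downloaded from AWS should make the difference visible without relying on the file extension.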



  • Use ssh-agent to store private keys. Makes life much simpler!
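
A rough sketch of the ssh-agent workflow, assuming the OpenSSH client tools (ssh-agent, ssh-add) are installed; load_key_into_agent is a made-up helper name, and the key path is whatever file you generated or downloaded.

```shell
# Start an agent if this shell does not have one, then add a private key to it.
# load_key_into_agent is an illustrative helper, not an OpenSSH command.
load_key_into_agent() {
  local key="$1"
  # ssh-add rejects private keys that are readable by other users.
  chmod 600 "$key"
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    # ssh-agent -s prints shell commands that export SSH_AUTH_SOCK and SSH_AGENT_PID.
    eval "$(ssh-agent -s)" >/dev/null
  fi
  ssh-add "$key"
}
```

After load_key_into_agent myPrivateKey.key, ssh-add -l lists the loaded keys, and later ssh connections from the same shell can use the key without passing -i each time.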


Visualization Using D3 (and dependent libraries)

This link gives a nice summary of data visualization libraries using D3:

Interestingly, it mentions mermaid and rickshaw, two cool libraries I recently came across.

Real time: