Tips for building better CNN models

Here are some tips for building powerful CNN models.

Things to try:

  • Filter size: Smaller filters (3×3) may be more efficient
  • Number of filters: Is 32 filters the right choice? Do more or fewer do better?
  • Pooling vs Strided Convolution: Do you use max pooling or just strided convolutions?
  • Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
  • Network architecture: Can we do better with a deeper network? Good architectures to try include:
    • [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
  • Use TensorFlow Scope: Use TensorFlow scopes and/or tf.layers to make deeper networks easier to write. See this tutorial for how to use tf.layers.
  • Use Learning Rate Decay: As the notes point out, decaying the learning rate might help the model converge. Feel free to decay every epoch, when the loss doesn’t change over an entire epoch, or by any other heuristic you find appropriate. See the TensorFlow documentation for learning rate decay.
  • Global Average Pooling: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7×7 or so) and then apply an average pooling operation to get a 1×1 image of shape (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in Google’s Inception Network (see Table 1 for their architecture).
  • Regularization: Add L2 weight regularization, or perhaps use Dropout as in the TensorFlow MNIST tutorial.
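The global-average-pooling tip can be illustrated with a minimal pure-Python sketch. The function name and the nested-list layout feature_map[row][col][channel] are assumptions for illustration: each channel is averaged over all H×W positions, so an (H, W, C) map collapses to a length-C vector. In TensorFlow the same reduction is typically a mean over the two spatial axes, e.g. tf.reduce_mean(x, axis=[1, 2]) for NHWC input.

```python
def global_average_pool(feature_map):
    """Average an (H, W, C) feature map over its spatial axes.

    feature_map is assumed to be nested lists indexed as
    feature_map[row][col][channel]. Returns a list of length C,
    i.e. the (1, 1, C) pooled result reshaped to (C,).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    num_positions = height * width
    return [t / num_positions for t in totals]


# A 7x7 map with 3 channels; channel c holds the constant value c + 1.
fmap = [[[1.0, 2.0, 3.0] for _ in range(7)] for _ in range(7)]
print(global_average_pool(fmap))  # [1.0, 2.0, 3.0]
```

Because the pooled vector has one entry per filter, the final layer only needs Filter# inputs, which is why this replaces the flatten-plus-affine stack.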

Tips for training

For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple of important things to keep in mind:

  • If the parameters are working well, you should see improvement within a few hundred iterations
  • Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
  • Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
  • You should use the validation set for hyperparameter search, and we’ll save the test set for evaluating your architecture on the best parameters as selected by the validation set.
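The coarse-to-fine procedure above can be sketched as two rounds of random search. This is a minimal sketch under stated assumptions: sampling on a log scale is a common choice rather than a requirement, the narrowing factor of 3 is arbitrary, and toy_validation_score is a hypothetical stand-in for a short training run that returns a validation score.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose log10 is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def random_search(evaluate, lr_range, reg_range, num_trials, rng):
    """Try num_trials random (lr, reg) pairs; return (best_score, lr, reg)."""
    best = None
    for _ in range(num_trials):
        lr = sample_log_uniform(rng, *lr_range)
        reg = sample_log_uniform(rng, *reg_range)
        score = evaluate(lr, reg)  # e.g. validation accuracy after a short run
        if best is None or score > best[0]:
            best = (score, lr, reg)
    return best


def toy_validation_score(lr, reg):
    """Hypothetical stand-in for training; peaks at lr=1e-3, reg=1e-4."""
    return -((math.log10(lr) + 3) ** 2) - (math.log10(reg) + 4) ** 2


rng = random.Random(0)
# Coarse stage: wide ranges; in real use each trial trains only briefly.
_, lr, reg = random_search(toy_validation_score, (1e-6, 1e-1), (1e-7, 1e-1), 50, rng)
# Fine stage: a factor of 3 either way around the coarse winner; train longer here.
score, lr, reg = random_search(
    toy_validation_score, (lr / 3, lr * 3), (reg / 3, reg * 3), 50, rng
)
print(f"best lr={lr:.2e}, reg={reg:.2e}, score={score:.3f}")
```

In a real run, evaluate would train on the training set and score on the validation set only, leaving the test set untouched.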

Going above and beyond

If you are feeling adventurous there are many other features you can implement to try and improve your performance.

  • Alternative update steps: SGD+momentum, RMSprop, Adam, AdaGrad, and AdaDelta.
  • Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or Maxout.
  • Model ensembles
  • Data augmentation
  • New Architectures
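To show what two of these update steps actually compute, here is a hand-written sketch of SGD+momentum and Adam minimizing the toy objective f(x) = x², whose gradient is 2x. The default hyperparameters (lr, mu = 0.9, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are common conventions assumed here, not tuned values; in practice you would use your framework's built-in optimizers.

```python
import math


def grad(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


def sgd_momentum(x, lr=0.1, mu=0.9, steps=200):
    """SGD with momentum: a velocity term accumulates past gradients."""
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(x)
        x = x + v
    return x


def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Adam: per-parameter step scaled by running gradient moments."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


print(sgd_momentum(5.0), adam(5.0))
```

Both start from x = 5 and end close to the minimum at 0; Adam's first step has magnitude close to lr regardless of the gradient's scale, which is one reason it is less sensitive to the initial learning rate.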


Factory Pattern in Python

I recently used the Factory Pattern in Python.

It was a little surprising to me to compare typical usage of this pattern in C# vs. Python.

In particular, in C# you’d typically have an interface defining the methods and then provide implementations of those methods in the concrete classes.

In Python, there is no interface keyword. In the example above, I embed a static method in the base class that selects the appropriate derived class.
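A minimal sketch of this approach, with hypothetical class names (Shape, Square, Circle): a static method on the base class maps a string key to a derived class. The base class inherits from abc.ABC, which is the closest Python comes to a C# interface; a subclass that forgets to implement area() raises TypeError when instantiated.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Base class playing the role of a C# interface."""

    def __init__(self, size):
        self.size = size

    @abstractmethod
    def area(self):
        """Every concrete shape must implement this."""

    @staticmethod
    def create(kind, size):
        """Factory: select and instantiate the derived class named by kind."""
        # Looked up at call time, so the subclasses can be defined below.
        registry = {"square": Square, "circle": Circle}
        if kind not in registry:
            raise ValueError(f"unknown shape kind: {kind!r}")
        return registry[kind](size)


class Square(Shape):
    def area(self):
        return self.size ** 2


class Circle(Shape):
    def area(self):
        return math.pi * self.size ** 2


shape = Shape.create("square", 3)
print(type(shape).__name__, shape.area())  # Square 9
```

Callers only ever reference Shape.create, so adding a new derived class means touching the registry in one place.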

Named Pipes: C#, Python, .NET Core

I recently ran into issues using the NamedPipeServerStream API with .NET Core.

  • On Windows, NamedPipeServerStream creates a pipe with the given name in a specific location on the Windows filesystem (\\.\pipe\).
    • In a Python client application, we were able to open this pipe for communication using this snippet: open('\\\\.\\pipe\\' + pipe_name, 'r+b', 0)


  • However, on Linux, the NamedPipeServerStream API behaves differently.
    • Looking at the .NET Core source code, I saw that NamedPipeServerStream and NamedPipeClientStream are built on top of Unix domain sockets.
    • So, to communicate with a Python client, we have to use Python’s socket module with the AF_UNIX address family.
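A Python client for the Linux case might look like the sketch below. It assumes that .NET Core places the socket at CoreFxPipe_<pipe name> inside the temp directory ($TMPDIR, falling back to /tmp); that path convention is an implementation detail I am assuming here, so check it against the runtime version you use. The helper names are hypothetical.

```python
import os
import socket
import tempfile


def dotnet_pipe_path(pipe_name, temp_dir=None):
    """Assumed socket path for a .NET Core named pipe on Linux.

    Assumption: <temp dir>/CoreFxPipe_<pipe_name>, where the temp dir
    is $TMPDIR or /tmp (tempfile.gettempdir() checks similar locations).
    """
    if temp_dir is None:
        temp_dir = tempfile.gettempdir()
    return os.path.join(temp_dir, "CoreFxPipe_" + pipe_name)


def connect_to_dotnet_pipe(pipe_name, temp_dir=None):
    """Open a stream connection to the Unix domain socket behind the pipe."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(dotnet_pipe_path(pipe_name, temp_dir))
    except OSError:
        sock.close()
        raise
    return sock
```

For a server pipe with a hypothetical name such as "mypipe", connect_to_dotnet_pipe("mypipe") returns an ordinary socket, so sendall() and recv() take the place of the file write() and read() calls used on Windows.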




du command (Linux)



If I need to look up directory sizes only one level deep, the following command is useful:


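  • The -s flag summarizes each argument into a single total instead of listing every subdirectory; -c adds a grand total line, and -h prints sizes in human-readable units (K, M, G).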


$ du -csh *
52K Desktop
4.0K Documents
0 media
4.0K Music
353M notebooks
4.0K Pictures
4.0K Public
4.0K Templates
947M transferLearning
4.0K Videos
1.3G total
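When the same per-entry totals are needed from a script, a rough Python analogue of du -cs * can be sketched with os.walk. The function names are illustrative, and the sketch sums apparent file sizes in bytes (closer to GNU du's --apparent-size) and ignores the blocks used by directories themselves, so its numbers will not match the listing above exactly.

```python
import os


def tree_size(path):
    """Apparent size in bytes of a file, or of all files under a directory."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def summarize(directory="."):
    """Rough analogue of `du -cs *`: a total per visible entry plus a grand total."""
    sizes = {}
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # the shell glob * does not match hidden entries
        sizes[name] = tree_size(os.path.join(directory, name))
    return sizes, sum(sizes.values())


if __name__ == "__main__":
    sizes, total = summarize(".")
    for name, size in sizes.items():
        print(f"{size}\t{name}")
    print(f"{total}\ttotal")
```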



C# OutOfProcess Python