Here are some tips for building powerful CNN models.
Things to try:
- Filter size: Smaller filters (3×3) may be more efficient
- Number of filters: Is 32 filters the right choice? Do more or fewer do better?
- Pooling vs. strided convolution: Should you use max pooling, or replace it with strided convolutions?
- Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- Network architecture: Can you do better with a deeper network? Good architectures to try include:
  - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
  - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
  - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- Use TensorFlow scope: Use TensorFlow scope and/or tf.layers to make it easier to write deeper networks. See the TensorFlow tutorial on variable scope for how to use it.
- Use learning rate decay: As the notes point out, decaying the learning rate might help the model converge. Feel free to decay every epoch, to decay when the loss doesn't change over an entire epoch, or to use any other heuristic you find appropriate. See the TensorFlow documentation for learning rate decay.
- Global average pooling: Instead of flattening and then having multiple affine layers, perform convolutions until your feature map gets small (7×7 or so) and then perform a global average pooling operation to get a 1×1×F output (where F is the number of filters), which is then reshaped into a length-F vector. This is used in Google's Inception network (see Table 1 for their architecture).
- Regularization: Add L2 weight regularization, or perhaps use dropout as in the TensorFlow MNIST tutorial.
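The learning rate decay bullet above can be sketched without any framework. This is a minimal pure-Python version of an exponential schedule (the same shape as TensorFlow's `tf.train.exponential_decay`); the starting rate, decay rate, and step interval below are illustrative, not recommendations.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, global_step):
    """Exponentially decay the learning rate:
    lr = initial_lr * decay_rate ** (global_step / decay_steps)."""
    return initial_lr * decay_rate ** (global_step / decay_steps)

# Illustrative schedule: start at 1e-3 and multiply by 0.95 every 500 steps.
lrs = [exponential_decay(1e-3, 0.95, 500, step) for step in (0, 500, 1000)]
```

An alternative, equally valid heuristic from the bullet above is to watch the loss and only cut the rate when it plateaus for a full epoch.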
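Global average pooling, described above, is just a per-channel mean over all spatial positions. Here is a minimal pure-Python sketch on nested lists (a real implementation would use a tensor op such as a mean over the spatial axes); the example feature map is made up for illustration.

```python
def global_avg_pool(feature_map):
    """Collapse an H x W x C feature map (nested lists) to a length-C vector
    by averaging each channel over all H*W spatial positions."""
    H = len(feature_map)
    W = len(feature_map[0])
    C = len(feature_map[0][0])
    totals = [0.0] * C
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (H * W) for t in totals]

# A tiny 2x2 feature map with 3 channels collapses to a 3-vector.
fmap = [[[1, 0, 2], [3, 0, 2]],
        [[5, 0, 2], [7, 4, 2]]]
vec = global_avg_pool(fmap)  # -> [4.0, 1.0, 2.0]
```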
Tips for training
For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:
- If the parameters are working well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and we’ll save the test set for evaluating your architecture on the best parameters as selected by the validation set.
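The coarse-to-fine search above is often done by sampling learning rate and regularization strength uniformly in log-space, then narrowing the ranges around whatever worked. This is a minimal sketch; the ranges, the number of samples, and the "best" region used for the fine stage are all illustrative assumptions.

```python
import math
import random

random.seed(0)  # for reproducibility of this sketch

def sample_log_uniform(lo, hi):
    """Sample uniformly in log-space between lo and hi (e.g. 1e-5 .. 1e-1),
    so every order of magnitude is equally likely."""
    return 10 ** random.uniform(math.log10(lo), math.log10(hi))

# Coarse stage: wide ranges, few training iterations per combination,
# just to find settings that work at all.
coarse = [(sample_log_uniform(1e-5, 1e-1),   # learning rate
           sample_log_uniform(1e-6, 1e-2))   # regularization strength
          for _ in range(10)]

# Fine stage: suppose the best coarse run had lr near 1e-3 (an assumption
# for illustration); search a narrower band around it, training longer.
fine = [(sample_log_uniform(3e-4, 3e-3),
         sample_log_uniform(1e-5, 1e-3))
        for _ in range(10)]
```

Sampling in log-space matters because a learning rate's effect is multiplicative: 1e-3 vs. 1e-4 is a much bigger change than 0.05 vs. 0.051.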
Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance.
- Alternative update steps: SGD+momentum, RMSprop, Adam, AdaGrad, or AdaDelta.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or Maxout.
- Model ensembles
- Data augmentation
- New Architectures
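Of the alternative update steps listed above, SGD+momentum is the simplest to write down. This is a minimal pure-Python sketch of one common formulation (velocity accumulates the gradient; the learning rate and momentum values are illustrative defaults, not tuned choices):

```python
def sgd_momentum_step(w, dw, v, lr=1e-2, momentum=0.9):
    """One SGD+momentum update on flat lists of parameters:
    v = momentum * v - lr * dw
    w = w + v
    """
    v = [momentum * vi - lr * gi for vi, gi in zip(v, dw)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v

w, v = [1.0, -2.0], [0.0, 0.0]          # parameters and zero-initialized velocity
w, v = sgd_momentum_step(w, [0.5, -0.5], v)
# velocity v is now [-0.005, 0.005]; w has moved against the gradient
```

Because the velocity carries over between steps, consistent gradient directions build up speed, which often converges faster than vanilla SGD on the same architecture.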