Tricks in training
In this section, we discuss a few techniques that help train a better network, including how to initialize weights, how to choose optimization parameters, and how to reduce overfitting.
Weight initialization
The following techniques are commonly used for weight initialization:
- All-zero
- Random initialization
- ReLU initialization
- Xavier initialization
All-zero
First, do NOT use all-zero initialization. Given proper data normalization, it is reasonable to expect that roughly half of the learned weights will end up positive and half negative, but that does not mean the weights should be initialized to the value in between, namely zero. If all the weights start from the same value (whether zero or not), every unit in a layer computes the same output and receives the same gradient during backpropagation, so the units stay identical to each other and the network cannot learn much, as the sketch below illustrates.
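To make the symmetry problem concrete, here is a minimal NumPy sketch on made-up toy data. It is not a recipe, just a demonstration: with every weight set to the same constant, one backpropagation step produces identical gradients for all hidden units, so they can never differentiate from one another.

```python
# Minimal sketch of the symmetry problem (toy data, illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # toy batch: 8 samples, 4 features
y = rng.normal(size=(8, 1))          # toy regression targets

# Every weight starts with the same constant value (zero would behave the same).
W1 = np.full((4, 3), 0.5)            # first-layer weights
W2 = np.full((3, 1), 0.5)            # second-layer weights

# Forward pass: tanh hidden layer, linear output, mean-squared-error loss.
h = np.tanh(x @ W1)                  # all hidden units compute the same value
pred = h @ W2
err = pred - y

# Backward pass.
dW2 = h.T @ err / len(x)             # every row of dW2 is identical
dh = err @ W2.T * (1 - h ** 2)       # same gradient flows into every hidden unit
dW1 = x.T @ dh / len(x)              # every column of dW1 is identical

print("columns of dW1 identical:", np.allclose(dW1, dW1[:, [0]]))  # True
print("rows of dW2 identical:   ", np.allclose(dW2, dW2[0]))       # True
```

Because the gradient updates are identical across units, the weights remain equal after every step, which is why symmetric initialization must be avoided.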
Random initialization
Initialize the network weights by drawing them from some distribution, such as a normal distribution or a uniform distribution, with very small values...
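A minimal sketch of such small random initialization is shown below. The layer sizes and the 0.01 scale are illustrative assumptions, not recommended values; the point is only that weights are drawn near zero from a normal or uniform distribution so that different units start out different.

```python
# Sketch of small random weight initialization (NumPy); sizes and scales are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def init_small_normal(fan_in, fan_out, scale=0.01):
    """Draw weights from N(0, scale^2); biases start at zero."""
    W = scale * rng.standard_normal((fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

def init_small_uniform(fan_in, fan_out, limit=0.01):
    """Draw weights uniformly from [-limit, limit]; biases start at zero."""
    W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

W1, b1 = init_small_normal(784, 256)
W2, b2 = init_small_uniform(256, 10)
print(W1.std(), W2.std())   # both small values near zero
```

Breaking the symmetry this way lets each unit follow its own gradient, though very small weights can slow learning in deep networks, which motivates the scaled schemes discussed next.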