How to regularize your Neural Network


“Regularization is a process that changes the result answer to be ‘simpler’. It is often used to obtain results for ill-posed problems or to prevent overfitting.” In machine learning, it is used to help a Neural Network generalize better to unseen data, mainly by avoiding overfitting. Several regularization techniques act directly on the weights of the network.

Early stopping, another form of regularization, ends the training process once the loss no longer decreases, which prevents unnecessary training steps. The main purpose of all these techniques is to reduce overfitting.

L1 Regularization

When you have a large number of features, L1 regularization is often the recommended technique. It can reduce overfitting and it encourages sparsity: coefficients that are close to zero are driven to exactly zero, so the corresponding features are effectively ignored.

But when we encounter multicollinearity, L2 regularization will generally perform better than L1. It really depends on the situation and what you need. Here is the formula:
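A common way to write the L1-penalized loss is sketched below. The symbols are assumptions chosen for illustration: L_data is the unregularized loss, w_i are the weights of the network, and lambda is the regularization strength.

```latex
L_{\text{total}} = L_{\text{data}} + \lambda \sum_{i} |w_i|
```

The larger lambda is, the more strongly the weights are pushed toward zero.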

And you can call it with TensorFlow using the following function:
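As a minimal sketch, assuming TensorFlow 2.x, L1 regularization can be attached to a layer through the `kernel_regularizer` argument with `tf.keras.regularizers.L1`. The layer size, the input shape and the strength `0.01` are illustrative values, not recommendations.

```python
import tensorflow as tf

# Assumed penalty strength; tune it for your own problem.
l1_strength = 0.01

# Dense layer whose kernel (weight matrix) carries an L1 penalty.
layer = tf.keras.layers.Dense(
    units=4,
    kernel_regularizer=tf.keras.regularizers.L1(l1_strength),
)

# Call the layer on a dummy batch so that its weights are created.
_ = layer(tf.ones((1, 3)))

# Keras exposes the penalty term in `layer.losses`; it is added to the
# training loss automatically when the layer is part of a model.
l1_penalty = float(layer.losses[0])

# Same quantity computed by hand: strength * sum of absolute weights.
expected = l1_strength * float(abs(layer.kernel.numpy()).sum())
print(l1_penalty, expected)
```

The two printed values should agree up to floating-point error, which shows that the regularizer computes the strength times the sum of the absolute weights.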


L2 Regularization

As we’ve seen above, L2 regularization can deal with multicollinearity, that is, when the independent variables are highly correlated. It is also used to reduce overfitting in some cases that L1 regularization can’t handle correctly.

But when dealing with a large number of features, L1 regularization is usually still the better choice. L2 regularization shrinks the weights of the less significant predictors toward zero (without setting them exactly to zero), which limits their influence and thus helps avoid overfitting. Here is the formula:
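A common way to write the L2-penalized loss is sketched below. The symbols are assumptions chosen for illustration: L_data is the unregularized loss, w_i are the weights of the network, and lambda is the regularization strength.

```latex
L_{\text{total}} = L_{\text{data}} + \lambda \sum_{i} w_i^{2}
```

Because the penalty grows with the square of each weight, large weights are penalized much more than small ones, and small weights are shrunk but rarely reach exactly zero.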

And you can call it with TensorFlow using the following function:
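As a minimal sketch, assuming TensorFlow 2.x, L2 regularization can be attached to a layer with `tf.keras.regularizers.L2`. The layer size, the input shape and the strength `0.01` are illustrative values only.

```python
import tensorflow as tf

# Assumed penalty strength; tune it for your own problem.
l2_strength = 0.01

# Dense layer whose kernel (weight matrix) carries an L2 penalty.
layer = tf.keras.layers.Dense(
    units=4,
    kernel_regularizer=tf.keras.regularizers.L2(l2_strength),
)

# Call the layer on a dummy batch so that its weights are created.
_ = layer(tf.ones((1, 3)))

# Penalty term that Keras adds to the training loss.
l2_penalty = float(layer.losses[0])

# Same quantity computed by hand: strength * sum of squared weights.
expected = l2_strength * float((layer.kernel.numpy() ** 2).sum())
print(l2_penalty, expected)
```

The two printed values should agree up to floating-point error.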



Dropout

Another technique that can be used alongside L1 and L2 regularization is Dropout. It is implemented per layer in the Neural Network and can be used with various types of layers (dense, convolutional, recurrent). It can be applied to any or all of the hidden layers as well as to the input layer, but not to the output layer.

Its purpose is the same as the techniques above: avoiding overfitting. It relies on probability and introduces a new hyperparameter that specifies the probability with which the outputs of a layer are dropped (or, equivalently, the probability with which they are retained). Because only a subset of the units is active at each training step, a rescaling is applied so that the expected output of the layer stays the same between training and inference.
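The drop-and-rescale idea can be sketched in plain Python. This is an illustration of the so-called inverted dropout variant, where the retained outputs are scaled up during training; the function name `inverted_dropout` and the values used are hypothetical and not taken from any library.

```python
import random


def inverted_dropout(activations, drop_prob, rng):
    """Drop each activation with probability `drop_prob`, rescale the rest."""
    keep_prob = 1.0 - drop_prob
    output = []
    for value in activations:
        # Bernoulli draw: the unit is kept with probability `keep_prob`.
        keep = rng.random() < keep_prob
        # Kept units are divided by `keep_prob` so the expected value
        # of the layer output is unchanged.
        output.append(value / keep_prob if keep else 0.0)
    return output


rng = random.Random(0)  # fixed seed so the run is reproducible
activations = [1.0] * 10000
dropped = inverted_dropout(activations, drop_prob=0.2, rng=rng)

# The mean stays close to the original mean of 1.0 despite the dropped units.
mean_after = sum(dropped) / len(dropped)
print(mean_after)
```

Every output is either 0 (dropped) or 1.25 (kept and rescaled by 1 / 0.8), which is why the average stays close to 1.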

[Figure: Bernoulli approach of Dropout]
[Figure: Comparison of a standard network vs. a network with Dropout]

And you can call it with TensorFlow using the following function:
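As a minimal sketch, assuming TensorFlow 2.x, Dropout is available as the `tf.keras.layers.Dropout` layer, whose `rate` argument is the probability of dropping an output. The rate `0.5` and the input shape are illustrative values.

```python
import tensorflow as tf

# Assumed drop probability: half of the outputs are dropped on average.
rate = 0.5
dropout = tf.keras.layers.Dropout(rate)

x = tf.ones((1, 1000))

# Training mode: outputs are randomly zeroed, the others are scaled
# by 1 / (1 - rate), here 2.0.
train_out = dropout(x, training=True)

# Inference mode: the layer leaves its input unchanged.
infer_out = dropout(x, training=False)

print(sorted(set(train_out.numpy().ravel().tolist())))
```

In a model, the layer is simply placed after the layer whose outputs should be dropped, and Keras switches between the two modes automatically during `fit` and `predict`.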


Early Stopping

Another form of regularization used to avoid overfitting is early stopping, which trains the Neural Network “just enough”. It stops the training before the model starts to overfit the training data. The approach here is to define a trigger that stops the training.

For example, when the loss stops decreasing, we stop the training. A neural network given 100 epochs of training could therefore stop at the 25th epoch because its loss no longer decreases (or starts to increase). Be aware that the threshold (the patience variable in the code below) should be chosen carefully, as stopping too early is not a good option either.
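The trigger logic can be sketched in plain Python, independently of any framework. The function `early_stopping_epoch` and the loss values are hypothetical and only illustrate how a patience threshold behaves.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the (1-based) epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The trigger never fired: training runs for all epochs.
    return len(val_losses)


# Made-up validation losses: a plateau followed by a late improvement.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]

stop_with_small_patience = early_stopping_epoch(losses, patience=3)
stop_with_large_patience = early_stopping_epoch(losses, patience=5)
print(stop_with_small_patience, stop_with_large_patience)
```

With a patience of 3 the training stops at epoch 6 and misses the improvement at epoch 7, while a patience of 5 lets the training reach it. This is the trade-off behind choosing the threshold carefully.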

We can use it through the Keras API as a callback that is passed to the model training, such as:

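import tensorflow as tf

# Example values: monitor the validation loss, wait 5 epochs without improvement
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
callbacks = [early_stopping]

And it can be used like this in the training of your Neural Network:

# `model` is assumed to be a compiled tf.keras model
model.fit(x, y, batch_size=batch_size, epochs=epochs, shuffle=shuffle, validation_data=validation_data, callbacks=callbacks)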


