Activation Functions in Artificial Neural Networks: what is their purpose?
--
Introduction
Activation functions are mathematical functions applied at the output of an artificial neuron. They are named after the “action potential” from biological neural networks: they serve as the equivalent of a stimulation threshold which, once reached, results in a neural response. There are different types of activation function, each serving its purpose, and which one to use depends on what you intend to do with your Artificial Neural Network (ANN).
Some of them are quite simple and can be interpreted as a 0 or 1 (yes/no) answer, as with the binary step activation. But depending on the depth and intricacy of your ANN, you may prefer other functions such as ReLU in order to optimize the accuracy of its training. Otherwise the network may saturate and end up losing accuracy: you might get a lower training cost, but without the expected results.
Activation functions can also serve as step functions that decide how signals pass from layer to layer in your ANN. Here are some examples of activation functions.
Binary Activation Function
The binary step function is quite simple: it is a threshold-based activation function that activates the neuron above the threshold and deactivates it below.
It is commonly used in binary classification but becomes unusable for multiclass classification. Here is its formula:
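Assuming the usual convention of a threshold at 0, the step function can be written as f(x) = 1 if x ≥ 0 and f(x) = 0 otherwise. A minimal NumPy sketch (the threshold parameter is just for illustration):

```python
import numpy as np

def binary_step(x, threshold=0.0):
    # 1 where the input reaches the threshold, 0 everywhere else.
    return np.where(x >= threshold, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # -> [0. 1. 1.]
```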
Linear Activation Function
In the case of the linear activation function, we can deal with multiclass classification. The activation is proportional to the input and simply returns the value it receives. But there are some drawbacks: the linear activation function does not help backward propagation because its derivative is a constant, and a stack of linear layers collapses into a single linear transformation, limiting the neural network's abilities.
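To make the constant-derivative point concrete, here is a small sketch assuming the simplest form f(x) = a·x:

```python
import numpy as np

def linear(x, a=1.0):
    # The output is simply proportional to the input.
    return a * x

def linear_derivative(x, a=1.0):
    # The gradient is the same constant everywhere, so it carries
    # no information about the input during backpropagation.
    return np.full_like(x, a)

x = np.array([-1.0, 0.0, 2.0])
print(linear(x))             # -> [-1.  0.  2.]
print(linear_derivative(x))  # -> [1. 1. 1.]
```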
Sigmoid Activation Function
The sigmoid function is a non-linear function that squashes its output into the range 0 to 1, following a probabilistic approach. But its major drawback is that it can cause a vanishing gradient depending on the size of the inputs, which can stall the training of your ANN.
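As a sketch, the sigmoid 1 / (1 + e^(-x)) and its derivative can be written as follows; note how the gradient shrinks toward 0 for large positive or negative inputs, which is the vanishing-gradient issue mentioned above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    # The gradient peaks at 0.25 near x = 0 and vanishes for large |x|.
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))             # -> [~0.00005, 0.5, ~0.99995]
print(sigmoid_derivative(x))  # -> [~0.00005, 0.25, ~0.00005]
```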
Tanh Activation Function
The tanh activation function is quite similar to the sigmoid, but its output range is -1 to 1. In this case, the more positive the input, the closer the output is to 1, and vice versa.
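A short sketch using NumPy's built-in np.tanh; like the sigmoid, its gradient (1 - tanh(x)²) also saturates for large inputs:

```python
import numpy as np

def tanh(x):
    # Maps any real input into the (-1, 1) range, centred on 0.
    return np.tanh(x)

def tanh_derivative(x):
    # Gradient is 1 at x = 0 and shrinks toward 0 for large |x|.
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, 0.0, 3.0])
print(tanh(x))             # -> [-0.995  0.     0.995]
print(tanh_derivative(x))  # -> [ 0.0099  1.     0.0099]
```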
ReLU Activation Function
The ReLU (Rectified Linear Unit) activation function is also a non-linear function; it has a derivative and allows backward propagation. The difference with the others is that it doesn't activate all the neurons at the same time: if the input is less than 0, the neuron is deactivated and outputs 0.
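A minimal sketch of ReLU and its derivative (taking the derivative at 0 to be 0, a common convention):

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged and zeroes out the rest.
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Gradient is 1 for positive inputs, 0 otherwise (neuron switched off).
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))             # -> [0. 0. 3.]
print(relu_derivative(x))  # -> [0. 0. 1.]
```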
Softmax Activation Function
A non-linear function again, the softmax activation function takes a probabilistic view of the input. It converts a vector of numbers into a vector of probabilities that sum to 1; it is commonly used in multiclass classification to normalize the output layer.
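As a sketch, here is a numerically stable softmax (subtracting the maximum before exponentiating is a standard implementation trick, not part of the definition):

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; this does not change the result.
    e = np.exp(x - np.max(x))
    # Normalize so the outputs form a probability distribution.
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # -> [0.659 0.242 0.099]
print(probs.sum())  # -> 1.0
```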