Hyperbolic tangent vs sigmoid - How are these different?

 Two common activation functions used in deep learning are the hyperbolic tangent function and the sigmoid activation function. I understand that the hyperbolic tangent is just a rescaling and translation of the sigmoid function:

tanh(z) = 2σ(2z) − 1.

Is there a significant difference between these two activation functions, and in particular, when is one preferable to the other? I realise that in some cases (like when estimating probabilities) outputs in the range [0, 1] are more convenient than outputs in the range [−1, 1]. I want to know whether there are differences other than convenience that distinguish the two activation functions.
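
For concreteness, here is a quick numerical check of the identity above (a minimal NumPy sketch; the sigmoid helper is my own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
# tanh is sigmoid rescaled to [-1, 1] and compressed along the input axis
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))  # True
```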

Answered by Amit Sinha

Hyperbolic tangent vs Sigmoid


Sigmoid > Hyperbolic tangent: As you mentioned, sigmoid can be more convenient than the hyperbolic tangent when we need a probability value at the output (and, as @matthew-graves says, a tanh output can be fixed with a simple mapping/calibration step; see the sketch below). In hidden layers, however, a probability interpretation makes no sense, so this advantage only applies to the output layer.
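
A minimal sketch of that mapping step, assuming a simple linear rescaling of a tanh output into [0, 1] (the function name is my own illustration):

```python
import numpy as np

def tanh_to_probability(t):
    # Linear rescaling from [-1, 1] to [0, 1]; when t = tanh(z) this
    # equals sigmoid(2z), so the two output layers are interchangeable.
    return (t + 1.0) / 2.0

z = 0.3
print(tanh_to_probability(np.tanh(z)))  # ~0.6457
print(1.0 / (1.0 + np.exp(-2 * z)))     # same value
```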

Hyperbolic tangent > Sigmoid: Hyperbolic tangent has a property called "approximates identity near the origin": tanh(0) = 0, tanh′(0) = 1, and tanh′(z) is continuous around z = 0 (as opposed to σ(0) = 0.5 and σ′(0) = 0.25). This property (shared by many other activation functions, such as identity, arctan, and sinusoid) lets the network learn efficiently even when its weights are initialised with small values, because activations and gradients initially pass through each layer almost unchanged; a numerical illustration follows below. With activation functions that lack this property (e.g. sigmoid and ReLU), such small initial values can be problematic. Further reading: Random Walk Initialization for Training Very Deep Feedforward Networks.
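
A small numerical illustration of the "approximates identity near the origin" property (a NumPy sketch; the sigmoid helper is my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Near the origin tanh behaves like the identity function,
# while sigmoid is offset (0.5) and compressed (slope 0.25).
for z in [-0.1, 0.0, 0.1]:
    print(f"z={z:+.1f}  tanh={np.tanh(z):+.4f}  sigmoid={sigmoid(z):.4f}")

# Derivatives at the origin:
print(1 - np.tanh(0.0) ** 2)              # tanh'(0)    = 1.0
print(sigmoid(0.0) * (1 - sigmoid(0.0)))  # sigmoid'(0) = 0.25
```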


