What is the hinge loss function?

I came across the hinge loss function for training a neural network model, but I could not find its analytical form.

I can write the mean squared error loss function (which is more often used for regression) as

$$\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

where $y_i$ is the desired output in the dataset, $\hat{y}_i$ is the actual output of the model, and $N$ is the total number of instances in our dataset.
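As a point of reference, the mean squared error above can be sketched in a few lines of NumPy (the function name `mse` is my own, not from any particular library):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals,
    (1/N) * sum_i (y_i - y_hat_i)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)
```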

Similarly, what is the (basic) expression for hinge loss function?


Answered by Sanjay Pandey

The hinge loss is the typical loss function used for binary classification (though it can be extended to multi-class classification) in the context of support vector machines, although it can also be used to train neural networks.


The hinge loss function is defined as follows:

$$\ell(y) = \max(0,\, 1 - t \cdot y), \tag{1}$$

where

$t \in \{-1, 1\}$ is the label (so, if your labels are in the set $\{0, 1\}$, you will first have to map them to $\{-1, 1\}$)

$y$ is the raw output of the classifier (e.g. in the context of the linear SVM, $y = \mathbf{w} \cdot \mathbf{x} + b$, where $\mathbf{w}$ and $b$ are the parameters of the hyperplane)

Because of the $\max$, the loss in equation 1 is always non-negative. If you're familiar with the ReLU, this should look familiar: the hinge loss is just the ReLU applied to $1 - t \cdot y$, so their plots are very similar.
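Equation 1 and the label-mapping caveat above can be sketched in NumPy as follows (the function names `hinge_loss` and `to_pm1` are my own, chosen for illustration):

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss from equation (1): max(0, 1 - t * y).
    t: label(s) in {-1, +1}; y: raw classifier score(s)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.maximum(0.0, 1.0 - t * y)

def to_pm1(labels01):
    """Map {0, 1} labels to {-1, +1}, as required before applying the hinge loss."""
    return 2 * np.asarray(labels01) - 1
```

Note that a correctly classified point with a comfortable margin ($t \cdot y \geq 1$) incurs zero loss, while points inside the margin or on the wrong side are penalized linearly.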

For more details, you should probably start with the related Wikipedia article, then move on to one of the many machine learning books that cover support vector machines, for example Pattern Recognition and Machine Learning (2006) by Christopher Bishop, chapter 7 (page 325).
