What is ground truth?

In the context of Machine Learning, I have seen the term Ground Truth used a lot. I have searched a lot and found the following definition in Wikipedia:

In machine learning, the term "ground truth" refers to the accuracy of the training set's classification for supervised learning techniques. This is used in statistical models to prove or disprove research hypotheses. The term "ground truthing" refers to the process of gathering the proper objective (provable) data for this test. Compared with the gold standard.

Bayesian spam filtering is a common example of supervised learning. In this system, the algorithm is manually taught the differences between spam and non-spam. This depends on the ground truth of the messages used to train the algorithm – inaccuracies in the ground truth will correlate to inaccuracies in the resulting spam/non-spam verdicts.

The point is that I really can not get what it means. Is that the label used for each data object or the target function which gives a label to each data object, or maybe something else?

Ground truth: That is the reality you want your model to predict. It may have some noise but you want your model to learn the underlying pattern in data that’s causing this ground truth. Practically, your model will never be able to predict the ground truth as ground truth will also have some noise and no model gives hundred percent accuracy but you want your model to be as close as possible.

Your Answer


Parent Categories