How can I estimate the generalization error?

How would you estimate the generalization error? What are the methods of achieving this?

Answered by Angela Baker

Generalization error is the error a model makes on data it has not seen before. So, to measure it, you need to hold out a subset of your data and not train your model on it. After training, you evaluate the model's accuracy (or another performance measure) on the held-out subset; because the model has never seen this data, the result estimates the generalization error. This held-out subset is called the test set. A second held-out subset, called the validation set, can be used for parameter selection. We can't tune parameters on the training set, since error on the training data does not measure generalization, and we can't tune them on the test set either, since the chosen parameters would then overfit the test data and make the test estimate too optimistic. That's why we need a third subset.
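As a rough illustration of the three subsets, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the toy data (y = x^2 plus noise), the 60/20/20 split, the simple k-nearest-neighbour model, and the helper names `three_way_split`, `knn_predict` and `mean_squared_error`. The point is only the workflow: fit on the training set, pick the parameter on the validation set, and report the error once on the test set.

```python
import random

def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_squared_error(train, rows, k):
    """Average squared error of the k-NN model on the given rows."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in rows) / len(rows)

# Toy data, made up for illustration: y = x^2 plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    data.append((x, x * x + rng.gauss(0, 0.1)))

train, val, test = three_way_split(data)

# Parameter selection uses the validation set only.
candidate_ks = [1, 3, 5, 10, 20]
best_k = min(candidate_ks, key=lambda k: mean_squared_error(train, val, k))

# The test set is touched once, at the end, to estimate generalization error.
test_mse = mean_squared_error(train, test, best_k)
print("chosen k:", best_k, "test MSE:", test_mse)
```

Note that the test error is computed only after `best_k` is fixed; if you went back and changed `best_k` based on `test_mse`, the test set would effectively become a second validation set.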

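Finally, in order to obtain a more reliable estimate of performance, we can repeat the evaluation on many different train/test partitions and average the results. This is called cross-validation. In k-fold cross-validation, for example, the data is split into k parts and each part serves as the test set exactly once.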
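The averaging over partitions can be sketched as k-fold cross-validation in plain Python. This is only one way to do it, under stated assumptions: the helper names `k_fold_indices` and `cross_validate` are hypothetical, the data is made up, and the "model" is deliberately trivial (it predicts the mean of the training targets) so that the partitioning logic stays in focus.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, k, fit, error):
    """Return the mean held-out error over k partitions, plus per-fold errors."""
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train = [row for i, row in enumerate(rows) if i not in held]
        test = [rows[i] for i in held_out]
        model = fit(train)                 # train on the other k-1 folds
        scores.append(error(model, test))  # evaluate on the held-out fold
    return sum(scores) / len(scores), scores

def fit_mean(train):
    """Toy model: remember the mean target value of the training rows."""
    return sum(y for _, y in train) / len(train)

def mse(model, rows):
    """Mean squared error of the constant prediction on the given rows."""
    return sum((model - y) ** 2 for _, y in rows) / len(rows)

# Toy data, made up for illustration: a constant 2.0 plus Gaussian noise.
rng = random.Random(1)
data = [(i, 2.0 + rng.gauss(0, 1)) for i in range(100)]

avg_error, fold_errors = cross_validate(data, k=5, fit=fit_mean, error=mse)
print("per-fold MSE:", fold_errors)
print("cross-validated MSE:", avg_error)
```

The spread of the per-fold errors gives a feel for how much a single train/test split could mislead you, which is the reason for averaging in the first place.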


