You are given a training data set with a large number of columns and rows. How do you reduce the dimensionality of this data?

Answered by Nitin Solanki

A data dimension is a set of data attributes pertaining to something of interest to a business. We have the following methods for reducing the dimension of training data that has many rows and columns:

Principal Component Analysis (PCA) can help here: it projects the data onto a smaller number of components chosen to explain the maximum variance in the data set.
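
As a rough illustration of the PCA idea, here is a minimal sketch using NumPy's SVD. The function name `pca_reduce` and the synthetic data are my own assumptions for the example, not part of any library; in practice you would probably reach for a ready-made PCA implementation.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper written for this example.
    """
    X = np.asarray(X, dtype=float)
    # Center each column so the components capture variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance explained by each component.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]

# Synthetic data (made up for illustration): 200 rows and 5 columns that are
# noisy linear mixes of only 2 underlying factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.01 * rng.normal(size=(200, 5))

X_reduced, ratio = pca_reduce(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
print(ratio.sum())      # share of variance kept by the 2 components
```

The explained-variance ratio is what you would look at to decide how many components to keep; a common approach is to keep just enough components to cover a chosen share of the variance.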

We can also check the correlation between numerical columns and, if multicollinearity exists, remove some of the highly correlated columns, since they add little information and are unlikely to affect the model.
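
The correlation check could be sketched roughly as follows. The helper `drop_correlated_columns`, the 0.9 threshold, and the toy columns are assumptions chosen for the example; the right threshold depends on your data and model.

```python
import numpy as np

def drop_correlated_columns(X, names, threshold=0.9):
    """Greedily keep columns that are not highly correlated with any kept column.

    Hypothetical helper; the threshold is an assumed cut-off on |Pearson r|.
    """
    X = np.asarray(X, dtype=float)
    # Absolute pairwise Pearson correlations between columns.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is below the threshold against every kept column.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Toy data (made up): "a_copy" is almost a rescaled duplicate of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
a_copy = 2.0 * a + 0.01 * rng.normal(size=300)
X = np.column_stack([a, b, a_copy])

X_small, kept = drop_correlated_columns(X, ["a", "b", "a_copy"])
print(kept)  # ['a', 'b']
```

Because the loop is greedy, which column of a correlated pair survives depends on column order, so it may be worth putting the columns you trust most first.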

We can split the data into multiple smaller datasets and process them batch-wise.
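
One way to picture the batch-wise approach is the small sketch below, which uses only the standard library. The generator `iter_batches` and the made-up rows are assumptions for the example. Note that batching limits how many rows are handled at a time; it does not change the number of columns.

```python
from itertools import islice

def iter_batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable.

    Hypothetical helper written for this example.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Made-up rows standing in for a large training set.
rows = [[i, 2 * i, 3 * i] for i in range(10)]

# Accumulate per-column sums one batch at a time instead of all at once.
column_sums = [0, 0, 0]
batch_sizes = []
for batch in iter_batches(rows, batch_size=4):
    batch_sizes.append(len(batch))
    for row in batch:
        for j, value in enumerate(row):
            column_sums[j] += value

print(batch_sizes)  # [4, 4, 2]
print(column_sums)  # [45, 90, 135]
```

The same pattern works when `rows` is a lazy source such as a file reader, since the generator never needs the whole data set in memory.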


