A user wants to use principal component analysis to reduce noise before applying linear regression. He has 1000 samples and 200 features, but he receives an error.

Asked in Data Science, Asked on Jan 15, 2020
Answered by Ranjana Admin

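import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

X = np.random.rand(1000, 200)
y = np.random.rand(1000, 1)

pca = PCA(n_components=8)
pca.fit(X)
# Echoed by pca.fit(X) in older scikit-learn versions:
# PCA(copy=True, iterated_power='auto', n_components=8, random_state=None,
#     svd_solver='auto', tol=0.0, whiten=False)

# components_ has shape (8, 200): one row per component, not per sample
principal_components = pca.components_
# Presumably the call that fails: 8 rows of components against 1000 targets
LinearRegression().fit(principal_components, y)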


The error is given below:

ValueError: Found input variables with inconsistent numbers of samples: [8, 1000]

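pca.components_ holds the principal axes and has shape (8, 200), one row per component, so the regression sees 8 samples in the features against 1000 in y. Instead, fit the PCA to X and transform X in a single step, which gives a (1000, 8) array named X_pca. That array is what should be passed to the regression in place of pca.components_: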

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)
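
To make the shape difference concrete, here is a minimal end-to-end sketch of the corrected workflow. The seeded generator rng and the names model and y_pred are assumptions added for illustration and are not part of the original question; the data is random, so the fitted model itself carries no meaning.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Seeded generator (an assumption, only for reproducibility)
rng = np.random.default_rng(0)
X = rng.random((1000, 200))
y = rng.random((1000, 1))

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)      # projected samples, one row per sample

# components_ holds the principal axes: one row per component
print(pca.components_.shape)      # (8, 200)
print(X_pca.shape)                # (1000, 8)

# X_pca and y now both have 1000 rows, so the fit succeeds
model = LinearRegression()
model.fit(X_pca, y)
y_pred = model.predict(X_pca)
print(y_pred.shape)               # (1000, 1)
```

When predicting on new data, the same fitted pca should be applied with pca.transform rather than refitting it, so that the new samples are projected onto the same 8 components.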


