I am trying to run logistic regression on my data (6 categorical variables, 1 integer) using scikit-learn. I am following the scikit-learn documentation, but when I try to fit the model I get the following ValueError.

Asked by SnehaPandey in Data Science on Nov 30, 2019

Answered by Sneha Pandey

#Below are the variables of my data.

train_data.dtypes

OUTPUT

TripType category

VisitNumber category

Weekday category

Upc category

ScanCount int64

DepartmentDescription category

FinelineNumber category

dtype: object

X = train_data.loc[:, 'VisitNumber':'FinelineNumber']

Y = train_data.loc[:, 'TripType':'TripType']

from sklearn import linear_model

logreg = linear_model.LogisticRegression()

logreg.fit(X, Y)

**ValueError: could not convert string to float: GROCERY DRY GOODS**

The error occurs because the dataset contains categorical variables. Category names cannot be used directly as features in logistic regression; they must first be converted into encoded vectors (dummy variables). A variable with 6 categories needs only 5 dummy variables, because the sixth category is implied when all five dummies are 0.
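As a rough illustration of the "6 categories, 5 dummy variables" rule, here is a minimal sketch using pandas `get_dummies` with `drop_first=True`. The toy `Weekday` values are invented for the example and are not taken from the original data.

```python
import pandas as pd

# Hypothetical column with 6 distinct categories (values invented for illustration).
df = pd.DataFrame({"Weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]})

# drop_first=True drops one category level, leaving k - 1 = 5 dummy columns.
dummies = pd.get_dummies(df["Weekday"], drop_first=True)

n_columns = dummies.shape[1]
print(n_columns)
```

The dropped level acts as the baseline: a row belonging to it has 0 in every remaining dummy column.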

For example, a gender column with two categories can be replaced by a single dummy variable that takes the values 0 and 1.
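One possible way to apply this before fitting is sketched below. It assumes a small made-up DataFrame: the `Gender` column and all row values are hypothetical, while `DepartmentDescription`, `ScanCount` and `TripType` only borrow their names from the question. The categorical columns are encoded with `pd.get_dummies`, so the model sees only numbers.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data; every value here is invented for illustration.
train_data = pd.DataFrame({
    "TripType": ["A", "B", "A", "B", "A", "B"],
    "Gender": ["male", "female", "female", "male", "female", "male"],
    "DepartmentDescription": ["GROCERY DRY GOODS", "DAIRY", "PRODUCE",
                              "DAIRY", "GROCERY DRY GOODS", "PRODUCE"],
    "ScanCount": [1, 2, 1, 3, 2, 1],
})

X = train_data[["Gender", "DepartmentDescription", "ScanCount"]]
y = train_data["TripType"]  # 1-D target

# Replace each categorical column with k - 1 numeric dummy columns.
X_encoded = pd.get_dummies(
    X, columns=["Gender", "DepartmentDescription"], drop_first=True, dtype=int
)

# With only numeric features, fit() no longer fails on string values.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_encoded, y)
predictions = logreg.predict(X_encoded)
```

Here `Gender` becomes a single 0/1 column (`Gender_male`). Note that columns with very many distinct values, which may be the case for something like `Upc`, would produce a correspondingly large number of dummy columns.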
