A user has a dataset of reviews with a positive/negative class label and is applying a decision tree classifier to it. How do you get the feature importances from the decision tree classifier?

Asked by AishwaryaJhadav in Data Science, Nov 17, 2019
Answered by Aishwarya Jhadav

Below is the code:

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X_tr, X_test, y_tr, y_test = train_test_split(sorted_data['Text'], labels, test_size=0.3, random_state=0)

# Bag of words: fit the vectorizer on the training data only
count_vect = CountVectorizer()
count_vect.fit(X_tr.values)
final_counts = count_vect.transform(X_tr.values)

# Apply the vectorizer fitted on the train data to the test data
final_counts_x_test = count_vect.transform(X_test.values)

# Instantiate the learning model with the tuned tree depth
optimal_depth = 15
bow_reg_optimal = DecisionTreeClassifier(max_depth=optimal_depth, random_state=0)

# Fit the model
bow_reg_optimal.fit(final_counts, y_tr)

# Predict the response
pred = bow_reg_optimal.predict(final_counts_x_test)

# Evaluate accuracy
acc = accuracy_score(y_test, pred) * 100
print('\nThe accuracy of the decision tree for depth = %d is %f%%' % (optimal_depth, acc))
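Once the model is fitted, the feature_importances_ attribute answers the original question directly: it holds one score per column of the bag-of-words matrix, i.e. one score per vocabulary word. Below is a minimal sketch for listing the most important words, assuming the fitted bow_reg_optimal and count_vect from above (get_feature_names_out needs scikit-learn >= 1.0; older versions expose get_feature_names instead):

import numpy as np

# One non-negative importance score per bag-of-words column; the scores sum to 1
importances = bow_reg_optimal.feature_importances_
feature_names = count_vect.get_feature_names_out()

# Rank the vocabulary by importance and print the ten most informative words
top_idx = np.argsort(importances)[::-1][:10]
for i in top_idx:
    print('%-20s %.4f' % (feature_names[i], importances[i]))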

To get the important features, we can use the classifier's built-in feature_importances_ attribute, which is defined once fit() has been called. Below is a small self-contained example:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: 1000 samples with two random features and five random classes
X = np.random.rand(1000, 2)
y = np.random.randint(0, 5, 1000)

tree = DecisionTreeClassifier().fit(X, y)
print(tree.feature_importances_)
# Example output (the data is random, so exact values vary):
# array([ 0.51390759, 0.48609241])

The output is an array with one entry per feature: each value is that feature's impurity-based importance, the scores are non-negative and sum to 1, and larger values indicate more dominant features.
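These impurity-based scores can be biased toward features that offer many possible split points, so a common cross-check is permutation importance, which measures how much shuffling a feature degrades the model's score. A minimal sketch reusing the toy tree above (sklearn.inspection.permutation_importance is available from scikit-learn 0.22 onward):

from sklearn.inspection import permutation_importance

# Shuffle each feature in turn, n_repeats times, and record the mean drop in score
result = permutation_importance(tree, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)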


