How to calculate information gain for each attribute with respect to a class in a document term matrix using sklearn in Python?

Asked by IraJoshi in Data Science on Nov 7, 2019
Answered by Ira Joshi

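Information gain can be computed with mutual_info_classif from sklearn.feature_selection, because the information gain of a term with respect to the class is the mutual information between the two. Passing discrete_features=True tells sklearn to treat the term counts as discrete values, which is also required when the input is a sparse document-term matrix. Below is the code: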

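from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_selection import mutual_info_classif
from sklearn.feature_extraction.text import CountVectorizer

# Only the first category name survives in this post; the two empty strings
# must be replaced with valid 20 Newsgroups category names before running.
categories = ['talk.religion.misc',
              '', '']

newsgroups_train = fetch_20newsgroups(subset='train',
                                      categories=categories)

# Raw documents and their integer class labels
X, Y = newsgroups_train.data, newsgroups_train.target

# Any further CountVectorizer arguments were cut off in this post
cv = CountVectorizer(max_df=0.95, min_df=2)

# Sparse document-term matrix of term counts
X_vec = cv.fit_transform(X)

# Map each term to its mutual information with the class.
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
res = dict(zip(cv.get_feature_names_out(),
               mutual_info_classif(X_vec, Y, discrete_features=True)))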
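To see what the score for each term represents, here is a minimal standard-library sketch that computes information gain by hand as H(class) - H(class | term present or absent). The toy corpus, the labels, and the helper names entropy and information_gain are hypothetical and only for illustration; this is not how sklearn implements it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(class; term) = H(class) - H(class | term present or absent)."""
    present = [y for doc, y in zip(docs, labels) if term in doc.split()]
    absent = [y for doc, y in zip(docs, labels) if term not in doc.split()]
    conditional = sum(
        len(part) / len(labels) * entropy(part)
        for part in (present, absent)
        if part  # an empty split contributes nothing
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus, made up for illustration only.
docs = ["space rocket launch", "rocket orbit", "church faith", "faith prayer church"]
labels = ["sci", "sci", "rel", "rel"]

# One information gain value per term in the vocabulary
vocabulary = sorted({word for doc in docs for word in doc.split()})
gains = {term: information_gain(docs, labels, term) for term in vocabulary}

# Print terms from most to least informative
for term, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{term}: {gain:.4f}")
```

Terms that appear in every document of one class and in none of the other (such as "rocket" here) get the maximum gain of 1 bit, while terms seen in a single document score lower. Note that this sketch uses binary term presence and log base 2, whereas mutual_info_classif on a count matrix treats each distinct count as its own category and, as far as I know, reports the value in nats, so the numbers will not match exactly.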


