Explain with a case study how to perform text analysis using R.

Asked by Nainapandey in Data Science, asked on Jan 10, 2020

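To perform text analysis, we will import a dataset of restaurant reviews. Each row holds the review text (Review) and a label indicating whether the reviewer liked the restaurant (Liked).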

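First, we read the data. Setting quote = '' tells R to treat quotation marks inside the reviews as ordinary characters, and stringsAsFactors = FALSE keeps the review text as character strings.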

# Importing the dataset

dataset_original = read.delim('Restaurant_Reviews.tsv', quote = '', stringsAsFactors = FALSE)
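As a rough illustration of what the quote and stringsAsFactors arguments do in the call above, the sketch below reads a tiny inline table instead of the real file. The two sample rows are made up for illustration; only the column names Review and Liked come from the code in this answer.

```r
# Hypothetical two-row sample in the same tab-separated layout.
sample_tsv <- "Review\tLiked\nThe \"crust\" is not good.\t0\nWow... Loved this place.\t1"

# quote = '' turns off quote handling, so the double quotes inside the
# first review stay in the text instead of being treated as delimiters.
sample_reviews <- read.delim(text = sample_tsv, quote = '',
                             stringsAsFactors = FALSE)

str(sample_reviews)        # Review is character, Liked is integer
sample_reviews$Review[1]   # the quotation marks are still part of the text
```

Running str() on the real dataset_original in the same way is a quick check that the file was parsed into the expected two columns.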

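Next, we clean the text. If the tm and SnowballC packages are not installed yet, uncomment the install.packages lines below and run them first. The cleaning converts the text to lowercase, removes numbers, punctuation and English stop words, stems each word, and collapses extra whitespace.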

# Cleaning the texts

# install.packages('tm')

# install.packages('SnowballC')

library(tm)

library(SnowballC)

corpus = VCorpus(VectorSource(dataset_original$Review))

corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removeNumbers)

corpus = tm_map(corpus, removePunctuation)

corpus = tm_map(corpus, removeWords, stopwords())

corpus = tm_map(corpus, stemDocument)

corpus = tm_map(corpus, stripWhitespace)
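To see what the pipeline does to a single review, the same steps can be applied to a one-document corpus. This is only a sketch: the sample sentence is made up, and the exact stems depend on the installed SnowballC version, so no expected output is shown.

```r
library(tm)
library(SnowballC)

# Hypothetical review text, not taken from the dataset.
sample_corpus <- VCorpus(VectorSource("Wow... I LOVED the 2 pizzas we ordered!"))

# Same cleaning steps, in the same order, as for the full corpus.
sample_corpus <- tm_map(sample_corpus, content_transformer(tolower))
sample_corpus <- tm_map(sample_corpus, removeNumbers)
sample_corpus <- tm_map(sample_corpus, removePunctuation)
sample_corpus <- tm_map(sample_corpus, removeWords, stopwords())
sample_corpus <- tm_map(sample_corpus, stemDocument)
sample_corpus <- tm_map(sample_corpus, stripWhitespace)

# Extract the cleaned text of the first (and only) document.
cleaned <- as.character(sample_corpus[[1]])
print(cleaned)
```

On the full corpus, as.character(corpus[[1]]) shows the cleaned form of the first review in the same way.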

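Now we create the bag-of-words model: a matrix with one row per review and one column per term, where each cell counts how often the term occurs in the review. Terms that are absent from 99.9% or more of the reviews are dropped.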

# Creating the Bag of Words model

dtm = DocumentTermMatrix(corpus)

dtm = removeSparseTerms(dtm, 0.999)

dataset = as.data.frame(as.matrix(dtm))

dataset$Liked = dataset_original$Liked
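The idea behind the document-term matrix and the sparsity filter can be sketched in base R on a toy example. This is an approximation written for illustration, not the tm implementation; the three documents and the sparse value of 0.5 are made up.

```r
# Three made-up, already-cleaned documents.
docs <- c("food good", "food bad", "servic good food")

# Split each document into tokens and collect the vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# One row per document, one column per term, cells hold term counts.
bow <- t(vapply(tokens,
                function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
                integer(length(vocab))))
colnames(bow) <- vocab

# Keep terms that occur in more than (1 - sparse) of the documents,
# which mirrors the idea behind removeSparseTerms(dtm, sparse).
sparse   <- 0.5
doc_freq <- colSums(bow > 0)
bow_kept <- bow[, doc_freq > nrow(bow) * (1 - sparse), drop = FALSE]
print(bow_kept)
```

Here "bad" and "servic" each appear in only one of the three documents, so they fall below the threshold and only "food" and "good" remain.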

Now we split the dataset into a training set (80%) and a test set (20%).

# Splitting the dataset into the Training set and Test set

# install.packages('caTools')

library(caTools)

set.seed(123)

split = sample.split(dataset$Liked, SplitRatio = 0.8)

training_set = subset(dataset, split == TRUE)

test_set = subset(dataset, split == FALSE)
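sample.split from caTools splits the rows while preserving the ratio of the labels in both parts. The sketch below reproduces that idea in base R on made-up labels; it illustrates stratified sampling and is not the caTools code itself.

```r
set.seed(123)

# Made-up labels: 60 negatives and 40 positives.
labels <- c(rep(0, 60), rep(1, 40))

# Draw 80% of the row indices within each class separately.
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))

in_train <- seq_along(labels) %in% train_idx

# Class proportions are the same in both parts.
prop.table(table(labels[in_train]))
prop.table(table(labels[!in_train]))
```

The same prop.table(table(...)) check on training_set$Liked and test_set$Liked shows whether the real split kept the classes balanced.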

Now we fit a Random Forest classifier to the training set.

# Fitting Random Forest Classification to the Training set

# install.packages('randomForest')

library(randomForest)

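# Select the predictors by name instead of a hard-coded column index.
# Liked must be a factor; with a numeric response randomForest fits a regression.
classifier = randomForest(x = training_set[, names(training_set) != 'Liked'],
                          y = as.factor(training_set$Liked),
                          ntree = 10)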

After fitting, we predict the labels for the test set.

# Predicting the Test set results

y_pred = predict(classifier, newdata = test_set[, names(test_set) != 'Liked'])

Finally, we evaluate the model using a confusion matrix.

# Making the Confusion Matrix

cm = table(test_set$Liked, y_pred)
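A confusion matrix is usually summarised by its accuracy, the share of predictions on the diagonal. The sketch below shows the calculation on made-up labels and predictions; the names actual, predicted and cm_example are illustrative only.

```r
# Made-up true labels and predictions for eight reviews.
actual    <- factor(c(0, 0, 0, 1, 1, 1, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Rows are actual classes, columns are predicted classes.
cm_example <- table(actual, predicted)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm_example)) / sum(cm_example)
print(cm_example)
print(accuracy)
```

Applied to the matrix above, sum(diag(cm)) / sum(cm) gives the accuracy of the Random Forest on the test set, provided the predictions are class labels rather than continuous values.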
