Explain, with a case study, how to implement tree methods in R.
We will explore the use of tree methods to classify schools as Private or Public based on their features, using the College data set from the ISLR package.
library(ISLR)
# College has one row per US college; Private is a factor with levels No and Yes
df <- College
head(df)
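Next, we split the data into a training set (70%) and a test set (30%). sample.split() from caTools keeps the proportion of private and public schools roughly the same in both sets.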
library(caTools)
set.seed(101)
sample = sample.split(df$Private, SplitRatio = .70)
train = subset(df, sample == TRUE)
test = subset(df, sample == FALSE)
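The idea behind a stratified split like sample.split() can be sketched in base R. This is a minimal illustration, not caTools' actual implementation. The function stratified_split() is a hypothetical helper. It marks roughly `ratio` of the rows in each class as training rows, so both subsets keep about the original class mix.

```r
# Hypothetical helper (not part of caTools): returns a logical vector marking
# about `ratio` of the observations in EACH class as TRUE (training rows).
stratified_split <- function(labels, ratio = 0.7) {
  labels <- as.factor(labels)
  is_train <- logical(length(labels))       # start with every row in the test set
  for (lvl in levels(labels)) {
    idx <- which(labels == lvl)             # positions of this class
    n_train <- round(ratio * length(idx))   # how many of them go to training
    # index into idx rather than calling sample(idx, ...), which misbehaves
    # when idx has length 1
    chosen <- idx[sample.int(length(idx), n_train)]
    is_train[chosen] <- TRUE
  }
  is_train
}

set.seed(101)
y <- factor(c(rep("Yes", 70), rep("No", 30)))
is_train <- stratified_split(y, ratio = 0.7)
table(y[is_train])    # training counts per class
table(y[!is_train])   # test counts per class
```

With 70 "Yes" and 30 "No" labels, the training set gets 49 and 21 of them. The 70/30 class mix is therefore preserved exactly.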
Now we fit a decision tree (a classification tree) with the rpart package. The argument method = 'class' tells rpart that Private is a categorical response.
library(rpart)
tree <- rpart(Private ~.,method='class',data = train)
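rpart grows a classification tree by repeatedly choosing the split that most reduces node impurity. For method = 'class', its default impurity measure is the Gini index. The sketch below covers only that core calculation, using two hypothetical helpers, gini() and gini_gain(), that are not part of rpart. It ignores rpart's priors, loss matrix and surrogate splits.

```r
# Gini impurity of a set of class labels: 1 - sum over classes of p_k^2
gini <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions
  1 - sum(p^2)
}

# Impurity decrease when `labels` is split by the logical vector `goes_left`:
# parent impurity minus the size-weighted impurities of the two children
gini_gain <- function(labels, goes_left) {
  n <- length(labels)
  left <- labels[goes_left]
  right <- labels[!goes_left]
  gini(labels) -
    (length(left) / n) * gini(left) -
    (length(right) / n) * gini(right)
}

y <- c("Yes", "Yes", "Yes", "No", "No", "No")
x <- c(1, 2, 3, 10, 11, 12)   # a toy numeric feature
gini(y)               # 0.5 for a 50/50 node
gini_gain(y, x < 5)   # 0.5: this split separates the classes perfectly
gini_gain(y, x < 2.5) # 0.25: a weaker split leaves a mixed right child
```

At each node, the tree keeps the candidate split with the largest gain.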
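After fitting, we predict on the test set. For a classification tree, predict() returns a matrix of class probabilities with one column per level (No and Yes).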
tree.preds <- predict(tree,test)
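The shape of these predictions can be checked without the College data. The sketch below uses the kyphosis data frame that ships with rpart, which has a two-level outcome (absent/present). It compares the default probability output with type = "class", which returns the most probable class directly.

```r
library(rpart)

# kyphosis is bundled with rpart: outcome Kyphosis plus three numeric predictors
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)

probs <- predict(fit, kyphosis)                   # default for class trees: probabilities
head(probs)                                       # one column per level, rows sum to 1
classes <- predict(fit, kyphosis, type = "class") # a factor of predicted labels
head(classes)
```

The columns of the probability matrix are named after the factor levels. That is why tree.preds$Yes can be used once the College predictions are converted to a data frame.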
Next, we apply a threshold of 0.5 to the predicted probability of ‘Yes’ to label each school as ‘Yes’ (private) or ‘No’ (public).
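# Convert the probability matrix to a data frame so the Yes column is accessible with $
tree.preds <- as.data.frame(tree.preds)
# Lots of ways to do this
joiner <- function(x){
  # Label a school private ("Yes") when P(Yes) is at least 0.5
  if (x >= 0.5){
    return("Yes")
  } else {
    return("No")
  }
}
tree.preds$Private <- sapply(tree.preds$Yes, joiner)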
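A vectorised alternative to joiner() uses ifelse(), which labels the whole probability vector in one call. The helper label_by_threshold() is hypothetical. Its arguments default to the 0.5 threshold and the Yes/No labels used here.

```r
# Hypothetical helper: vectorised equivalent of joiner().
# p: numeric vector of predicted probabilities for the positive class
label_by_threshold <- function(p, threshold = 0.5, positive = "Yes", negative = "No") {
  ifelse(p >= threshold, positive, negative)   # same >= rule as joiner()
}

label_by_threshold(c(0.2, 0.5, 0.9))        # "No" "Yes" "Yes"
label_by_threshold(c(0.2, 0.5, 0.9), 0.6)   # "No" "No"  "Yes"
```

Applied to the case study, this would be tree.preds$Private <- label_by_threshold(tree.preds$Yes). Exposing the threshold as an argument makes it easy to trade false positives against false negatives.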
Now we evaluate the model with a confusion matrix, where rows are the predicted labels and columns are the actual labels.
table(tree.preds$Private,test$Private)
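The confusion matrix can be summarised as accuracy, precision and recall. The sketch below uses a hypothetical helper, confusion_metrics(), which treats "Yes" (private) as the positive class. It builds the table in the same predicted-by-actual orientation as above, and fixing the factor levels keeps the table 2 x 2 even if one class is never predicted.

```r
# Hypothetical helper: accuracy, precision and recall from predicted/actual labels
confusion_metrics <- function(predicted, actual, positive = "Yes", negative = "No") {
  lv <- c(negative, positive)
  cm <- table(factor(predicted, levels = lv), factor(actual, levels = lv))
  tp <- cm[positive, positive]   # predicted positive, actually positive
  fp <- cm[positive, negative]   # predicted positive, actually negative
  fn <- cm[negative, positive]   # predicted negative, actually positive
  tn <- cm[negative, negative]   # predicted negative, actually negative
  c(accuracy  = (tp + tn) / sum(cm),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

pred   <- c("Yes", "Yes", "No", "No", "Yes")
actual <- c("Yes", "No",  "No", "Yes", "Yes")
confusion_metrics(pred, actual)   # accuracy 0.6, precision and recall about 0.667
```

For the case study, the call would be confusion_metrics(tree.preds$Private, test$Private).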
We can plot the tree with the prp() function from the rpart.plot package.
library(rpart.plot)
prp(tree)