
Practical guide to implement Random Forest in R with example

Random Forest In R

Before buying an expensive item such as a car or a house, or before investing in the share market, we usually seek advice from several people rather than purchasing on impulse. We collect suggestions from different people we know and then choose by weighing the positives and negatives each of them points out. The reason is that a single person's review can be biased by his own interests and past experiences; by asking many people we mitigate the bias of any one individual. One person may have a strong aversion to a product because of a bad experience with it, while several others may strongly favor the same product because their experiences were positive.

This concept is called 'Ensembling' in analytics. Ensembling is a technique in which many models are trained on a training dataset and their outputs are combined by some rule to produce the final output.

Decision trees have one serious drawback: they are prone to overfitting. If a decision tree is grown very deep, it will learn every possible relationship in the data, including noise. Overfitting can be mitigated with a technique called pruning, which reduces the size of a decision tree by removing parts of the tree that contribute little to correct classification. Even with pruning, the result is often not up to the mark. The primary reason is that the algorithm makes a locally optimal choice at each split without regard to whether that choice is best for the overall tree, so a bad split near the root can produce a poor model that post-hoc pruning cannot compensate for.
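To make pruning concrete, here is a minimal sketch using the rpart package (which ships with R as a recommended package) on the built-in iris data; the cp values here are illustrative, not taken from the article:

```r
library(rpart)

# Grow a deliberately deep tree (cp = 0 disables the built-in complexity stop)
deep_tree <- rpart(Species ~ ., data = iris,
                   control = rpart.control(cp = 0, minsplit = 2))

# Prune it back: subtrees that improve fit by less than cp = 0.05 are removed
pruned_tree <- prune(deep_tree, cp = 0.05)

# The pruned tree has fewer nodes than the fully grown one
c(deep = nrow(deep_tree$frame), pruned = nrow(pruned_tree$frame))
```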

Need for Random Forests

Decision trees are very popular because the way they make decisions reflects how humans make decisions: they check the available options at each stage of a split and select the best one. The same analogy also suggests how decision trees can be improved.

Some TV game shows give contestants an "Audience poll" lifeline: when clueless, the contestant can ask the audience to vote on the question. The answer given by the majority of independent people has a higher chance of being correct, because:

  • People have different experiences and will, therefore, draw upon different “data” to answer the question.
  • People have different learning curves and preferences and will, therefore, draw upon different “variables” to make their choices at each stage in their decision process.

Based on this comparison with human decision-making, it seems reasonable to build many decision trees by randomizing along two dimensions:

  • Different subsets of training data
  • Randomly selecting different subsets of columns for tree splitting

The final prediction is obtained by aggregating over all trees: the majority vote (mode) of the trees' predicted classes for classification problems, and the average of the trees' predictions for regression problems. This is how the random forest algorithm works.
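A toy base-R sketch of the aggregation step (the vote and prediction values are made up for illustration; the randomForest package aggregates the same way, voting for classification and averaging for regression):

```r
# Classification: three hypothetical trees vote; the mode (majority) wins
votes <- c("Yes", "No", "Yes")
majority <- names(which.max(table(votes)))
majority   # "Yes"

# Regression: the trees' numeric predictions are averaged
preds <- c(3.1, 2.9, 3.4)
mean(preds)
```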

These two strategies reduce overfitting by averaging the response over trees built from different samples of the dataset, and by decreasing the chance that a small set of strong predictors dominates the splits. But everything has a price: model interpretability is reduced and computational complexity increases.

Mechanics of the Algorithm

Without going into the mathematical details, let's understand how the above points are implemented in the algorithm.

The main feature of the algorithm is that each tree is built on a different dataset. This is achieved by a statistical method called bootstrap aggregating (bagging).

Imagine a dataset of size N. From this dataset we create a sample of size n (n <= N) by selecting n data points randomly with replacement. “Randomly” signifies that every data point in the dataset has an equal probability for selection and “with replacement” means that a particular data point can appear more than once in the subset.

Since each bootstrap sample is created by sampling with replacement, some data points will not be selected at all. On average, each sample contains about two-thirds of the distinct data points; the remaining one-third, called the out-of-bag (OOB) points, are never seen by the tree trained on that sample. These OOB points give us a built-in way to estimate the model's error during model building.
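The two-thirds / one-third split can be verified with a quick base-R simulation (the dataset size N here is arbitrary):

```r
set.seed(42)
N <- 10000
boot_idx <- sample(N, size = N, replace = TRUE)  # one bootstrap sample of row indices

in_bag_fraction <- length(unique(boot_idx)) / N  # distinct points actually drawn
in_bag_fraction        # about 0.632, i.e. roughly two-thirds
1 - in_bag_fraction    # about 0.368 out-of-bag, roughly one-third
```

The theoretical in-bag fraction is 1 - (1 - 1/N)^N, which approaches 1 - 1/e ≈ 0.632 for large N.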

Using subsets of predictor variables

Bootstrap aggregating (bagging) reduces overfitting to a certain extent, but it does not eliminate it completely. The reason is that certain strong input predictors dominate the tree splits and overshadow weak predictors. These predictors drive the early splits of each decision tree and eventually influence the structure and size of every tree in the forest. This creates correlation between the trees in the random forest: because the same predictors drive the splits, the trees produce similar classifications.

The random forest has a solution for this: at each split it considers only a random subset of the predictors, so each split can be different. Strong predictors therefore cannot overshadow the other fields at every split, and we get a more diverse forest.
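A base-R sketch of this per-split predictor sampling, using the Titanic column names that appear later in the article; the default shown, floor(sqrt(p)), is what the randomForest package uses for classification:

```r
set.seed(1)
predictors <- c("pclass", "sex", "age", "sibsp", "parch", "fare", "embarked")
mtry <- floor(sqrt(length(predictors)))   # classification default: floor(sqrt(7)) = 2

# At every split, a fresh random subset of the predictors is considered
candidates <- sample(predictors, mtry)
candidates
```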

Random Forest Case Study In R

We will proceed as follows to train the Random Forest:

  • Import the data
  • Train the model
  • Tune the Random Forest model
  • Visualize the model
  • Evaluate the model
  • Visualize the result

Import the data: We will use the Titanic dataset for our case study of the random forest model. You can import the dataset directly from the internet.
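The article does not give the download URL, so as a self-contained stand-in you can expand R's built-in Titanic contingency table into one row per passenger (note its columns differ from the Kaggle-style CSV whose columns, such as survived and sex, appear in the code below):

```r
# Built-in Titanic data is a 4-way contingency table; expand it to passenger level
titanic_tab <- as.data.frame(Titanic)
titanic <- titanic_tab[rep(seq_len(nrow(titanic_tab)), titanic_tab$Freq),
                       c("Class", "Sex", "Age", "Survived")]
nrow(titanic)   # 2201 passengers
```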


Train the model

The random forest has some parameters that can be changed to improve the generalization of the prediction. You will use the function randomForest() to train the model; the randomForest package must be installed to use it.

A random forest model can be built using all predictors, with the target variable as a categorical outcome. In this case study the model is fitted both with the train() function from the caret package and with the randomForest() function from the randomForest package.

Tuning RF Model

Tuning the parameters of a model is cumbersome work. There can be many permutations and combinations of a set of hyperparameters, and trying all of them is a time- and memory-consuming task. A better approach is to let an algorithm decide on a good set of parameters. There are two common methods for tuning:

  • Random Search
  • Grid Search 

Grid Search

In this tutorial we will cover both methods, and we will train the model using a grid search. Grid search is simple: the model is trained for every combination given in the parameter list.

For example, if the number of trees can be 10, 20, or 30 and mtry (the number of candidate predictors drawn at each split) can be 1, 2, 3, 4, or 5, then 3 × 5 = 15 models will be created.
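The size of such a grid can be checked with expand.grid, the same base-R function used later in the tutorial to build tuneGrid:

```r
# Every row is one hyperparameter combination, hence one model to train
grid <- expand.grid(ntree = c(10, 20, 30), mtry = 1:5)
nrow(grid)      # 15 models, one per combination
head(grid, 3)
```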

The drawback of grid search is the large amount of time and the number of experiments it requires. To overcome this we can use random search.

Random Search

Random search does not evaluate all combinations of hyperparameters. Instead, it randomly selects a combination at every iteration. The advantage is lower computational cost, lower memory cost, and less time required.
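A minimal base-R sketch of the idea behind random search (the parameter ranges are illustrative): instead of enumerating the full grid, draw a handful of random combinations:

```r
set.seed(123)
n_iter <- 5   # evaluate only 5 random combinations instead of the full grid

random_combos <- data.frame(
  mtry  = sample(1:10, n_iter, replace = TRUE),
  ntree = sample(c(250, 500, 750, 1000), n_iter, replace = TRUE)
)
random_combos   # each row would be one model to train and score
```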

We will proceed as follows:

  • Set the control parameters
  • Evaluate the model with the default settings
  • Find the best value of mtry
  • Find the best value of maxnodes
  • Find the best value of ntree
  • Evaluate the model on the test dataset

Before you begin exploring the parameters, you need to install two libraries:

  • caret: R library for machine learning
  • e1071: R library of machine learning utilities that caret relies on for some computations

Default Setting

The trainControl() function controls the cross-validation folds. You can first run the model with the default parameters and check the accuracy score.


trainControl(method = "cv", number = n, search = "grid")
arguments
- method = "cv": The method used to resample the dataset.
- number = n: Number of folds to create.
- search = "grid": Use the grid search method. For random search, use search = "random".
Note: You can refer to the vignette to see the other arguments of the function.
# Define the control
trControl <- trainControl(method = "cv",
                          number = 10,
                          search = "grid")

You will use the caret library to evaluate your model. The library has a function called train() that can fit almost any machine learning algorithm; put differently, you can use this same function to train other algorithms.

The basic syntax is:


train(formula, df, method = "rf", metric = "Accuracy", trControl = trainControl(), tuneGrid = NULL)
arguments
- formula: Define the formula of the algorithm.
- method: Define which model to train; the caret documentation lists all the models that can be trained.
- metric = "Accuracy": Define how to select the optimal model.
- trControl = trainControl(): Define the control parameters.
- tuneGrid = NULL: A data frame of parameter combinations to try; NULL lets caret build a default grid.

Let's try to build the model with the default values.


set.seed(1234)
# Run the model
rf_default <- train(survived ~ .,
                    data = data_train,
                    method = "rf",
                    metric = "Accuracy",
                    trControl = trControl)
# Print the results
print(rf_default)

Code Explanation

  • trainControl(method = "cv", number = 10, search = "grid"): Evaluate the model with a grid search and 10-fold cross-validation.
  • train(...): Train a random forest model.

Output:


The algorithm uses 500 trees and tested three different values of mtry: 2, 6, and 10. The final value used for the model was mtry = 2, with an accuracy of 0.78. Let's try to get a higher score.

Step 2) Find the best mtry

Let's test the model with values of mtry from 1 to 10.


set.seed(1234)
tuneGrid <- expand.grid(.mtry = c(1:10))
rf_mtry <- train(survived ~ .,
                 data = data_train,
                 method = "rf",
                 metric = "Accuracy",
                 tuneGrid = tuneGrid,
                 trControl = trControl,
                 importance = TRUE,
                 nodesize = 14,
                 ntree = 300)
print(rf_mtry)

Code Explanation: tuneGrid <- expand.grid(.mtry = c(1:10)): Construct a grid with mtry values from 1 to 10.

The final value used for the model was mtry = 4.

Output:


## Random Forest
## 836 samples
##   7 predictor
##   2 classes: 'No', 'Yes'
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 753, 752, 753, 752, 752, 752, ...
## Resampling results across tuning parameters:
##   mtry  Accuracy   Kappa
##    1    0.7572576  0.4647368
##    2    0.7979346  0.5662364
##    3    0.8075158  0.5884815
##    4    0.8110729  0.5970664
##    5    0.8074727  0.5900030
##    6    0.8099111  0.5949342
##    7    0.8050918  0.5866415
##    8    0.8050918  0.5855399
##    9    0.8050631  0.5855035
##   10    0.7978916  0.5707336
## The final model was built using mtry = 4.

The best value of mtry is stored in:

rf_mtry$bestTune$mtry

You can store it and use it when you need to tune the other parameters.

max(rf_mtry$results$Accuracy)

Output:


## [1] 0.8110729
best_mtry <- rf_mtry$bestTune$mtry
best_mtry

Output:


## [1] 4

Step 3) Search the best maxnodes

Let's loop over different values of maxnodes to evaluate them. Below we will:

  • Create a list to store the results
  • Create a variable with the best value of the parameter mtry
  • Write the loop
  • Store the value of each maxnodes run
  • Summarize the results

store_maxnode <- list()
tuneGrid <- expand.grid(.mtry = best_mtry)
for (maxnodes in c(5:15)) {
  set.seed(1234)
  rf_maxnode <- train(survived ~ .,
                      data = data_train,
                      method = "rf",
                      metric = "Accuracy",
                      tuneGrid = tuneGrid,
                      trControl = trControl,
                      importance = TRUE,
                      nodesize = 14,
                      maxnodes = maxnodes,
                      ntree = 300)
  current_iteration <- toString(maxnodes)
  store_maxnode[[current_iteration]] <- rf_maxnode
}
results_mtry <- resamples(store_maxnode)
summary(results_mtry)

Output:


## Call:
## summary.resamples(object = results_mtry)
## Models: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## Number of resamples: 10
## Accuracy
##        Min.   1st Qu.   Median            Mean   3rd Qu.            Max. NA's
## 5  0.6785714 0.7529762 0.7903758 0.7799771 0.8168388 0.8433735    0
## 6  0.6904762 0.7648810 0.7784710 0.7811962 0.8125000 0.8313253    0
## 7  0.6904762 0.7619048 0.7738095 0.7788009 0.8102410 0.8333333    0
## 8  0.6904762 0.7627295 0.7844234 0.7847820 0.8184524 0.8433735    0
## 9  0.7261905 0.7747418 0.8083764 0.7955250 0.8258749 0.8333333    0
## 10 0.6904762 0.7837780 0.7904475 0.7895869 0.8214286 0.8433735   0
## 11 0.7023810 0.7791523 0.8024240 0.7943775 0.8184524 0.8433735   0
## 12 0.7380952 0.7910929 0.8144005 0.8051205 0.8288511 0.8452381   0
## 13 0.7142857 0.8005952 0.8192771 0.8075158 0.8403614 0.8452381   0
## 14 0.7380952 0.7941050 0.8203528 0.8098967 0.8403614 0.8452381   0
## 15 0.7142857 0.8000215 0.8203528 0.8075301 0.8378873 0.8554217   0
##
## Kappa
##        Min.   1st Qu.   Median            Mean   3rd Qu.            Max. NA's
## 5  0.3297872 0.4640436 0.5459706 0.5270773 0.6068751 0.6717371    0
## 6  0.3576471 0.4981484 0.5248805 0.5366310 0.6031287 0.6480921    0
## 7  0.3576471 0.4927448 0.5192771 0.5297159 0.5996437 0.6508314    0
## 8  0.3576471 0.4848320 0.5408159 0.5427127 0.6200253 0.6717371    0
## 9  0.4236277 0.5074421 0.5859472 0.5601687 0.6228626 0.6480921    0
## 10 0.3576471 0.5255698 0.5527057 0.5497490 0.6204819 0.6717371   0
## 11 0.3794326 0.5235007 0.5783191 0.5600467 0.6126720 0.6717371   0
## 12 0.4460432 0.5480930 0.5999072 0.5808134 0.6296780 0.6717371   0
## 13 0.4014252 0.5725752 0.6087279 0.5875305 0.6576219 0.6678832   0
## 14 0.4460432 0.5585005 0.6117973 0.5911995 0.6590982 0.6717371   0
## 15 0.4014252 0.5689401 0.6117973 0.5867010 0.6507194 0.6955990   0

The largest values of maxnodes (14 and 15) give the highest accuracy here, so you can try even higher values to see if you can get a better score.


store_maxnode <- list()
tuneGrid <- expand.grid(.mtry = best_mtry)
for (maxnodes in c(20:30)) {
  set.seed(1234)
  rf_maxnode <- train(survived ~ .,
                      data = data_train,
                      method = "rf",
                      metric = "Accuracy",
                      tuneGrid = tuneGrid,
                      trControl = trControl,
                      importance = TRUE,
                      nodesize = 14,
                      maxnodes = maxnodes,
                      ntree = 300)
  key <- toString(maxnodes)
  store_maxnode[[key]] <- rf_maxnode
}
results_node <- resamples(store_maxnode)
summary(results_node)

Output:


##
## Call:
## summary.resamples(object = results_node)
##
## Models: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
## Number of resamples: 10
##
## Accuracy
##        Min.   1st Qu.   Median            Mean   3rd Qu.            Max. NA's
## 20 0.7142857 0.7821644 0.8144005 0.8075301 0.8447719 0.8571429   0
## 21 0.7142857 0.8000215 0.8144005 0.8075014 0.8403614 0.8571429   0
## 22 0.7023810 0.7941050 0.8263769 0.8099254 0.8328313 0.8690476   0
## 23 0.7023810 0.7941050 0.8263769 0.8111302 0.8447719 0.8571429   0
## 24 0.7142857 0.7946429 0.8313253 0.8135112 0.8417599 0.8690476   0
## 25 0.7142857 0.7916667 0.8313253 0.8099398 0.8408635 0.8690476   0
## 26 0.7142857 0.7941050 0.8203528 0.8123207 0.8528758 0.8571429   0
## 27 0.7023810 0.8060456 0.8313253 0.8135112 0.8333333 0.8690476   0
## 28 0.7261905 0.7941050 0.8203528 0.8111015 0.8328313 0.8690476   0
## 29 0.7142857 0.7910929 0.8313253 0.8087063 0.8333333 0.8571429   0
## 30 0.6785714 0.7910929 0.8263769 0.8063253 0.8403614 0.8690476   0
##
## Kappa
##        Min.   1st Qu.   Median            Mean   3rd Qu.            Max. NA's
## 20 0.3956835 0.5316120 0.5961830 0.5854366 0.6661120 0.6955990   0
## 21 0.3956835 0.5699332 0.5960343 0.5853247 0.6590982 0.6919315   0
## 22 0.3735084 0.5560661 0.6221836 0.5914492 0.6422128 0.7189781   0
## 23 0.3735084 0.5594228 0.6228827 0.5939786 0.6657372 0.6955990   0
## 24 0.3956835 0.5600352 0.6337821 0.5992188 0.6604703 0.7189781   0
## 25 0.3956835 0.5530760 0.6354875 0.5912239 0.6554912 0.7189781   0
## 26 0.3956835 0.5589331 0.6136074 0.5969142 0.6822128 0.6955990   0
## 27 0.3735084 0.5852459 0.6368425 0.5998148 0.6426088 0.7189781   0
## 28 0.4290780 0.5589331 0.6154905 0.5946859 0.6356141 0.7189781   0
## 29 0.4070588 0.5534173 0.6337821 0.5901173 0.6423101 0.6919315   0
## 30 0.3297872 0.5534173 0.6202632 0.5843432 0.6590982 0.7189781   0

We can see that accuracy peaks around maxnodes = 24, so we will use that value in what follows.

Step 4) Search the best ntrees

After tuning mtry and maxnodes, let's now tune the number of trees. The method for tuning ntree is the same as for maxnodes.


store_maxtrees <- list()
for (ntree in c(250, 300, 350, 400, 450, 500, 550, 600, 800, 1000, 2000)) {
  set.seed(5678)
  rf_maxtrees <- train(survived ~ .,
                       data = data_train,
                       method = "rf",
                       metric = "Accuracy",
                       tuneGrid = tuneGrid,
                       trControl = trControl,
                       importance = TRUE,
                       nodesize = 14,
                       maxnodes = 24,
                       ntree = ntree)
  key <- toString(ntree)
  store_maxtrees[[key]] <- rf_maxtrees
}
results_tree <- resamples(store_maxtrees)
summary(results_tree)

Output:


##
## Call:
## summary.resamples(object = results_tree)
##
## Models: 250, 300, 350, 400, 450, 500, 550, 600, 800, 1000, 2000
## Number of resamples: 10
##
## Accuracy
##           Min.   1st Qu. Median      Mean   3rd Qu.         Max. NA's
## 250  0.7380952 0.7976190 0.8083764 0.8087010 0.8292683 0.8674699         0
## 300  0.7500000 0.7886905 0.8024240 0.8027199 0.8203397 0.8452381         0
## 350  0.7500000 0.7886905 0.8024240 0.8027056 0.8277623 0.8452381         0
## 400  0.7500000 0.7886905 0.8083764 0.8051009 0.8292683 0.8452381         0
## 450  0.7500000 0.7886905 0.8024240 0.8039104 0.8292683 0.8452381         0
## 500  0.7619048 0.7886905 0.8024240 0.8062914 0.8292683 0.8571429         0
## 550  0.7619048 0.7886905 0.8083764 0.8099062 0.8323171 0.8571429         0
## 600  0.7619048 0.7886905 0.8083764 0.8099205 0.8323171 0.8674699         0
## 800  0.7619048 0.7976190 0.8083764 0.8110820 0.8292683 0.8674699         0
## 1000 0.7619048 0.7976190 0.8121510 0.8086723 0.8303571 0.8452381        0
## 2000 0.7619048 0.7886905 0.8121510 0.8086723 0.8333333 0.8452381        0
##
## Kappa
##           Min.   1st Qu. Median      Mean   3rd Qu.         Max. NA's
## 250  0.4061697 0.5667400 0.5836013 0.5856103 0.6335363 0.7196807         0
## 300  0.4302326 0.5449376 0.5780349 0.5723307 0.6130767 0.6710843         0
## 350  0.4302326 0.5449376 0.5780349 0.5723185 0.6291592 0.6710843         0
## 400  0.4302326 0.5482030 0.5836013 0.5774782 0.6335363 0.6710843         0
## 450  0.4302326 0.5449376 0.5780349 0.5750587 0.6335363 0.6710843         0
## 500  0.4601542 0.5449376 0.5780349 0.5804340 0.6335363 0.6949153         0
## 550  0.4601542 0.5482030 0.5857118 0.5884507 0.6396872 0.6949153         0
## 600  0.4601542 0.5482030 0.5857118 0.5884374 0.6396872 0.7196807         0
## 800  0.4601542 0.5667400 0.5836013 0.5910088 0.6335363 0.7196807         0
## 1000 0.4601542 0.5667400 0.5961590 0.5857446 0.6343666 0.6678832        0
## 2000 0.4601542 0.5482030 0.5961590 0.5862151 0.6440678 0.6656337        0

We have tuned all the important parameters. Now we can train the random forest with the following values:

  • ntree = 800: 800 trees will be trained
  • mtry = 4: 4 candidate predictors are drawn for each split
  • maxnodes = 24: each tree is limited to a maximum of 24 terminal nodes (leaves)

fit_rf <- train(survived ~ .,
                data_train,
                method = "rf",
                metric = "Accuracy",
                tuneGrid = tuneGrid,
                trControl = trControl,
                importance = TRUE,
                nodesize = 14,
                ntree = 800,
                maxnodes = 24)

Step 5) Model Evaluation: the caret library in R has a function to make predictions.


predict(model, newdata = df)
arguments
- model: The model trained before.
- newdata: The dataset on which to make predictions.

prediction <- predict(fit_rf, data_test)

You can use the predictions to compute the confusion matrix and see the accuracy score:
confusionMatrix(prediction, data_test$survived)

Output:


## Confusion Matrix and Statistics
##
##           Reference
## Prediction  No Yes
##        No  110  32
##        Yes  11  56
##
##                Accuracy : 0.7943
##                  95% CI : (0.733, 0.8469)
##     No Information Rate : 0.5789
##     P-Value [Acc > NIR] : 3.959e-11
##
##                   Kappa : 0.5638
##  Mcnemar's Test P-Value : 0.002289
##
##             Sensitivity : 0.9091
##             Specificity : 0.6364
##          Pos Pred Value : 0.7746
##          Neg Pred Value : 0.8358
##              Prevalence : 0.5789
##          Detection Rate : 0.5263
##    Detection Prevalence : 0.6794
##       Balanced Accuracy : 0.7727
##
##        'Positive' Class : No
##

We got an accuracy of 0.7943 (about 79.4 percent), which is higher than the default model's 0.78.

Step 6) Visualize Result

Now let's look at feature importance with the function varImp(). In the variable importance plot, the most relevant features appear to be sex and age. More important features tend to be used near the root of a tree, while less important features often appear close to the leaves.


varImpPlot(fit_rf)
varImp(fit_rf)
## rf variable importance
##
##              Importance
## sexmale         100.000
## age              28.014
## pclassMiddle     27.016
## fare             21.557
## pclassUpper      16.324
## sibsp            11.246
## parch             5.522
## embarkedC         4.908
## embarkedQ         1.420
## embarkedS         0.000

Conclusion

Machine learning algorithms like random forests and neural networks are known for high accuracy and performance, but they are black boxes: it is hard to see how they work internally, so interpreting their results is a real challenge. It is fine not to know every statistical detail of the algorithm, but knowing how to tune a random forest is of utmost importance, and tuning it is still relatively easy compared to many other algorithms.

In spite of being a black box, the random forest is a highly popular ensembling technique because of its accuracy. It is sometimes even called a panacea among machine learning algorithms: it is said that if you cannot decide which classification algorithm to use, you can pick a random forest with your eyes closed.

