

An Easy Way to Understand Adaboost

Introduction

In the 21st century, machines are learning. Like students in a class, some learn a concept in its entirety while others remain weak learners; some master the subject, others barely pass the exam. The same holds for machine learning algorithms: some of them are weak learners. To improve the performance of weak learners, a technique named boosting is applied. This boosting, coupled with a set of weak learners, gives the algorithm its name: AdaBoost, short for "Adaptive Boosting". To elaborate on this concept of Adaptive Boosting, the blog is divided into the following sections:

What are weak learners?

A weak learner is a classifier that performs only slightly better than chance, regardless of the underlying data distribution. In binary classification, this means an accuracy slightly greater than ½. Such learners do pick up something from the data, but on their own they cannot perform to the required standard; their predictions are only slightly correlated with the true classification.

One of the classical examples of a weak learner is the decision stump (a one-level decision tree). Owing to its single split rule, it can perform slightly better than chance in many cases. By contrast, it would be unjustified to call a support vector machine a weak learner.
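To make this concrete, a decision stump can be built with scikit-learn by capping a decision tree at a single split. This is a minimal sketch; the synthetic dataset and its parameters are purely illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A decision stump is just a decision tree capped at one split (max_depth=1).
X, y = make_classification(n_samples=1000, n_features=7, n_informative=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
print(stump.score(X_test, y_test))  # better than the 0.5 chance level
```

Because the stump can only split on one feature once, its accuracy is limited, but it stays above chance — exactly the property boosting exploits.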

What is boosting:

In the domain of machine learning, boosting is an ensemble-based meta-algorithm that primarily reduces bias (and also variance) in supervised learning. Boosting algorithms are a family of methods that convert weak learners into strong ones.

Boosting was first introduced as a solution to the hypothesis boosting problem, which in simpler terms asks whether a weak learner can be converted into a strong learner. Boosting achieves this by building a model that makes some errors on the dataset, then building another model that corrects those errors, and repeating the process until the training data is modeled well by the combination of all the models.
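The "build a model, then train the next one to correct its errors" idea can be sketched with scikit-learn's `sample_weight` argument. This is a hypothetical two-round illustration (the fixed up-weight factor of 5 is an arbitrary assumption, not the actual AdaBoost weighting rule, which is derived below):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=7, n_informative=2,
                           random_state=0)

# Round 1: fit a weak model and record which samples it gets wrong.
first = DecisionTreeClassifier(max_depth=1).fit(X, y)
wrong = first.predict(X) != y

# Round 2: up-weight the misclassified samples so the next weak model
# concentrates on correcting the errors of the first.
weights = np.where(wrong, 5.0, 1.0)
second = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
```

The second stump sees the first stump's mistakes as more important, so it tends to split on a different feature and complement the first.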

The formal definition of Adaboost:

AdaBoost stands for "Adaptive Boosting" and was the first practical boosting algorithm, designed by Freund and Schapire in 1996. It is primarily focused on classification problems and is designed to combine a group of weak learners into a unified strong learner. The combined classifier is represented mathematically as:

F(x) = sign( Σ_{m=1}^{M} θ_m f_m(x) )

where f_m stands for the m-th weak classifier and θ_m is its corresponding weight.
           

Learning in Adaboost:

Adaptive boosting refers to a specific method of training a boosted classifier. A boosted ensemble classifier is of the form:

F_T(x) = Σ_{t=1}^{T} f_t(x)

where each f_t is a weak learner that takes an input vector x and returns a class prediction.

Now, each weak learner produces an output hypothesis h(x_i) for every sample x_i in the training set. At every iteration t, a weak learner is selected from those generated in that step and assigned a coefficient α_t such that the total training error E_t of the resulting boosted classifier is minimized, i.e.


E_t = Σ_i E[ F_{t-1}(x_i) + α_t h(x_i) ]

where F_{t-1}(x) is the boosted classifier built up in the previous steps, E is an error function, and f_t(x) = α_t h(x) is the weak learner under consideration for addition in this step.

Steps involved in generating an AdaBoost based classifier:

The following steps are involved once we start from the dataset:

  1. First of all, a random subset of the training data is selected.
  2. The machine iteratively trains the AdaBoost model by selecting training subsets that give the best accuracy.
  3. Wrongly classified observations are assigned higher weights than correctly classified ones, giving them a higher probability of being selected in the next iteration.
  4. This training design also assigns each classifier a weight according to the accuracy it provides.
  5. The process continues until the stopping condition is reached.
  6. Finally, a vote is taken across all the trained classifiers and the final model is built.
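The steps above can be sketched as a minimal from-scratch discrete AdaBoost for binary labels in {-1, +1}. The α formula and weight update follow the standard algorithm; the dataset and the number of rounds are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, n_features=7, n_informative=2,
                             random_state=0)
y = np.where(y01 == 1, 1, -1)          # AdaBoost math uses labels in {-1, +1}

n = len(y)
w = np.full(n, 1.0 / n)                # start with uniform sample weights
stumps, alphas = [], []

for t in range(50):                    # 50 boosting rounds
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)   # train on the current weighting
    pred = stump.predict(X)
    err = w[pred != y].sum()           # weighted training error
    if err >= 0.5:                     # no better than chance: stop
        break
    alpha = 0.5 * np.log((1 - err) / err)   # classifier weight
    w *= np.exp(-alpha * y * pred)     # up-weight misclassified samples
    w /= w.sum()                       # renormalize to a distribution
    stumps.append(stump)
    alphas.append(alpha)

# Final model: weighted vote across all trained stumps
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
accuracy = (np.sign(F) == y).mean()
print(accuracy)
```

Note how the loop mirrors the list: misclassified samples get larger weights for the next round, each stump gets a weight α based on its accuracy, and the final prediction is a weighted vote.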


Designing an AdaBoost-based model using Python:

AdaBoost stands for Adaptive Boosting and is an ensemble-based boosting model for machine learning. Here, Python with scikit-learn will be used to design an AdaBoost-based classifier and test its accuracy:

The first step in creating a model is to import the model and the related libraries:

from sklearn.ensemble import AdaBoostClassifier

This imports the AdaBoost model from scikit-learn's ensemble module.

from sklearn.datasets import make_classification

This imports the helper for creating a randomly generated, labeled classification dataset.


Once the libraries have been loaded into working memory, a dataset is created for this example:

input_vector, label = make_classification(n_samples=1000, n_features=7,
                                          n_informative=2, n_redundant=0,
                                          random_state=0, shuffle=False)

The make_classification helper generates the dataset. This call creates 1000 samples with 7 features each (i.e., the number of inputs in the input vector), putting the inputs into input_vector and the corresponding labels into label.

Now that the dataset is ready, the next step is to instantiate the AdaBoost model:

model = AdaBoostClassifier(n_estimators=100, random_state=0)

As can be seen, there will be 100 weak learners in this ensemble.

Now, the model must be trained before it can be used in any application:

model.fit(input_vector, label)

Now the model is trained, fit for use in an application, and can be queried as:

model.predict([[1, 0, 1, 1, 0, 0, 1]])

The accuracy of the model on the training data can be verified with:

model.score(input_vector, label)


For the model trained in this example, the score is 0.635.

Note: if the dataset is generated using make_classification, the final result may differ because of different initial conditions and differences in the dataset introduced by randomization.
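Scoring on the same data the model was trained on can be optimistic. A held-out split gives a fairer estimate; this is a small sketch using train_test_split with the same illustrative dataset parameters as above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=7, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
# Hold out a quarter of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on unseen data
```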

Advantages and Disadvantages of AdaBoost:

AdaBoost is one of the most basic boosting algorithms and, as such, has its own set of trade-offs. The major advantages of AdaBoost are:

  1. It is very fast.
  2. It is easy to use.
  3. It is easy to program.
  4. It can be combined with many other machine learning algorithms without extensive parameter tuning.
  5. It can be used on problems beyond binary classification.
  6. AdaBoost is versatile and can handle text as well as numeric data.

AdaBoost also suffers from a few limitations:

  • Empirical evidence shows it is potentially vulnerable to noisy data.
  • If the weak classifiers underperform, they can make the whole model underperform.
  • AdaBoost is highly susceptible to outliers, so it is not well suited to scenarios where outliers are expected.


Final Words:

In this blog, we have discussed AdaBoost and how to create a model based upon it. AdaBoost is an ensemble-based boosting technique that is quite useful in scenarios where finding a single strong learner is difficult. It is one of the most basic boosting techniques and is still widely used, largely because the model allows us to capture non-linearity in the data.

Please leave your queries and comments in the comment section.


