
An Easy Way to Understand Adaboost

Introduction

In the 21st century, machines are learning. Like students in a class, some machine learning algorithms learn a concept in its totality, while others remain weak learners that grasp only part of it. To improve the performance of weak learners, a technique called boosting is applied. Combining boosting with a set of weak learners gives this algorithm its name, AdaBoost, short for "Adaptive Boosting". To elaborate on this concept, the blog is divided into the following sections:

What are weak learners?

A weak learner is a learner that performs only slightly better than random chance whenever a prediction is made, regardless of the underlying data distribution. In binary classification, a weak learner achieves an accuracy slightly greater than ½. Such learners do pick up something from the data, but not enough to meet the performance requirement on their own. They are nonetheless classifiers whose predictions are at least slightly correlated with the true labels.

A classical example of a weak learner is the decision stump (a one-level decision tree). Owing to its simple threshold rule, it can perform slightly better than chance in many cases. By contrast, it would be unjustified to call a support vector machine a weak learner.
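As a rough illustration, a decision stump can be sketched in a few lines of Python. The data below is invented for this example: the labels mostly follow the rule "1 when x > 5", with two noisy points, so the stump beats chance without being a strong learner.

```python
# A decision stump applies a single threshold rule to one feature.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]   # mostly "1 when x > 5", two noisy labels

pred = [1 if xi > 5 else 0 for xi in x]                   # the stump's one rule
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
print(accuracy)   # 0.8: well above chance (0.5), yet far from a strong learner
```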

What is boosting?

In machine learning, boosting is an ensemble meta-algorithm that primarily reduces bias, and also variance, in supervised learning. Boosting methods form a family of algorithms that convert weak learners into strong ones.

Boosting was first introduced as a solution to the hypothesis boosting problem, which in simple terms asks whether a weak learner can be converted into a strong learner. Boosting works by building a model that makes a certain number of errors on the dataset and then building another model that corrects those errors. This process repeats until the training data is modeled well by the combination of all the models.

The formal definition of Adaboost:

AdaBoost stands for "Adaptive Boosting" and is the first boosting algorithm, designed by Freund and Schapire in 1996. It primarily targets classification problems and is designed to convert a group of weak learners into a single strong learner. The combined learner is represented mathematically as:

F(x) = sign( Σ_{m=1}^{M} θ_m f_m(x) )

Where f_m stands for the mth weak classifier and θ_m is the weight assigned to it.
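To make the weighted vote concrete, here is a tiny hand-worked example; the weights and votes are invented purely for illustration. Three weak classifiers vote on one sample, and the sign of the weighted sum gives the final label.

```python
# Hypothetical weights (theta_m) and predictions (f_m(x)) for one input x
thetas = [0.9, 0.5, 0.3]
votes = [+1, -1, +1]

score = sum(t * v for t, v in zip(thetas, votes))  # 0.9 - 0.5 + 0.3 = 0.7
label = 1 if score >= 0 else -1                    # sign(score) -> +1
print(label)
```

Note how the single dissenting vote is outvoted: its weight (0.5) is smaller than the combined weight of the two agreeing classifiers.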
           

Learning in Adaboost:

Adaptive boosting refers to a specific method of training a boosted ensemble classifier of the form:

F_T(x) = Σ_{t=1}^{T} f_t(x)

Where f_t is a weak learner that takes an input vector x and returns a class prediction.

Now, each weak learner produces an output hypothesis h(x_i) for every sample x_i in the training set. At each iteration t, a weak learner is selected and assigned a coefficient α_t such that the total training error E_t of the resulting t-stage boosted classifier is minimized:

E_t = Σ_i E[ F_{t-1}(x_i) + α_t h(x_i) ]


Where F_{t-1}(x) is the boosted classifier built up to the previous step, E(F) is an error function, and f_t(x) = α_t h(x) is the weak learner under consideration for addition in this step.
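When E is the exponential loss E(F) = e^{-y F(x)}, the coefficient that minimizes this error has a standard closed form. It is not derived in the original text, but it is the update used in every AdaBoost implementation:

```latex
% Weighted error of the chosen weak learner at iteration t,
% and the coefficient that minimizes the exponential loss:
\varepsilon_t = \sum_{i \,:\, h(x_i) \neq y_i} w_i^{(t)}, \qquad
\alpha_t = \frac{1}{2} \ln\!\left( \frac{1 - \varepsilon_t}{\varepsilon_t} \right)
```

Here w_i^{(t)} are the normalized sample weights at iteration t: the smaller the weighted error of a weak learner, the larger the weight it receives in the final vote.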

Steps involved in generating an AdaBoost based classifier:

The following steps are involved once we start from the dataset:

  1. First, a training subset is selected randomly.
  2. The machine iteratively trains an AdaBoost-based model by selecting the training subset that gives the best accuracy.
  3. Wrongly classified observations are assigned higher weights than correctly classified ones, giving them a higher probability of being selected in the next iteration.
  4. This training scheme also assigns a weight to each classifier according to the accuracy it provides.
  5. The process continues until the stopping condition is reached.
  6. Finally, a weighted vote is taken across all the trained classifiers and the final model is built.
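The steps above can be sketched end-to-end in plain Python, with decision stumps as the weak learners. This is a minimal illustration under stated assumptions, not sklearn's implementation; the threshold-search stump and the function names are invented for this sketch.

```python
import math

def fit_stump(x, y, w):
    """Pick the threshold/polarity with the lowest weighted error."""
    xs = sorted(x)
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            pred = [pol if xi > thr else -pol for xi in x]
            err = sum(wi for wi, yi, pi in zip(w, y, pred) if yi != pi)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost_train(x, y, rounds=10):
    """Labels y must be +1/-1. Returns a list of (alpha, threshold, polarity)."""
    n = len(x)
    w = [1.0 / n] * n                           # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = fit_stump(x, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # clamp so the log stays finite
        alpha = 0.5 * math.log((1 - err) / err) # accurate stumps get big weight
        ensemble.append((alpha, thr, pol))
        pred = [pol if xi > thr else -pol for xi in x]
        w = [wi * math.exp(-alpha * yi * pi)    # up-weight the mistakes
             for wi, yi, pi in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]            # renormalize to a distribution
    return ensemble

def adaboost_predict(ensemble, xi):
    score = sum(a * (p if xi > t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

For example, on x = [0, 1, 2, 3, 4, 5, 6, 7] with labels [-1, -1, 1, 1, 1, 1, -1, -1] (a middle interval that no single stump can fit), a few boosting rounds combine stumps into a classifier that labels every training point correctly.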


Designing an AdaBoost-based model using Python:

AdaBoost stands for Adaptive Boosting and is an ensemble-based boosting model for machine learning. Here, Python with sklearn will be used to design an AdaBoost-based classifier and test its accuracy.

The first step in creating a model is to import the model and related libraries:

from sklearn.ensemble import AdaBoostClassifier

This imports the AdaBoost model from sklearn's ensemble module.

from sklearn.datasets import make_classification

This imports the function for creating a randomly labeled classification dataset.


Once the libraries have been loaded into the working memory, a dataset is created for this example:

input_vector, label = make_classification(n_samples=1000, n_features=7,
                                          n_informative=2, n_redundant=0,
                                          random_state=0, shuffle=False)

The make_classification function is used to generate the dataset. This command generates 1000 samples with 7 features each (i.e., the number of inputs in the input vector), placing the inputs in input_vector and the corresponding labels in label.

Now that the dataset is ready, the next step is to instantiate the AdaBoost-based model:

model = AdaBoostClassifier(n_estimators=100, random_state=0)

As can be seen, there will be 100 weak learners in this ensemble.

Now, the model is to be trained before it can be utilized in any application:

model.fit(input_vector,label)

Now the model is trained and fit for use in any application, and can be queried as:

model.predict([[1, 0, 1, 1, 0, 0, 1]])

Note that predict expects a 2D array, with one row per sample to be classified.

The accuracy of the model on the training data can be checked as:

model.score(input_vector, label)


For the model trained in this example, the score is 0.635.

Note: if the dataset is regenerated using make_classification, the final result may differ because of different initial conditions and the randomization introduced into the dataset.
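Putting the walkthrough together, a variation with a held-out test set gives a fairer accuracy estimate than scoring on the training data. The use of train_test_split and the 30% split size are additions for this sketch, not part of the walkthrough above.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same synthetic dataset as in the walkthrough above
input_vector, label = make_classification(n_samples=1000, n_features=7,
                                          n_informative=2, n_redundant=0,
                                          random_state=0, shuffle=False)

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    input_vector, label, test_size=0.3, random_state=0)

model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on unseen data
```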

Advantages and Disadvantages of AdaBoost:

AdaBoost is one of the basic boosting algorithms and thus has its own set of strengths and issues. The major advantages of AdaBoost are:

  1. It is very fast.
  2. It is easy to use.
  3. It is easy to program.
  4. It can be combined with other machine learning algorithms without extensive parameter tuning.
  5. It can be used on problems beyond binary classification.
  6. It is versatile and can handle text as well as numeric data.

AdaBoost also suffers from a few limitations:

  • Empirical evidence shows it is potentially vulnerable to noisy data.
  • If the weak classifiers underperform, they can make the whole model underperform.
  • AdaBoost is highly susceptible to outliers, so it is not useful in scenarios where outliers are expected.


Final Words:

In this blog, we have discussed AdaBoost and how to create a model based on it. AdaBoost is an ensemble-based boosting technique that is quite useful in scenarios where finding a single strong learner is difficult. It is one of the basic boosting techniques and is still widely used, largely because it allows us to capture non-linearity in the data.

Please leave your queries and comments in the comment section.




    Janbask Training

A dynamic, highly professional, global online training course provider committed to propelling the next generation of technology learners with a whole new training experience.

