

What Are Bayes' Theorem and Bayesian Classification in Data Mining?


One type of statistical classifier is the Bayesian classifier. These models can predict class membership probabilities, such as the probability that a given tuple belongs to a specific class. Below, we explain Bayes' theorem, the foundation of Bayesian classification. Comparative studies of classification algorithms have found that the naïve Bayesian classifier, a straightforward application of Bayes' theorem, is comparable in performance to decision tree and selected neural network classifiers. Bayesian classifiers have also shown high accuracy and speed when applied to large databases.

Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It is "naive" in the sense that it is made to simplify the computations involved. Bayesian belief networks, by contrast, are graphical models that can represent dependencies among subsets of attributes. Bayesian classification as a whole is a probabilistic approach used in data science and machine learning that leverages Bayes' theorem to predict the probability of a particular event or outcome.

What is the Naive Bayes Algorithm?

The naive Bayes algorithm is a probabilistic machine learning technique that uses statistical methods to classify instances into one of several possible classes. It is based on the concept of conditional probability, which measures the likelihood of an event occurring given that another event has already occurred.

To understand how Naive Bayes works in practice, let us consider an example where we want to build a spam filter for emails. The goal is to automatically identify whether an incoming email is spam or not spam (i.e., ham). In this scenario, we would typically start by collecting a large dataset of labeled emails, with each email assigned either a "spam" or "not-spam" label. Next, we need to preprocess these emails and extract relevant features from them, such as the presence of specific keywords in the subject line or body text.

Once the training data is ready, we can build the Naive Bayes classifier. During the training phase, it first calculates the prior probability P(C) for each class C (e.g., P(spam), P(not spam)) by dividing the number of training instances belonging to that class by the total number of instances in the training set.

Next, it computes likelihoods P(x|C) for each feature x given each class C using frequency counts from the training set. This involves calculating how often each feature occurs within examples belonging to a particular class and normalizing these counts by dividing them by the total count of all features observed across all examples belonging to that class.

Finally, combining the priors and likelihoods gives the posterior probability of each class, which determines the predicted class label and indicates how confident the model is in that prediction. In other words, P(C|x) = P(x|C) * P(C) / P(x).

During the prediction phase, when a new instance arrives, the classifier extracts the same features that were used during training. It then applies the formula P(C|x) = P(x|C) * P(C) / P(x) to compute the posterior probability of every possible class. The class with the highest posterior probability is selected as the prediction.

  • For example, suppose there are two classes, A and B. If an input X arrives with attribute values [1, y, z], naive Bayes calculates Posterior(A|X) and Posterior(B|X). If Posterior(A|X) > Posterior(B|X), X is more likely to belong to class A than to class B, so the predicted output is A.
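
The training and prediction phases described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production spam filter: the four training emails are invented, the function names are hypothetical, and add-one (Laplace) smoothing is an assumption added here so that a word unseen in one class does not force a zero probability. P(x) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set: (text, label) pairs invented for illustration.
TRAIN = [
    ("win free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train(examples):
    """Estimate priors P(C) and per-class word counts from labeled text."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def log_scores(text, priors, word_counts, vocab):
    """Return log(P(C) * product of P(x|C)) per class, with add-one smoothing."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in text.split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return scores

def predict(text, model):
    """Pick the class with the highest (unnormalized) posterior."""
    scores = log_scores(text, *model)
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict("free prize", model))
print(predict("monday meeting", model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.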

Overall, the Naive Bayes algorithm assumes independence between attributes (features), which may not always hold true. In spite of this limitation, it performs well on many real-world problems, including text classification, email filtering, and sentiment analysis.

Bayes' Theorem

Bayes' theorem is named after Thomas Bayes, a nonconformist English clergyman who did early work in probability and decision theory in the 18th century. Let X be a data tuple. In Bayesian terms, X is considered "evidence." It is described by measurements made on a set of n attributes. Let H be a hypothesis, such as that the data tuple X belongs to a specified class C. For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the "evidence" or observed data tuple X. In other words, given the attribute description of tuple X, we want to find the probability that X belongs to class C.

P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X. To illustrate, suppose our data tuples describe customers by age and income, and X is a customer who is 35 years old with an annual income of $40,000. Let H be the hypothesis that our customer will buy a computer. Then P(H|X) is the probability that customer X will buy a computer given that we know the customer's age and income. In contrast, P(H) is the prior probability, or a priori probability, of H. In our example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information. The posterior probability, P(H|X), is based on more information (such as customer information) than the prior probability, P(H), which is independent of X.

Similarly, P(X|H) is the posterior probability of X conditioned on H. That is, it is the probability that a customer X is 35 years old and earns $40,000, given that we know the customer will buy a computer. P(X) is the prior probability of X. In our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000. How are these probabilities estimated? P(H), P(X|H), and P(X) can all be estimated from the available data. Bayes' theorem is useful because it provides a way to calculate the posterior probability P(H|X) from P(H), P(X|H), and P(X).

As stated by Bayes' theorem,

                                                    P(H|X) = P(X|H) P(H) / P(X)
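
To make the formula concrete, here is a small worked calculation for the computer-store example. The three input probabilities are hypothetical values chosen for illustration; the source does not give any figures.

```python
def posterior(p_x_given_h, p_h, p_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

# Hypothetical figures for the computer-store example (not from the source):
p_h = 0.30          # P(H): any customer buys a computer
p_x_given_h = 0.20  # P(X|H): a buyer is 35 years old and earns $40,000
p_x = 0.10          # P(X): any customer is 35 years old and earns $40,000

p_h_given_x = posterior(p_x_given_h, p_h, p_x)
print(round(p_h_given_x, 2))
```

With these assumed numbers, knowing the customer's age and income doubles the estimated chance of a purchase, from a prior of 0.30 to a posterior of about 0.60.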

Naive Bayes Classification Methods (Method Of Classification)

According to the Bayesian view, probability measures a "degree of belief." Bayes' theorem relates our degree of belief in a hypothesis before seeing the evidence to our degree of belief after accounting for it. For example, consider flipping a coin. If the coin is fair, heads and tails are equally likely, each with a 50% chance. Our belief that the coin is fair can increase, decrease, or stay the same depending on the outcomes we observe when flipping it repeatedly.

Given a hypothesis X and evidence Y, Bayes' theorem states:

P(X|Y) = [P(Y|X) / P(Y)] * P(X)

Here the prior, P(X), is the initial degree of belief in X, and the posterior, P(X|Y), is the degree of belief after accounting for Y.

The quotient P(Y|X) / P(Y) indicates the support that Y provides for X.

Bayes' theorem can be derived from the definition of conditional probability:

P(X|Y) = P(X ∩ Y) / P(Y)
P(Y|X) = P(X ∩ Y) / P(X)

Solving the second equation for P(X ∩ Y) and substituting it into the first gives P(X|Y) = P(Y|X) * P(X) / P(Y).
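
The coin example can be turned into a short belief-updating sketch. It assumes, purely for illustration, that there are only two candidate hypotheses (a fair coin, or a coin biased to land heads 80% of the time) and a hypothetical sequence of flips; neither the bias value nor the flips come from the source.

```python
def update(prior_biased, flip, p_heads_biased=0.8, p_heads_fair=0.5):
    """Return P(biased | flip) from P(biased) using Bayes' theorem."""
    like_biased = p_heads_biased if flip == "H" else 1 - p_heads_biased
    like_fair = p_heads_fair if flip == "H" else 1 - p_heads_fair
    # P(flip) by the law of total probability over the two hypotheses
    evidence = like_biased * prior_biased + like_fair * (1 - prior_biased)
    return like_biased * prior_biased / evidence

belief = 0.5  # prior: no idea whether the coin is fair or biased toward heads
history = [belief]
for flip in "HHTH":  # hypothetical sequence of observed flips
    belief = update(belief, flip)
    history.append(belief)
print([round(b, 3) for b in history])
```

Each posterior becomes the prior for the next flip: belief in the biased coin rises after every head and falls after the tail.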

Naive Bayes Algorithm

The Naive Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem, that is used to solve classification problems. It is mainly applied to text classification, where the training data is high-dimensional. The Naive Bayes classifier is a simple but effective technique that makes it possible to build machine learning models that predict quickly.

Because it is a probabilistic classifier, it makes predictions based on the probability that an object belongs to each class.

Article classification, sentiment analysis, and spam detection are just a few common uses of the Naive Bayes algorithm.

The name "Naive Bayes" combines two terms:

  • Naive: the algorithm assumes that the presence of one feature is unrelated to the presence of any other feature. For instance, a fruit may be recognized as an apple because it is red, round, and tasty. Each of these features contributes independently to the probability that the fruit is an apple, without relying on the others.
  • Bayes: the algorithm is based on Bayes' theorem.
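
The "naive" assumption can be written compactly. As a sketch in notation that the text above does not use (x_1, ..., x_n for the feature values of an instance and C for a class), class conditional independence lets the joint likelihood factor into per-feature terms, and Bayes' theorem then gives the quantity the classifier compares across classes:

```latex
P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
\quad\Longrightarrow\quad
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The proportionality holds because the denominator P(x_1, ..., x_n) is the same for every class.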

Working of Naive Bayes Classifier

To understand more about the Naive Bayes Classifier, you may experiment with constructing a model in Scikit-Learn. This Python library is a free and open-source machine-learning tool.

As an example, you might use a social media ads dataset. The task is to predict whether a user bought the product after clicking the ad, given the user's age and other characteristics. The following steps walk through the Naive Bayes classifier workflow and show how it works.

Step 1 - Import basic libraries

You can use the below command to import the basic libraries required.

# Importing basic libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2 - Importing the dataset

Use the code below to import the required dataset.

# Importing the dataset
dataset = pd.read_csv('Social_Media_Ads.csv')
# Features: columns 3 and 4; target: column 5
X = dataset.iloc[:, [3, 4]]
y = dataset.iloc[:, 5]
print("Prediction evidence:\n", X.head())
print("\nFinal Target:\n", y.head())

Step 3 - Data preprocessing

# Conversion of variables into arrays
X = X.values
y = y.values
# Dataset splitting into training and test datasets (70:30)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30)
# Feature scaling: fit on the training set, then apply the same transform to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In this step, you split the dataset into a training set (70%) and a test set (30%). Next, you apply feature scaling with a standard scaler, which transforms each feature so that its mean is 0 and its standard deviation is 1 on the training data.
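
To see what standard scaling does, the sketch below standardizes a single feature by hand using only the Python standard library. The ages are invented and the helper names are hypothetical. It mirrors the idea that the scaler learns its statistics from the training set and reuses them on the test set.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and (population) standard deviation of one feature."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(x - mu) / sigma for x in column]

# Hypothetical ages, invented for illustration.
train_ages = [20.0, 30.0, 40.0, 50.0]
test_ages = [25.0, 60.0]

mu, sigma = fit_scaler(train_ages)              # statistics come from training data only
train_scaled = transform(train_ages, mu, sigma)
test_scaled = transform(test_ages, mu, sigma)   # reuse them on the test data

print(round(mean(train_scaled), 6), round(pstdev(train_scaled), 6))
```

Only the training data ends up with mean 0 and standard deviation 1. The test data is shifted and scaled by the same amounts, so its own mean and spread will generally differ slightly, which is expected.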

Step 4 - Training the model

You should then write the following command for training the model.

# Fitting the Naive Bayes algorithm to the training dataset
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
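
A Gaussian naive Bayes model treats each feature, within each class, as normally distributed. The sketch below shows that idea in plain Python: it is a simplified illustration with hypothetical data and function names, not scikit-learn's implementation, and the small constant added to each variance is an assumption made here to avoid division by zero.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y):
    """Per class: prior, plus mean and variance of each feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            # small constant avoids division by zero (an assumption of this sketch)
            "vars": [pvariance(col) + 1e-9 for col in columns],
        }
    return model

def log_gaussian(x, mu, var):
    """Log of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(model, x):
    """Choose the class maximizing log P(C) + sum of log P(x_i | C)."""
    def score(c):
        p = model[c]
        return math.log(p["prior"]) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, p["means"], p["vars"])
        )
    return max(model, key=score)

# Hypothetical scaled (age, salary) rows; label 1 = purchased, 0 = did not.
X = [[-1.2, -0.8], [-0.9, -1.1], [1.0, 0.9], [1.3, 1.2]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 1.0]), predict(model, [-1.0, -1.0]))
```

Training reduces to computing a handful of per-class statistics, which is why fitting this kind of model is fast.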

Step 5 - Testing and evaluation of the model

The code for testing and evaluating the model is as below:

# Prediction of the test dataset outcomes
y_pred = classifier.predict(X_test)
# Constructing the confusion matrix
import seaborn as sns
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True)

A confusion matrix helps you assess the quality of the model. It summarizes the performance of a classification model on a set of test data for which the true values are known. Each row of the confusion matrix represents an actual class, and each column represents a predicted class.
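
The row and column convention is easiest to see by building a two-class confusion matrix by hand. The labels below are invented for illustration, and the helper function is hypothetical; the summary metrics follow directly from the four cells.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm

# Hypothetical labels, invented for illustration.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

cm = confusion_matrix_2x2(y_true, y_pred)
(tn, fp), (fn, tp) = cm  # true negatives, false positives, false negatives, true positives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
print(cm, accuracy, precision, recall)
```

The diagonal cells count correct predictions, and the off-diagonal cells count the two kinds of error.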

Step 6 - Visualizing the model

Finally, the code below will help you visualize the model.

# Visualizing the test dataset results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# Build a grid covering the range of both (scaled) features
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.02),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.02))
# Color each grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('yellow', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual test points, one color per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('yellow', 'blue'))(i), label = j)
plt.legend()
plt.show()

Not all of these steps are necessary in every case, but the example above gives a clear idea of how data points can be classified.

What are The Advantages & Disadvantages of Naive Bayes?

Classification techniques are widely used in machine learning to categorize data into predefined classes or groups. Naive Bayes is one of the most popular classification algorithms; it uses probabilistic methods to predict the likelihood that an instance belongs to a particular class. While it offers several advantages over other classification techniques, some disadvantages are worth considering.


Advantages

1. Simple Yet Effective Approach: Naive Bayes is easy to understand and implement, making it suitable for beginners in machine learning. Despite its simplicity, it often gives competitive precision and recall on the problems it suits.

2. Can Handle Large Datasets Efficiently: Training amounts to counting (or computing simple per-class statistics) in a single pass over the data, so the cost grows only linearly with the number of examples and features, and the algorithm scales well to large datasets.

3. Works Well Even When There are Many Irrelevant Features Present: Each feature contributes to the prediction independently, so a feature whose distribution is similar across classes has little influence on the result.

4. Can Compete With More Complex Algorithms Like SVM on Text-Based Datasets: Naive Bayes is typically faster to train than Support Vector Machines (SVM) on text data, and it handles sparse, high-dimensional features well.


Disadvantages

1. Assumes Independence Between Features Which May Not Always Hold True: In reality, some attributes depend on each other, and violating the assumption can hurt prediction accuracy.

2. Requires Sufficient Labeled Data For Accurate Predictions: Like any supervised learning method, Naive Bayes requires enough labeled training examples to build the model. Too few training samples can result in inaccurate classifications or bias towards certain classes.

3. Cannot Handle Missing Values Easily Without Making Assumptions About Them: When input variables contain missing values, they typically have to be dropped or replaced using some assumed distribution or imputation strategy, which can lose information or bias the results towards specific outcomes.

Application of Naive Bayes Algorithm

The Naive Bayes algorithm is a probabilistic machine learning algorithm that works on the principle of conditional probability. It assumes that features are independent of one another given the class and uses Bayes' theorem to calculate the probability of an outcome from prior knowledge and observed evidence. One significant advantage of this algorithm is its speed, which makes it well suited to real-time predictions. For instance, in online advertising, Naive Bayes can be used to predict whether a user will click on an ad based on their browsing behavior. Another popular use case is spam filtering in email services like Gmail, where emails can be classified as either spam or non-spam based on their content and other attributes.

Moreover, Naive Bayes extends naturally to multi-class classification problems, where there are more than two target classes. This type of problem frequently arises in areas such as medical diagnosis and image recognition. Sentiment analysis is yet another area where Naive Bayes has been successfully applied. Companies can gain insights into how customers perceive their products or services by analyzing customer feedback or social media posts using sentiment analysis tools based on this algorithm.

Finally, recommendation systems can combine Naive Bayes with collaborative filtering. These systems analyze user behavior patterns to recommend resources (such as movies and books) that users may find interesting based on their past preferences and ratings.


The Naive Bayes algorithm is a popular classification method because it is simple, handles large datasets efficiently, and performs well even when many irrelevant features are present. However, it has limitations, such as the assumption of independence between variables, which may not always hold true, so consider carefully whether it fits your specific needs before applying it.

We hope you are now clear about what Bayes classification methods are, their advantages and disadvantages, and how the Naive Bayes algorithm works.
