
What is the Apriori Algorithm in Data Mining?

 

The Apriori algorithm was the first algorithm developed for mining frequent itemsets. R. Agrawal and R. Srikant are responsible for developing the improved version known as Apriori. The method uses two steps, called "join" and "prune," to restrict the search space, and it proceeds iteratively to determine which itemsets are the most frequent. Skills like these are in demand in real-world data mining projects, and a strong data scientist resume can help you demonstrate that you are the best fit for the role, so that employers and recruiters notice you and invite you to interviews. Refer to the data scientist resume sample writing guide if you're looking for how to make a perfect data scientist resume.

The method's name comes from its use of prior knowledge of frequent itemset properties, as we will see in the following section of this article.

Apriori uses an iterative, level-wise search in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and then collecting the items that satisfy the minimum support requirement. The resulting set is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database. To make this level-wise generation of frequent itemsets more efficient, an important property known as the Apriori property, discussed below, is used to reduce the search space; a runnable sketch of this level-wise loop appears after the join and prune steps below.

We will begin by defining this property and then illustrate how it is used. The Apriori property states: every nonempty subset of a frequent itemset must also be frequent. The property rests on the following observation. An itemset I is not frequent if it does not meet the minimum support threshold, min_sup; that is, P(I) < min_sup. If an item A is added to itemset I, the resulting itemset I ∪ A cannot occur more frequently than I. Therefore I ∪ A is not frequent either: P(I ∪ A) < min_sup. For example, if {beer, diapers} fails minimum support, then {beer, diapers, milk} must fail it as well.

If a set fails a test, all of its supersets will fail the same test, which makes this an anti-monotone property: it gets its name because the property is monotonic in the direction of failing the test. "In what ways does the Apriori property factor into the algorithm?" Let's look at how L(k-1) is used to determine Lk for k ≥ 2. The method is broken down into two stages: a join step and a prune step.

The steps followed in the Apriori Algorithm of data mining are:

  • Join Step: This step generates candidate (k+1)-itemsets from the frequent k-itemsets by joining Lk with itself, i.e., by combining k-itemsets that share their first k-1 items. 
  • Prune Step: This step scans the database to count the support of each candidate. If a candidate does not meet minimum support, it is regarded as infrequent and removed; by the Apriori property, a candidate with any infrequent k-subset can be discarded without counting. This step reduces the size of the candidate itemsets. A minimal Python sketch of both steps follows. 
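To make the join and prune steps concrete, here is a minimal, self-contained Python sketch. The function names and toy transactions are illustrative assumptions, not part of the original article, and the subset-based pruning is folded into the support count for brevity:

def join(frequent_k):
    # Join step: merge frequent k-itemsets (as sorted tuples) that share
    # their first k-1 items, producing candidate (k+1)-itemsets.
    frequent_k = sorted(frequent_k)
    k = len(frequent_k[0])
    candidates = set()
    for i in range(len(frequent_k)):
        for j in range(i + 1, len(frequent_k)):
            a, b = frequent_k[i], frequent_k[j]
            if a[:k - 1] == b[:k - 1]:
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates

def prune(candidates, transactions, min_sup):
    # Prune step: scan the transactions, count each candidate's support,
    # and keep only candidates that meet min_sup.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if set(c) <= t:
                counts[c] += 1
    return {c for c, n in counts.items() if n >= min_sup}

def apriori_sketch(transactions, min_sup):
    # Level-wise search: find L1, then L2 from L1, and so on,
    # until no frequent k-itemset remains.
    items = {(i,) for t in transactions for i in t}
    level = prune(items, transactions, min_sup)
    frequent = set(level)
    while level:
        level = prune(join(level), transactions, min_sup)
        frequent |= level
    return frequent

# Toy run on four hypothetical transactions:
transactions = [{'milk', 'bread'}, {'milk', 'diapers'},
                {'milk', 'bread', 'diapers'}, {'bread', 'diapers'}]
print(apriori_sketch(transactions, min_sup=2))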

How Does the Apriori Algorithm Work in Data Mining?

By proceeding through the steps of the Apriori technique in order, one can discover the itemsets in a database that occur most frequently. This data mining approach iterates the join and prune steps to find the most frequent itemsets. A minimum support threshold is either specified by the problem or assumed by the user.

  1. Each item is treated as a candidate 1-itemset. The algorithm scans the database and keeps a running count of how frequently each item occurs.
  2. A minimum support threshold, min_sup, is required (e.g., 2). The set of 1-itemsets whose occurrence counts satisfy min_sup is computed; only candidates whose count is greater than or equal to min_sup are retained for the next iteration.
  3. Next, frequent 2-itemsets satisfying min_sup are discovered. In the join step, the candidate 2-itemsets are generated by joining the frequent 1-itemsets with each other.
  4. The candidate 2-itemsets are then pruned using the min_sup threshold, removing any candidate that falls below it. The table now contains only the frequent 2-itemsets.
  5. The join and prune steps are applied again in the next iteration to produce candidate 3-itemsets. In accordance with the anti-monotone property, every 2-itemset subset of a candidate 3-itemset must satisfy min_sup: a candidate is frequent only if all of its 2-item subsets are frequent.
  6. Any candidate with a subset that fails min_sup is pruned, and the process continues with 4-itemsets, and so on. The algorithm stops once no further frequent itemsets can be found; a worked trace follows.
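To see these steps on actual numbers, consider a small hypothetical database of five transactions with min_sup = 2 (the transactions are illustrative):

T1 = {A, B, C}, T2 = {A, C}, T3 = {A, D}, T4 = {B, C}, T5 = {A, B, C}

The first scan counts A:4, B:3, C:4, D:1, so L1 = {A, B, C} (D is pruned, since 1 < 2). Joining L1 with itself gives the candidate 2-itemsets {A,B}, {A,C}, and {B,C}, with counts 2, 3, and 3; all meet min_sup, so L2 contains all three. The only candidate 3-itemset is {A,B,C}; every 2-item subset of it is frequent, and its count is 2, so L3 = {{A,B,C}}. No candidate 4-itemsets can be formed, and the algorithm stops.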

Apriori Algorithm in Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# header = None makes sure the first row is read as data, not as column names
dataset = pd.read_csv('../input/Market_Basket_Optimisation.csv', header = None)
dataset.shape

# Transforming the DataFrame into a list of lists, so that each transaction can be indexed more easily
transactions = []
for i in range(0, dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(0, dataset.shape[1])])
print(transactions[0])

from apyori import apriori  # install with: pip install apyori
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)
# Support: number of transactions containing the set of items / total number of transactions
#   e.g., products bought at least 3 times a day over a week of 7,501 transactions:
#   3 * 7 / 7501 = 21 / 7501, or roughly 0.0028
# Confidence: should not be set too high, or the output will be dominated by obvious rules
# Note: min_length is not among apyori's documented keyword arguments and may be ignored;
#       the documented way to bound itemset size is max_length.
# Try many combinations of values to experiment with the model.

# Viewing the rules
results = list(rules)

# Transferring the list to a table
results = pd.DataFrame(results)
results.head(5)
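The table built above nests apyori's result objects, which makes it hard to read. As a hedged sketch (assuming apyori's documented result structure: a RelationRecord holding items, support, and a list of ordered_statistics with items_base, items_add, confidence, and lift), the records can be flattened into one row per rule:

def inspect(records):
    # Flatten apyori RelationRecord objects into one row per rule
    rows = []
    for r in records:
        for stat in r.ordered_statistics:
            rows.append({'lhs': tuple(stat.items_base),
                         'rhs': tuple(stat.items_add),
                         'support': r.support,
                         'confidence': stat.confidence,
                         'lift': stat.lift})
    return pd.DataFrame(rows)

# apriori() returns a single-use generator, so re-run it for a fresh record list
records = list(apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3))
inspect(records).sort_values('lift', ascending=False).head(5)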

Why the Apriori Algorithm?

Frequent Pattern Mining (FPM)

Frequent pattern mining is one of the most useful data mining strategies for uncovering hidden connections between data points. These connections are represented as association rules, which also help spot anomalies in data.

  • FPM may be used for various purposes in fields such as data analysis, software issue tracking, cross-marketing, sales campaign analysis, product mix analysis, and more.
  • The frequent itemsets that Apriori discovers serve various data mining tasks, the most crucial being mining association rules and discovering sequences in databases.
  • Association rules can be applied to supermarket transaction data to analyze shoppers' habits in terms of the items they've bought.
  • Each rule describes how frequently the items are purchased together; the short sketch that follows shows how these rule metrics are computed.
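For instance, a rule such as {bread, butter} → {milk} is judged by its support, confidence, and lift. A minimal sketch with hypothetical counts:

# Hypothetical counts for the rule {bread, butter} -> {milk}
total_transactions = 1000
count_lhs = 100     # transactions containing {bread, butter}
count_rule = 60     # transactions containing {bread, butter, milk}
count_rhs = 300     # transactions containing {milk}

support = count_rule / total_transactions              # 0.06
confidence = count_rule / count_lhs                    # 0.60
lift = confidence / (count_rhs / total_transactions)   # 0.60 / 0.30 = 2.0
print(support, confidence, lift)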

Frequent Itemset Mining

Mining of frequent itemsets or patterns is used extensively across data mining tasks, such as mining association rules, correlations, and graph patterns bound by frequent and sequential patterns, and many other kinds of patterns can be derived from frequent itemsets as well.

Benefits the Apriori Algorithm Offers:

This is the most straightforward of all the algorithms in the association rule learning field.

  • The resulting rules are intuitive and easy to comprehend and communicate.
  • Since it is entirely unsupervised, it may be utilized in the many contexts where unlabeled data is all that is available.
  • Numerous adaptations have been built on this implementation for various applications; for example, association learning algorithms have been created that take into account the ordering of the data, its size, and its timestamps.
  • The search is exhaustive: the algorithm finds every rule that satisfies the specified support and confidence criteria.

In What Ways Does the Apriori Algorithm Fall Short?

The Apriori algorithm's sluggishness is one of its main drawbacks. This is the case because:

  • The dataset can contain a significant number of candidate itemsets.
  • Sparse datasets force Apriori to run with a low minimum support threshold.
  • Keeping track of a large number of candidate sets, including many frequent itemsets, takes considerable time.
  • It is inefficient for processing massive datasets.

For the sake of argument, assume there are 10^4 frequent 1-itemsets. The Apriori algorithm will then need to generate more than 10^7 candidate 2-itemsets before they can be counted and checked. Likewise, to spot a frequent pattern of size 100 (with items v1, v2, ..., v100), Apriori must generate on the order of 2^100 candidate itemsets.
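These figures can be checked with a quick sketch using Python's standard library:

from math import comb

frequent_1 = 10 ** 4
candidate_2 = comb(frequent_1, 2)   # pairs of frequent 1-itemsets
print(candidate_2)                  # 49,995,000 -- well over 10^7

# Nonempty subsets of a size-100 itemset, i.e., the candidates it implies:
print(2 ** 100 - 1)                 # about 1.27e30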

Thus, the time complexity of the Apriori method grows, along with its cost, because far more time than necessary is spent on candidate generation.

Furthermore, it performs many costly database scans to verify the numerous candidate itemsets produced from the various sets. The algorithm suffers when there are many transactions but not enough system memory, and large datasets make the method inefficient and sluggish.

Is there a Way to Make the Apriori Algorithm Even More Effective?

The algorithm's efficiency may be increased in a number of ways.

  • Hash-based itemset counting: the k-itemsets and their respective counts are generated using a hash-based structure, a hash table built with a hash function; a k-itemset whose bucket count falls below the threshold cannot be frequent. A sketch of this idea follows the list.
  • Transaction reduction: a transaction that contains no frequent k-itemset cannot contain a frequent (k+1)-itemset, so such transactions are flagged or removed, minimizing the number of iterative scans.
  • Partitioning: only two database scans are needed to mine the frequent itemsets, because any itemset that is frequent in the database must be frequent in at least one of its partitions.
  • Sampling: a random subset S of the data D is searched for frequent itemsets F. Itemsets that are frequent globally in D might be missed; decreasing the min_sup value on S can help with this.
  • Dynamic itemset counting: during a database scan, new candidate itemsets can be added dynamically at any predetermined starting point.
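As an illustration of the hash-based idea, here is a minimal sketch that counts candidate 2-itemsets into hash buckets; the bucket count, hash function, and transactions are illustrative assumptions:

from itertools import combinations

def hash_bucket_counts(transactions, num_buckets=7):
    # While scanning, hash every 2-itemset in each transaction into a
    # bucket and bump that bucket's count.
    buckets = [0] * num_buckets
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % num_buckets] += 1
    return buckets

def may_be_frequent(pair, buckets, min_sup, num_buckets=7):
    # A candidate pair can be frequent only if its bucket count reaches
    # min_sup, since the bucket count over-approximates the pair's support.
    return buckets[hash(tuple(sorted(pair))) % num_buckets] >= min_sup

transactions = [{'milk', 'bread'}, {'milk', 'diapers'},
                {'milk', 'bread', 'diapers'}, {'bread', 'diapers'}]
buckets = hash_bucket_counts(transactions)
print(may_be_frequent(('bread', 'milk'), buckets, min_sup=2))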


Conclusion

While the Apriori algorithm has limitations related to memory, complexity, and scalability, it remains a potent tool for analyzing vast amounts of data. Analysts, researchers, and businesses can build on it with novel concepts to improve their workflows in a rapidly changing technological landscape, where innovation and adaptation are essential to staying competitive. Understanding the Apriori algorithm in data mining begins with understanding data science; you can get an insight into both through our Data Science Training.
