RnewYear2022 RnewYear2022

- SQL Server Blogs -

Normalization-How to Normalize Database in a World of Abnormalities

Introduction

Imagine those busy weekends before Diwali, when your mom asks you to clean your room and rearrange all your belongings in a proper order. Your room otherwise are cluttered, with belongings lying anywhere and everywhere. It is very difficult to find anything in that mess. But after the day long work of removing the unnecessary things and moving around rest of your belongings in a proper way, you realize searching things has now become very easy. Same thing can happen to your data if you do not store them in a proper way. There can be data loss, redundancy and duplicity .To avoid all this we have normalization.

Anamolies in Database

A database anomaly is a fault in a database that usually emerges as a result of shoddy planning and storing everything in a flat database. In most cases, this is removed through the normalization procedure, which involves the joining and splitting of tables.

There are namely three types of anomalies:

  1. Insert
  2. Update
  3. Delete

Let us now understand about them in more details. For that let us create a table called Emp and fill it up with some data. Following are the code to create the table and insert data into it.

SQL Server Training & Certification

  • Detailed Coverage
  • Best-in-class Content
  • Prepared by Industry leaders
  • Latest Technology Covered

To Create the Table Emp

create table Emp

(
  Emp_id varchar(100),
  Emp_name varchar(100),
  Emp_Address varchar(100),
  Emp_dept varchar(100)
  )

To insert data into EMP table

insert into Emp(Emp_id,Emp_name,Emp_Address,Emp_dept) 
  select '101','Rick','Delhi','D001'
  union
  select '101','Rick','Delhi','D002'
  union
  select '123','Maggie','Agra','D890'
  union
  select '166','Glenn','Chennai','D900'
  union
  select '166','Glenn','Chennai','D004'

The output looks like below

The-output-looks-like-below

Insert Anamoly

An insertion anomaly is the inability to add data to the database due to the absence of other data.

Here in this example, if a person joins as a trainee, who has not been assigned a department it will be very difficult to enter his data into the table based on the current design as all the fields in the current table is mandatory and for the new joinee there is no department assigned.

Update Anamoly

An update anomaly is a data inconsistency that results from data redundancy and a partial update.

Glenn currently looks after two department and leaves in Chennai. Suppose Glenn decides to change his address from Chennai to Kolkata. Since Glenn works in multiple departments and there are multiple entries for him in the table, data for the all the rows has to be changed. If by any chance, we miss out on one of the rows, it will create an error which we term as update anomaly.

Delete Anamoly

A deletion anomaly occurs when you delete a record that may contain attributes that shouldn't be deleted. In the above example, if we delete department D890, all the data related to that department will be deleted.

To remove all these above mentioned anomalies, we implement normalization.

What is Normalisation

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

A large database defined as a single relation may result in data duplication. This repetition of data may result in:

  • Making relations very large.
  • It isn't easy to maintain and update data as it would involve searching many records in relation.
  • Wastage and poor utilization of disk space and resources.
  • The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that are satisfying desirable properties. Normalization is a process of decomposing the relations into relations with fewer attributes.

In short normalization means:

  • Normalization is the process of organizing the data in the database.
  • Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalization divides the larger table into smaller and links them using relationships.
  • The normal form is used to reduce redundancy from the database table.

Why do we need normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.

Types of Normalisation

Normalization works through a series of stages called Normal forms. The normal forms apply to individual relations. The relation is said to be in particular normal form if it satisfies constraints.

Types-of-Normalisation

Normal Form

Description

1NF

A relation is in 1NF if it contains an atomic value.

2NF

A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependant on the primary key

3NF

A relation will be in 3NF if it is in 2NF no transition dependency exist

BCNF

A stronger definition of 3NF is known as Boyce Codd normal form

4NF

A relation will be in 4 NF if it is in BCNF and has no multi valued dependency

5NF

A relation is in 5NF if it is in 4NF and does not contain any join dependency, joining should be lossless.

Next we will discuss about this above mentioned normal forms in more details.

First Normal Form

First normal form says an attribute (column) of a table cannot hold multiple values. It should only contain atomic values.

Let us assume a table called Empfirstnormal which contains fields called Empid,Emp_Name,Emp_Address and Emp_mobile.The query will be as follows.

create table EmpFirstnormalform

(
  Emp_id varchar(100),

  Emp_name varchar(100),

  Emp_Address varchar(100),

  Emp_mobile varchar(100)

  )

Let us also insert some data into it.The insert query are as follows.

insert into EmpFirstnormalform(Emp_id,Emp_name,Emp_Address,Emp_mobile) 

select '101','Rick','Delhi','88859121' union select '101','Rick','Delhi','86251028,8677892410' union select '123','Maggie','Agra','88845555' union select '166','Glenn','Chennai','87445577,87465542'

The final output looks like below.

The-final-output-looks-like-below

As you can see that in the Emp_mobile column, there are multiple phone numbers separated by a comma. First normal form does not allow that. So the modified table will look like below

As-you-can-see-that-in-the-Emp_mobile-column-there-are-multiple-phone-numbers-separated-by-a-comma

The query to create the table is as below.

create table EmpFirstnormalformmodified

  (

  Emp_id varchar(100),

  Emp_name varchar(100),

  Emp_Address varchar(100),

  Emp_mobile varchar(100)

  )

And the following are the insert statements.

insert into EmpFirstnormalformmodified(Emp_id,Emp_name,Emp_Address,Emp_mobile) 

  select '101','Rick','Delhi','88859121'

  union

  select '101','Rick','Delhi','86251028'

   union

  select '101','Rick','Delhi','8677892410'

  union

  select '123','Maggie','Agra','88845555'

  union

  select '166','Glenn','Chennai','87445577'

  union

  select '166','Glenn','Chennai','87465542'

Second Normal Form

Second normal form says

  1. Table should be in first normal form
  2. No prime attribute is dependent on the proper subset of any candidate key of tables.

For this let us assume a table called Teachersecondnormalform with the fields like teacher_id,subject and teacherage.The query for creating the tables are as follows:

create table Teachersecondnormalform

  (

  teacher_id varchar(100),

  subject varchar(100),

  teacherage varchar(100),

  )

And the query to enter data is as follows

insert into Teachersecondnormalform(teacher_id,subject,teacherage) 

  select '101','Maths',38

  union

  select '111','Physics',40

  union

  select '222','Biology',50

  union

  select '333','Physics',50

  union

  select '333','Chemistry',50

The output is as follows

The-output-is-as-follows

Here both the subject and the teacherage column are individually dependent on the teacher_id column. Second normal form does not allow this. To bring the table into second normal form we have to break the single table into two separate tables. The output will be as below.

The-output-will-be-as-below

The query to create the table and insert the data into those two tables as as below.

Table Creation

create table Teachersage

  (

  teacher_id varchar(100),

   teacherage varchar(100),

  )

  create table Teacherssubject

  (

  teacher_id varchar(100),

  subject varchar(100),

  )

Data Insertion

insert into Teacherssubject(teacher_id,subject) 

  select '101','Maths'

  union

  select '111','Physics'

  union

  select '222','Biology'

  union

  select '333','Physics'

  union

  select '333','Chemistry'

  insert into Teachersage(teacher_id,teacherage) 

  select '101',38

  union

  select '111',40

  union

  select '222',50

  union

  select '333',50

Third Normal Form

Third normal form says

  1. Table must be in second normal form
  2. Transitive functional dependency of non-prime attribute on any super key should be removed.

To discuss about this let us take the example of the table empthirdnormalform with the following fields empid, empname, empzip, empstate, empcity, empdistrict.The query for the table creation is as follows.

create table empthirdnormalform

  (

  empid varchar(100),

 empname varchar(100),

  empzip varchar(100),

  empstate varchar(100),

  empcity varchar(100),

  empdistrict varchar(100)

  )

And the query to enter data is as follows

insert into empthirdnormalform(empid,empname,empzip,empstate,empcity,empdistrict) 

  select '101','John','282005','UP','Agra','Dayalbagh'

  union

  select '111','Sara','282006','West Bengal','Kolkata','Lake Gardens'

  union

  select '112','Sara1','282007','Maharastra','Mumbai','Bandra'

  union

  select '113','Sara2','282008','Tamil Nadu','Chennai','RT Nagar'

The output of the table looks like below.

The-output-of-the-table-looks-like-below

If you look closely at the table, the fields empstate,empcity,empdistrict are individually dependant on empzip.To remove this anamoly we have to create two separate tables employee and employeezip one containing empid,empname and empzip and other empzip,empstate ,empcity and empdistrict.

The query for the two new tables are as follows.

create table employee

  (

  empid varchar(100),

 empname varchar(100),

  empzip varchar(100),

    )

create table employee_zip

  (

  empzip varchar(100),

 empstate varchar(100),

  empcity varchar(100),

  empdistrict varchar(100)

    )

Following are the query to insert data in those two tables.

insert into employee(empid,empname,empzip) 

  select '101','John','282005'

  union

  select '111','Sara','282006'

  union

  select '112','Sara1','282007'

  union

  select '113','Sara2','282008'

and

insert into employee_zip(empzip,empstate,empcity,empdistrict) 

  select '282005','UP','Agra','Dayalbagh'

  union

  select '282006','West Bengal','Kolkata','Lake Gardens'

  union

  select '282007','Maharastra','Mumbai','Bandra'

  union

  select '282008','Tamil Nadu','Chennai','RT Nagar'/p>

Advantages of Normalisation

  • Normalization helps to minimize data redundancy.
  • Greater overall database organization.
  • Data consistency within the database.
  • Much more flexible database design.
  • Enforces the concept of relational integrity.

Disadvantages of Normalisation

  • You cannot start building the database before knowing what the user needs.
  • The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
  • It is very time-consuming and difficult to normalize relations of a higher degree.
  • Careless decomposition may lead to a bad database design, leading to serious problems.

Conclusion

Over the last few paragraphs, we have learned about different aspects of normalization, its utility, advantages and disadvantages with practical examples. Hope this blog will help you to understand normalization and give you enough interest to learn normalization in more details in future.

SQL Server Training & Certification

  • Personalized Free Consultation
  • Access to Our Learning Management System
  • Access to Our Course Curriculum
  • Be a Part of Our Free Demo Class


     user

    Nandita

    With fact-finding market research & solicitous words, Nandita helps our digital learners globally navigate their way to profound career possibilities in IT and Management.


Comments

Related Courses

Trending Courses

salesforce

AWS

  • AWS & Fundamentals of Linux
  • Amazon Simple Storage Service
  • Elastic Compute Cloud
  • Databases Overview & Amazon Route 53
salesforce

Upcoming Class

2 days 04 Feb 2023

salesforce

DevOps

  • Intro to DevOps
  • GIT and Maven
  • Jenkins & Ansible
  • Docker and Cloud Computing
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

Data Science

  • Data Science Introduction
  • Hadoop and Spark Overview
  • Python & Intro to R Programming
  • Machine Learning
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

Hadoop

  • Architecture, HDFS & MapReduce
  • Unix Shell & Apache Pig Installation
  • HIVE Installation & User-Defined Functions
  • SQOOP & Hbase Installation
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

Salesforce

  • Salesforce Configuration Introduction
  • Security & Automation Process
  • Sales & Service Cloud
  • Apex Programming, SOQL & SOSL
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

QA

  • Introduction and Software Testing
  • Software Test Life Cycle
  • Automation Testing and API Testing
  • Selenium framework development using Testing
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

Business Analyst

  • BA & Stakeholders Overview
  • BPMN, Requirement Elicitation
  • BA Tools & Design Documents
  • Enterprise Analysis, Agile & Scrum
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

MS SQL Server

  • Introduction & Database Query
  • Programming, Indexes & System Functions
  • SSIS Package Development Procedures
  • SSRS Report Design
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

Python

  • Features of Python
  • Python Editors and IDEs
  • Data types and Variables
  • Python File Operation
salesforce

Upcoming Class

15 days 17 Feb 2023

salesforce

Artificial Intelligence

  • Components of AI
  • Categories of Machine Learning
  • Recurrent Neural Networks
  • Recurrent Neural Networks
salesforce

Upcoming Class

1 day 03 Feb 2023

salesforce

Machine Learning

  • Introduction to Machine Learning & Python
  • Machine Learning: Supervised Learning
  • Machine Learning: Unsupervised Learning
salesforce

Upcoming Class

22 days 24 Feb 2023

salesforce

Tableau

  • Introduction to Tableau Desktop
  • Data Transformation Methods
  • Configuring tableau server
  • Integration with R & Hadoop
salesforce

Upcoming Class

1 day 03 Feb 2023

Interviews