Good Friday Sale : Flat 30% off on live classes + 2 free self-paced courses! - SCHEDULE CALL

- SQL Server Blogs -

Data Anamolies-Understanding Them And Learning Ways To Resolve Them



Introduction

Imagine trying to find out an important document in your otherwise cluttered room when you are in a hurry to your office. When you do not find it yourself, you have to call your mom or dad to get it for you. You all try hard to find them everywhere and by the time you get it you are already late to your office. Why did it happen? It is because you have not kept your belongings in a proper managed way. Similar is the case with your data. If you do not store it in a disciplined manner there will be data redundancy and duplicity. This is called database anomaly. Over the next few paragraphs, we will learn about the different aspects of database anomaly.

SQL Server Training & Certification

  • Detailed Coverage
  • Best-in-class Content
  • Prepared by Industry leaders
  • Latest Technology Covered

What is Database Anomaly?

A database anomaly is a fault in a database that usually emerges as a result of shoddy planning and storing everything in a flat database. In most cases, this is removed through the normalization procedure, which involves the joining and splitting of tables.

There are namely three types of anomalies:

  1. Insert
  2. Update
  3. Delete

Let us now understand them in more detail. For that let us create a table called Emp and fill it up with some data. Following are the code to create the table and insert data into it.

Insert Anamoly

An insertion anomaly is the inability to add data to the database due to the absence of other data.

Here in this example, if a person joins as a trainee, who has not been assigned a department it will be very difficult to enter his data into the table based on the current design as all the fields in the current table is mandatory and for the new joinee there is no department assigned 

Update Anamoly

An update anomaly is a data inconsistency that results from data redundancy and a partial update.

Glenn currently looks after two department and leaves in Chennai. Suppose Glenn decides to change his address from Chennai to Kolkata. Since Glenn works in multiple departments and there are multiple entries for him in the table, data for the all the rows has to be changed. If by any chance, we miss out on one of the rows, it will create an error which we term as update anomaly.

Delete Anamoly

A deletion anomaly occurs when you delete a record that may contain attributes that shouldn't be deleted. In the above example, if we delete department D890, all the data related to that department will be deleted.

To remove all these above-mentioned anomalies, we implement various techniques like normalization, implementation constraints and keys. Next we will learn about those things in detail. 

Constraints

In DBMS, constraints are the set of rules that ensures that when an authorized user modifies the database they do not disturb the data consistency and the constraints are specified within the DDL commands like “alter” and “create” command. There are several types of constraints available in DBMS and they are:

  • Domain constraints
  • Entity Integrity constraints
  • Referential Integrity constraints
  • Key constraints

Keys in DBMS

  • Keys play an important role in the relational database.
  • It is used to uniquely identify any record or row of data from the table. It is also used to establish and identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON table, passport_number, license_number, SSN are keys since they are unique for each person.

Types of Keys

Following are the different types of keys:

  1. Primary Key
  2. Candidate Key
  3. Super Key
  4. Foreign Key
  5. Alternate Key
  6. Composite Key
  7. Artificial Key

Primary Key

  • It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes a primary key.
  • In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys since they are also unique.
  • For each entity, the primary key selection is based on requirements and developers.

Query

CREATE TABLE [dbo].[Employee](

       [EmpID] [int] NOT NULL,

       [EmpName] [varchar](50) NOT NULL,

       [Salary] [int] NOT NULL,

       [Address] [varchar](200) NOT NULL,

PRIMARY KEY CLUSTERED

(

       [EmpID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

Candidate Key

  • A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
  • Except for the primary key, the remaining attributes are considered a candidate key. The candidate keys are as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes, like SSN, Passport_Number, License_Number, etc., are considered a candidate key.

Super Key

Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.

For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can also be a key.

The super key would be EMPLOYEE-ID (EMPLOYEE_ID, EMPLOYEE-NAME), etc.

Foreign Key

  • Foreign keys are the column of the table used to point to the primary key of another table.
  • Every employee works in a specific department in a company, and employee and department are two different entities. So we can't store the department's information in the employee table. That's why we link these two tables through the primary key of one table.
  • We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE table.
  • In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.

Query

ALTER TABLE [Sales].[SalesOrderDetail]  WITH CHECK ADD  CONSTRAINT [FK_SalesOrderDetail_SpecialOfferProduct_SpecialOfferIDProductID] FOREIGN KEY([SpecialOfferID], [ProductID])

REFERENCES [Sales].[SpecialOfferProduct] ([SpecialOfferID], [ProductID])

 

Alternate Key

There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a relation. These attributes or combinations of the attributes are called the candidate keys. One key is chosen as the primary key from these candidate keys, and the remaining candidate key, if it exists, is termed the alternate key. In other words, the total number of alternate keys is the total number of candidate keys minus the primary key. The alternate key may or may not exist. If there is only one candidate key in a relation, it does not have an alternate key.

For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the Alternate key.

Composite Key

Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also known as Concatenated Key.

For example, in employee relations, we assume that an employee may be assigned multiple roles, and an employee may work on multiple projects simultaneously. So the primary key will be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a composite key since the primary key comprises more than one attribute.

Artificial Key

The key created using arbitrarily assigned data are known as artificial keys. These keys are created when a primary key is large and complex and has no relationship with many other relations. The data values of the artificial keys are usually numbered in serial order.

For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in employee relations. So it would be better to add a new virtual attribute to identify each tuple in the relation uniquely.

Normalization

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

A large database defined as a single relation may result in data duplication. This repetition of data may result in:

  • Making relations very large.
  • It isn't easy to maintain and update data as it would involve searching many records in relation.
  • Wastage and poor utilization of disk space and resources.
  • The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that are satisfying desirable properties. Normalization is a process of decomposing relations into relations with fewer attributes.

In short normalization means:

  • Normalization is the process of organizing the data in the database.
  • Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalization divides the larger table into smaller ones and links them using relationships.
  • The normal form is used to reduce redundancy from the database table.

Why Do We Need Normalization?

The main reason for normalizing the relations is to remove these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.

Types of Normalisation

Normalization works through a series of stages called Normal forms. The normal forms apply to individual relations. The relation is said to be in particular normal form if it satisfies constraints.

Normal Form

Description

1NF

A relation is in 1NF if it contains an atomic value.

2NF

A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependant on the primary key

3NF

A relation will be in 3NF if it is in 2NF no transition dependency exist

BCNF

A stronger definition of 3NF is known as Boyce Codd normal form

4NF

A relation will be in 4 NF if it is in BCNF and has no multi valued dependency

5NF

A relation is in 5NF if it is in 4NF and does not contain any join dependency, joining should be lossless.

SQL Server Training & Certification

  • No cost for a Demo Class
  • Industry Expert as your Trainer
  • Available as per your schedule
  • Customer Support Available

Conclusion

Over the last few paragraphs, we learned about the different aspects of database anomalies and ways to counter them. This write-up can be a way forward to further studies about database anomalies and how to prevent them while designing a real-time database.

fbicons FaceBook twitterTwitter google+Google+ lingedinLinkedIn pinterest Pinterest emailEmail

     Logo

    Sanchayan Banerjee

    I love to learn new things and also like sharing my knowledge with others. As an experienced IT Professional I like to update myself constantly with new and upcoming technologies. The database management system is one of my favorite subjects which I constantly explore.


Comments

Trending Courses

Cyber Security Course

Cyber Security

  • Introduction to cybersecurity
  • Cryptography and Secure Communication 
  • Cloud Computing Architectural Framework
  • Security Architectures and Models
Cyber Security Course

Upcoming Class

-1 day 29 Mar 2024

QA Course

QA

  • Introduction and Software Testing
  • Software Test Life Cycle
  • Automation Testing and API Testing
  • Selenium framework development using Testing
QA Course

Upcoming Class

-1 day 29 Mar 2024

Salesforce Course

Salesforce

  • Salesforce Configuration Introduction
  • Security & Automation Process
  • Sales & Service Cloud
  • Apex Programming, SOQL & SOSL
Salesforce Course

Upcoming Class

6 days 05 Apr 2024

Business Analyst Course

Business Analyst

  • BA & Stakeholders Overview
  • BPMN, Requirement Elicitation
  • BA Tools & Design Documents
  • Enterprise Analysis, Agile & Scrum
Business Analyst Course

Upcoming Class

-1 day 29 Mar 2024

MS SQL Server Course

MS SQL Server

  • Introduction & Database Query
  • Programming, Indexes & System Functions
  • SSIS Package Development Procedures
  • SSRS Report Design
MS SQL Server Course

Upcoming Class

6 days 05 Apr 2024

Data Science Course

Data Science

  • Data Science Introduction
  • Hadoop and Spark Overview
  • Python & Intro to R Programming
  • Machine Learning
Data Science Course

Upcoming Class

-1 day 29 Mar 2024

DevOps Course

DevOps

  • Intro to DevOps
  • GIT and Maven
  • Jenkins & Ansible
  • Docker and Cloud Computing
DevOps Course

Upcoming Class

6 days 05 Apr 2024

Hadoop Course

Hadoop

  • Architecture, HDFS & MapReduce
  • Unix Shell & Apache Pig Installation
  • HIVE Installation & User-Defined Functions
  • SQOOP & Hbase Installation
Hadoop Course

Upcoming Class

-1 day 29 Mar 2024

Python Course

Python

  • Features of Python
  • Python Editors and IDEs
  • Data types and Variables
  • Python File Operation
Python Course

Upcoming Class

6 days 05 Apr 2024

Artificial Intelligence Course

Artificial Intelligence

  • Components of AI
  • Categories of Machine Learning
  • Recurrent Neural Networks
  • Recurrent Neural Networks
Artificial Intelligence Course

Upcoming Class

7 days 06 Apr 2024

Machine Learning Course

Machine Learning

  • Introduction to Machine Learning & Python
  • Machine Learning: Supervised Learning
  • Machine Learning: Unsupervised Learning
Machine Learning Course

Upcoming Class

20 days 19 Apr 2024

 Tableau Course

Tableau

  • Introduction to Tableau Desktop
  • Data Transformation Methods
  • Configuring tableau server
  • Integration with R & Hadoop
 Tableau Course

Upcoming Class

6 days 05 Apr 2024

Search Posts

Reset

Receive Latest Materials and Offers on SQL Server Course

Interviews