Normalization in SQL: 1NF, 2NF, 3NF, and BCNF

What is Normalization in SQL?

Normalization is the process of removing redundancy from data to improve data integrity. The data is organized into tables in which every column value can be fetched with a key, and redundant copies of the same fact are eliminated from the relational tables.

To normalize something simply means to bring it to its normal state. In a database, the tables and columns are organized so that the integrity constraints and the dependencies between columns are enforced correctly. The main purpose of normalization in SQL is to reduce redundant data. SQL is the language used to interact with the data in a relational database, and that data should be in normalized form before it is processed further. A normalized design also distributes the data more sensibly across the tables of the database.

What is Normalization in a Database?

Normalization is a design technique for databases. It helps reduce data redundancy and eliminates the undesired side effects of inserting, deleting, and updating data, known as insertion, deletion, and update anomalies. The main goal of normalization in a database is to reduce the redundancy of the data.
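An update that leaves redundant copies out of sync is a typical example of the problems normalization is meant to prevent. The following is a minimal sketch using Python's built-in sqlite3 module. The table name borrowed, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical unnormalized table: the address is repeated on every row of a student.
conn.execute("CREATE TABLE borrowed (student TEXT, address TEXT, book TEXT)")
conn.executemany(
    "INSERT INTO borrowed VALUES (?, ?, ?)",
    [
        ("Ravi", "Street 3", "DBMS Concepts"),
        ("Ravi", "Street 3", "Basics of Data Modelling"),
    ],
)

# Update anomaly: changing the address on only one row leaves the data inconsistent.
conn.execute(
    "UPDATE borrowed SET address = 'Street 9' "
    "WHERE student = 'Ravi' AND book = 'DBMS Concepts'"
)

addresses = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT address FROM borrowed "
        "WHERE student = 'Ravi' ORDER BY address"
    )
]
print(addresses)  # the same student now has two different addresses
```

If the address were stored once, in a table of its own keyed by the student, the update would touch a single row and the inconsistency could not arise.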

Database normalization reorganizes the data in a relational database according to a series of normal forms. It reduces the amount of unwanted data in the database and improves data integrity. Two principles govern the process:

  • The dependent data should be stored together.
  • There should not be any unwanted data. 

Database normalization was introduced by Edgar F. Codd as an integral part of his relational model, and he is often regarded as the father of the relational data model. The relational database engines we use today still follow the rules he proposed. Codd defined three normal forms: 1NF, 2NF, and 3NF. Each form builds on the previous one, so a table must satisfy the first normal form before the second can be applied, and the second before the third. Codd proposed the first normal form first, then extended it to the second normal form, and then extended the second to the third.

The third normal form was later strengthened by Raymond F. Boyce together with Codd, resulting in a new form named BCNF (Boyce-Codd Normal Form). All these normal forms are discussed later in this article.

Database Normal Forms

Normal forms are the rules used to normalize the tables in a database. Each step of normalization reduces the redundancy in a set of relational tables. If redundancy remains in the data, it can cause insertion, deletion, or update anomalies, so it pays to minimize it. Seven normal forms are commonly listed. This article discusses the first four in detail; the remaining three are included for completeness:

  1. 1NF: The First Normal Form, in which every attribute of a relation holds a single atomic value.
  2. 2NF: The Second Normal Form. A relation in 2NF must be in 1NF, and every non-key attribute must depend on the whole primary key, not on just a part of it.
  3. 3NF: The Third Normal Form. A relation in 3NF must be in 2NF, and it must have no transitive dependency.
  4. BCNF: Boyce-Codd Normal Form, which is stronger than 3NF.
  5. 4NF: The Fourth Normal Form. A relation in 4NF must be in BCNF, and it must have no non-trivial multi-valued dependency.
  6. 5NF: The Fifth Normal Form. A relation in 5NF must be in 4NF, and it must have no non-trivial join dependency.
  7. 6NF: The Sixth Normal Form. It is not widely standardized and is rarely used in practice.

Database Normalization [With Examples]

Database normalization can seem difficult to understand in the abstract, so let's work through an example to understand the concept better.

Example:

Suppose there is a table in the database containing information about the students who borrow different books from the library.

Name | Address | Books Name | Gender
Aakash | First Street, House No-102 | Life is what you make it | Male
Ravi Kishan | Street – 3, House No-403 | DBMS Concepts | Male
Komal Arora | Street No-2, Model Town | Learn key skills of Management in an organization | Female
Happy Singh | Second Street, Junction Road | Basics of Data Modelling | Male

In this table, each student has borrowed a different book. For a table to be in First Normal Form, every cell must hold only a single value. As the example shows, each student has borrowed only one book, and the other cells also hold single values, so this table is in 1NF.

Normalization organizes the data into a well-structured database. During the process, the redundancy in the relations is removed to obtain the desired tables. For example, consider a table that is not yet normalized:

Name | Address | Books Name | Gender
Ashu Lakhwan | First Street, House No-102 | Life is what you make it, Laws of Attraction | Male
Ravi Kishan | Street – 3, House No-403 | DBMS Concepts | Male
Komal | Street No-2, Model Town | Learn key skills of Management in an organization | Female
Harish Singh | Second Street, Junction Road | Basics of Data Modelling | Male

The table has multiple values in a single cell, which should be removed for normalization. After normalization of this table, it will look like this:

Name | Address | Books Name | Gender
Ashu Lakhwan | First Street, House No-102 | Life is what you make it | Male
Ashu Lakhwan | First Street, House No-102 | Laws of Attraction | Male
Ravi Kishan | Street – 3, House No-403 | DBMS Concepts | Male
Komal | Street No-2, Model Town | Learn key skills of Management in an organization | Female
Harish Singh | Second Street, Junction Road | Basics of Data Modelling | Male

Now each cell in the table has a single value. Here the normalization of the table is done according to the First Normal Form. 
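The same conversion can be sketched in a few lines of Python. The function name to_first_normal_form and the lower-case column names are assumptions chosen for this illustration, and only two rows of the table are copied to keep it short.

```python
# Two rows from the table above; "books" may hold a comma-separated list (not 1NF).
rows = [
    {"name": "Ashu Lakhwan",
     "books": "Life is what you make it, Laws of Attraction",
     "gender": "Male"},
    {"name": "Harish Singh",
     "books": "Basics of Data Modelling",
     "gender": "Male"},
]


def to_first_normal_form(rows, column, separator=","):
    """Return new rows in which `column` holds exactly one value per row."""
    result = []
    for row in rows:
        for value in row[column].split(separator):
            new_row = dict(row)              # copy the other columns unchanged
            new_row[column] = value.strip()  # one book per row
            result.append(new_row)
    return result


normalized = to_first_normal_form(rows, "books")
for row in normalized:
    print(row["name"], "|", row["books"])
```

This sketch assumes that no book title itself contains a comma. In a real database it is usually better to store the list as separate rows from the start than to split strings afterwards.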

What is a KEY in SQL?

A key is a column, or a group of columns, whose values identify the records in a table uniquely. Keys are useful when a table has many columns and we need a single column or a group of columns that picks out one record. A key helps us retrieve exactly the row we need from the database, and it also makes duplicate data in the table easy to detect.

Keys are also used to establish relationships between tables: a key in one table can be referenced from another table.

Keys identify rows uniquely, but not every column belongs to a key. Columns that are not part of any key are called non-key columns.

Keys come in different types, such as primary, composite, and foreign keys. Let us discuss these keys in detail:

Database – Primary Key

A primary key identifies exactly one record of an entity. An entity may have several candidate keys, and the one chosen as the most suitable is called the primary key. For example, an employee in an Employee table can be identified by the Employee ID, as it is different for every entry in the table. So, we can make Employee ID the primary key in this case, and a single employee can then be selected by using the primary key.

Employee
Employee_ID
Employee_NAME
Employee_Salary
Employee_AGE
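As a minimal sketch, the Employee table above can be created with Python's built-in sqlite3 module to see what a primary key enforces. The column types and the sample values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        Employee_ID     INTEGER PRIMARY KEY,
        Employee_NAME   TEXT NOT NULL,
        Employee_Salary INTEGER,
        Employee_AGE    INTEGER
    )
    """
)
conn.execute("INSERT INTO Employee VALUES (29, 'Robert', 20000, 35)")

# A second row with the same Employee_ID violates the primary key.
try:
    conn.execute("INSERT INTO Employee VALUES (29, 'Merissa', 25000, 41)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The primary key selects exactly one employee.
employee = conn.execute(
    "SELECT Employee_NAME FROM Employee WHERE Employee_ID = ?", (29,)
).fetchone()
print(duplicate_rejected, employee)
```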

Database – Composite Key

A composite key is a primary key made up of more than one attribute. Consider the Employee table discussed above, and suppose each employee can be assigned to several projects, each with a role. Employee_ID alone no longer identifies a row, but a combination such as Employee_ID and Employee_ProjectID does (Employee_ROLE can be added to the combination if an employee may hold several roles on one project). A table still has only one primary key; when that key consists of several attributes taken together, it is called a composite key or concatenated key.
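A short sketch with Python's sqlite3 module shows how a composite key behaves. The table name EmployeeProject, the choice of Employee_ID plus Employee_ProjectID as the key, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical assignment table: one row per (employee, project) pair.
conn.execute(
    """
    CREATE TABLE EmployeeProject (
        Employee_ID        INTEGER,
        Employee_ProjectID INTEGER,
        Employee_ROLE      TEXT,
        PRIMARY KEY (Employee_ID, Employee_ProjectID)
    )
    """
)
conn.execute("INSERT INTO EmployeeProject VALUES (29, 1, 'Developer')")
# Same employee on a different project: allowed, because the pair is still unique.
conn.execute("INSERT INTO EmployeeProject VALUES (29, 2, 'Reviewer')")

try:
    # The same (Employee_ID, Employee_ProjectID) pair again: rejected.
    conn.execute("INSERT INTO EmployeeProject VALUES (29, 1, 'Tester')")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM EmployeeProject").fetchone()[0]
print(pair_rejected, row_count)
```

Neither column is unique on its own here; only the combination is, which is what makes the key composite.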

Database – Foreign Key

A foreign key is a column, or a list of columns, in one table that refers to the primary key of another table. Suppose there are two tables in the database, an Employee table and a Department table. A department ID is assigned to each employee, and it can be used to look up the matching row in the Department table. In other words, the primary key of another table is added as a new attribute to the main table, and that attribute can then be used to identify the data of the other table from our main table.

Employee Table | Department Table
Emp_ID | Dep_NAME
NAME | DepGroup
DepID | DepID

As you can see, the Primary key of the Department table, which is DepID, is given as an attribute in the Employee table. Now, the DepID key can be used to identify the data from the Department table. 
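The relationship between the two tables can be sketched with Python's sqlite3 module. The column types and the sample rows (a department named Sales in group A) are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.execute(
    "CREATE TABLE Department ("
    "DepID INTEGER PRIMARY KEY, Dep_NAME TEXT, DepGroup TEXT)"
)
conn.execute(
    """
    CREATE TABLE Employee (
        Emp_ID INTEGER PRIMARY KEY,
        NAME   TEXT,
        DepID  INTEGER REFERENCES Department (DepID)
    )
    """
)
conn.execute("INSERT INTO Department VALUES (22, 'Sales', 'A')")
conn.execute("INSERT INTO Employee VALUES (29, 'Robert', 22)")

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (32, 'Merissa', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# DepID lets us reach the Department row from the Employee row.
dep_name = conn.execute(
    "SELECT d.Dep_NAME FROM Employee e "
    "JOIN Department d ON d.DepID = e.DepID WHERE e.Emp_ID = 29"
).fetchone()[0]
print(orphan_rejected, dep_name)
```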

What are transitive functional dependencies?

A transitive functional dependency occurs when a non-key column determines another non-key column, so that changing the first may require changing the second. It involves non-prime attributes, that is, attributes that are not part of any candidate key. In other words, the key determines an attribute only indirectly, through another attribute, and this indirect relationship is known as a transitive functional dependency. For example:

if A -> B, 

and B -> C, then

A -> C is a Transitive Functional Dependency. 

Note that to achieve 3NF, transitive dependencies must be removed from the table. A transitive functional dependency is always formed from two functional dependencies, so it needs at least three attributes.
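The rule "if A -> B and B -> C, then A -> C" can be checked mechanically by computing everything an attribute determines. Below is a minimal sketch of this closure computation; the function name closure and the representation of dependencies as pairs of Python sets are assumptions of the example.

```python
def closure(attributes, dependencies):
    """Return every attribute functionally determined by `attributes`.

    `dependencies` is a list of (left_side, right_side) pairs of attribute sets.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in dependencies:
            # If the whole left side is already known, the right side follows.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
```

C appears in the result for A even though no dependency A -> C was listed, which is exactly the transitive dependency.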

Suppose there is a table in which one non-key column determines another, so that changing a value in the first may require changing the second. In the example below it is assumed that changing a person's name may cause a change in the Gender column. The key then determines Gender only indirectly, through the name, and this indirect relationship among the columns is the transitive functional dependency.

As noted, the transitive dependency must be removed to achieve the Third Normal Form. Let us see how we can do that:

Name | Emp-ID | Gender | Salary
Robert | 29 | Male | $20000
Merissa | 32 | Female | $25000
Krishna | 30 | Male | $30000

Now, as you can see, changing the name from the first column will affect the Gender column of the table. The above table is not in 3NF as it has a transitive functional dependency.

Name -> Gender

To remove this dependency, we need to divide the table into sub tables such as

Employee

Name | Emp-ID | Salary
Robert | 29 | $20000
Merissa | 32 | $25000
Krishna | 30 | $30000

Gender

Emp-ID | Gender
29 | Male
32 | Female
30 | Male

After the split, every non-key column in each table depends directly on that table's key, Emp-ID, and no non-key column depends on another non-key column. The transitive dependency is therefore gone, and we can say that both tables are in 3NF.
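A split like this should not lose information. As a rough check, the sketch below builds the original table and the two sub tables with Python's sqlite3 module and joins them back on the key. Emp-ID is written as Emp_ID because a hyphen is not valid in an unquoted SQL identifier, the salaries are stored as plain integers, and the table name Original is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Original (Name TEXT, Emp_ID INTEGER, Gender TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Original VALUES (?, ?, ?, ?)",
    [
        ("Robert", 29, "Male", 20000),
        ("Merissa", 32, "Female", 25000),
        ("Krishna", 30, "Male", 30000),
    ],
)

# Split into the two sub tables shown above, both identified by Emp_ID.
conn.execute("CREATE TABLE Employee AS SELECT Name, Emp_ID, Salary FROM Original")
conn.execute("CREATE TABLE Gender AS SELECT Emp_ID, Gender FROM Original")

# Joining on Emp_ID recovers the original rows, so no information was lost.
rejoined = conn.execute(
    "SELECT e.Name, e.Emp_ID, g.Gender, e.Salary "
    "FROM Employee e JOIN Gender g ON g.Emp_ID = e.Emp_ID "
    "ORDER BY e.Emp_ID"
).fetchall()
original = conn.execute("SELECT * FROM Original ORDER BY Emp_ID").fetchall()
print(rejoined == original)
```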

First Normal Form (1NF)

This normal form deals with atomicity, which means that the value in a cell cannot be divided further. In other words, a cell cannot hold multiple values. For a table to be in First Normal Form, it must not contain composite or multi-valued attributes.

Let us understand this normal form with an example discussed earlier. 

Name | Address | Books Name | Gender
Aakash | First Street, House No-102 | Life is what you make it, Law of Attraction | Male
Ravi Kishan | Street – 3, House No-403 | DBMS Concepts | Male
Komal Arora | Street No-2, Model Town | Learn key skills of Management in an organization | Female
Happy Singh | Second Street, Junction Road | Basics of Data Modelling | Male

As you can see, the first cell in the Books Name column contains multiple values, violating the First Normal Form rules. But we can convert this table to be in 1NF as below:

Name | Address | Books Name | Gender
Aakash | First Street, House No-102 | Life is what you make it | Male
Aakash | First Street, House No-102 | Law of Attraction | Male
Ravi Kishan | Street – 3, House No-403 | DBMS Concepts | Male
Komal Arora | Street No-2, Model Town | Learn key skills of Management in an organization | Female
Happy Singh | Second Street, Junction Road | Basics of Data Modelling | Male

The table above contains a single value in each cell, and no cell can be divided further, so it is now in First Normal Form.

Second Normal Form (2NF)

A table can be in Second Normal Form only if it is already in First Normal Form, meaning the table has to be in 1NF before it can be normalized to 2NF. In addition, the table must not contain any partial dependency. A partial dependency exists when a proper subset of a candidate key determines a non-prime attribute. Let us see the example of an Employee table where the primary key is the combination of Emp-ID and Dep-ID. In this example, Emp-Name and Location are taken to depend on Emp-ID alone, which is only a part of the key, so the main table can be divided into two sub tables.

Emp-Name | Emp-ID | Location | Dep-ID
Robert | 29 | Delhi | 22
Merissa | 32 | Bangalore | 13
Krishna | 30 | Mumbai | 44

Let’s divide the table into sub tables:

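Emp-ID

Emp-Name | Emp-ID | Location
Robert | 29 | Delhi
Merissa | 32 | Bangalore
Krishna | 30 | Mumbai

Dep-ID

Emp-ID | Dep-ID
29 | 22
32 | 13
30 | 44

With the partial dependency removed, Emp-Name and Location are stored once per employee in the first table, whose primary key is Emp-ID alone, and Emp-ID can be used to look up that information. The second table only records which employee belongs to which department. Emp-Name is not repeated there, because keeping it would bring back the dependency on a part of the key.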
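Checking for a partial dependency comes down to asking whether the left side of a dependency is a proper subset of the key. The sketch below does this for the Employee example. The function name partial_dependencies is an assumption, and so is the dependency Emp-ID -> Emp-Name, Location, which this example relies on. For simplicity it looks at a single key rather than at all candidate keys.

```python
def partial_dependencies(key, dependencies):
    """Return the dependencies whose left side is a proper subset of `key`
    and whose right side contains at least one non-key attribute."""
    key = set(key)
    found = []
    for left, right in dependencies:
        left, right = set(left), set(right)
        # `<` on sets means "proper subset": part of the key, but not all of it.
        if left < key and right - key:
            found.append((sorted(left), sorted(right - key)))
    return found


# Assumed dependency: Emp-ID alone determines Emp-Name and Location.
fds = [({"Emp-ID"}, {"Emp-Name", "Location"})]

# Original table: the key is the combination (Emp-ID, Dep-ID).
before = partial_dependencies({"Emp-ID", "Dep-ID"}, fds)
# First sub table: the key is Emp-ID alone, so nothing depends on a mere part of it.
after = partial_dependencies({"Emp-ID"}, fds)
print(before)
print(after)
```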

Third Normal Form (3NF)

Just as a table must be in 1NF before it can satisfy the rules of 2NF, a table must be in 2NF before it can proceed to 3NF. There is another condition, too: no non-prime attribute may have a transitive dependency on the key. This means that non-prime attributes should not be dependent on other non-prime attributes of the table.

For example:

Emp-Name | Emp-ID | Location | Dep-ID
Robert | 29 | Delhi | 22
Merissa | 32 | Bangalore | 13
Krishna | 30 | Mumbai | 44

In the above table, Emp-ID is the key. Any non-key attribute that is determined through another non-key attribute, instead of directly by Emp-ID, forms a transitive functional dependency. To remove such a dependency, the table can be divided as follows:

Emp-Name | Emp-ID
Robert | 29
Merissa | 32
Krishna | 30

Emp-ID | Location | Dep-ID
29 | Delhi | 22
32 | Bangalore | 13
30 | Mumbai | 44

Now every non-key attribute is fully functionally dependent on the primary key alone. In the first table, Emp-Name depends only on Emp-ID. In the second table, Location and Dep-ID depend only on Emp-ID.
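The 3NF condition can also be tested on a list of dependencies. The sketch below flags every dependency whose left side does not contain the key while its right side holds a non-key attribute. It reuses the Name -> Gender dependency from the earlier transitive dependency example. The function name is an assumption, and the check assumes the table has a single candidate key, which keeps the super key test simple.

```python
def third_normal_form_violations(key, dependencies):
    """List dependencies X -> A where X does not contain the key
    and A is a non-key attribute (single candidate key assumed)."""
    key = set(key)
    violations = []
    for left, right in dependencies:
        left, right = set(left), set(right)
        non_key_targets = right - key - left
        if not key <= left and non_key_targets:
            violations.append((sorted(left), sorted(non_key_targets)))
    return violations


key = {"Emp-ID"}

# Before the split: Emp-ID -> Name, Salary and Name -> Gender.
before = third_normal_form_violations(
    key, [({"Emp-ID"}, {"Name", "Salary"}), ({"Name"}, {"Gender"})]
)
# After the split, each sub table only has dependencies that start at its key.
employee_table = third_normal_form_violations(key, [({"Emp-ID"}, {"Name", "Salary"})])
gender_table = third_normal_form_violations(key, [({"Emp-ID"}, {"Gender"})])
print(before, employee_table, gender_table)
```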

Boyce Codd Normal Form (BCNF)

This normal form is a stricter version of 3NF and is also known as 3.5NF. It was developed by Raymond F. Boyce and Edgar F. Codd to address certain insertion, deletion, and update anomalies that 3NF does not rule out.

For example:

Student ID | Student Name | Subject Name | Department
STD121 | Ashu | Python | Computer Science
STD141 | Kapil | SQL | Electronics
STD347 | Rahul | Organization Behavior | Management
STD121 | Ashu | Basic of Electronics | Computer Science
STD131 | Ravi | Architecture Basics | Civil

As we know, for a table to come under BCNF, it has to satisfy the rules of 3NF first. In addition, for every non-trivial functional dependency A -> B in the table, A has to be a super key of the table.

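In the above table, each student belongs to a single department and has a single Student ID. A student who takes several subjects repeats the same name and department on every row, so Student ID determines those columns without being a key of the table. To satisfy BCNF, the subjects should be moved into another table.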

Student ID | Student Name | Department
STD121 | Ashu | Computer Science
STD141 | Kapil | Electronics
STD347 | Rahul | Management
STD131 | Ravi | Civil

Student ID | Subject ID
STD121 | Subject121
STD141 | Subject141
STD347 | Subject347
STD131 | Subject131

Now the subjects can be identified using the Subject IDs, and no non-prime attribute depends on another non-prime attribute. A student who takes more than one subject, such as STD121, would need one row per subject in the second table. By doing this, we have satisfied the Boyce-Codd Normal Form rules.
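The BCNF problem in the original student table can be seen directly in its rows. The sketch below checks two things on the five sample rows: whether Student ID determines the name and department, and whether Student ID alone identifies a row. The helper names determines and is_key and the lower-case column names are assumptions. Sample rows can only disprove a dependency, never prove it for all future data, so this is an illustration rather than a proof.

```python
def determines(rows, left, right):
    """True if equal values in the `left` columns always come with
    equal values in the `right` columns within `rows`."""
    seen = {}
    for row in rows:
        lhs = tuple(row[c] for c in left)
        rhs = tuple(row[c] for c in right)
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


def is_key(rows, columns):
    """True if no two rows share the same values in `columns`."""
    values = [tuple(row[c] for c in columns) for row in rows]
    return len(values) == len(set(values))


columns = ("student_id", "student_name", "subject_name", "department")
rows = [dict(zip(columns, r)) for r in [
    ("STD121", "Ashu", "Python", "Computer Science"),
    ("STD141", "Kapil", "SQL", "Electronics"),
    ("STD347", "Rahul", "Organization Behavior", "Management"),
    ("STD121", "Ashu", "Basic of Electronics", "Computer Science"),
    ("STD131", "Ravi", "Architecture Basics", "Civil"),
]]

# student_id -> student_name, department holds in the sample rows ...
fd_holds = determines(rows, ["student_id"], ["student_name", "department"])
# ... but student_id alone does not identify a row (STD121 appears twice).
id_is_key = is_key(rows, ["student_id"])
print(fd_holds, id_is_key)
```

A determinant that is not a key is what BCNF forbids, which is why the student details and the subjects end up in separate tables.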

Frequently Asked Questions  

What are the four types of database normalization?

The four types of normalization of the database are:
1. 1NF (First Normal Form)
2. 2NF (Second Normal Form)
3. 3NF (Third Normal Form)
4. BCNF (Boyce-Codd Normal Form)

What is Normalisation in 1NF, 2NF, and 3NF?

1NF: In this form, repeating groups are eliminated from the table, and a relation is in 1NF only when every attribute contains an atomic value.
2NF: In Second Normal Form, partial functional dependencies are removed, so that all non-key attributes are fully functionally dependent on the primary key. The relation must also satisfy the rules of 1NF to be in 2NF.
3NF: To be in Third Normal Form, there should not be any transitive functional dependency in the table. Also, the relation must satisfy the rules of 2NF before it can be in 3NF.

What are the 3 stages of Normalisation?

The 3 stages of normalization of data in the database are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). At every stage, the data is grouped into tables in such a way that no anomaly arises from the grouping. These anomalies include data redundancy, spurious relations in the data, and loss of data.

Why are databases Normalised?

Databases are normalized to reduce the redundancy in the data. Normalization also helps prevent anomalies when inserting, deleting, or updating data in the database, and it ensures that only related data is stored in each table. These are the main reasons for normalizing databases.

What is the purpose of normalization?

The main goal of normalization is to organize the data in the database in an efficient manner. The other main objectives are eliminating redundant data and ensuring that the data dependencies in each table make sense.
