This blog is subject the DISCLAIMER below.

Monday, February 12, 2007

Data Normalization

Abstract:
After gathering required information from end-users who are going to use your application, you move to design database and you should spend sometime to design it well because any mistake in your design would cost you a lot of effort, money and time if your application built on it…

Introduction:
Data Normalization is the process of organizing information in your database to should be able to develop professional code to update and retrieve data efficiently and your code comes from a series of rules of standards called NORMAL FORMS

FIRST NORMAL FORM:
Let's execute that code [SELECT * FROM student] so the output is


It's definitely wrong as what about if I want to sort by Student's Last Name I have no way except writing functions to do this job and that's wrong

After modification solution is:
Let's execute that code [SELECT * FROM student] so the output is


So now it's very easy to make all operations I need to sort by any attribute and update and retrieve needed data

Let's execute that code [SELECT * FROM student] so the output is


Unquestionably very wrong to do that for many reasons first of them I can not count all students who take Microsoft SQL Server 2005 course second what about if a student intend to take another course you need to modify the schema or the design of the relation and never ever put yourself in a case according to it you may change your database schema, believe me if you did, you would be fired, assume well to design well…

After modification solution is:


Now I can count all students who take any specific course and also update data smoothly without any modification in database schema

So we can summarize FIRST NORMAL FORM as:

1- Each attribute in a relation should contain only one piece of information that you can care about.
2- And also each piece of in formation that you can care about should be contained in just ONE attribute.

SECOND NORMAL FORM:
It focuses on relation itself and its concepts are:

1- Database must be in FIRST NORMAL FORM.
2- Every attribute should be functionally dependant on the entire primary key.
3- Every relation should model just one entity, object or event.

Let's execute that code [SELECT * FROM student] so the output is



According to SECOND NORMAL FORM concept we see that relation violate SECOND
NORMAL FORM as CourseName dependant on to primary keys they are StudentID and
CourseID

After modification solution is to have two tables one for
StudentCourse and one for Course respectively as shown


Here, we divided first table to two tables to avoid violation of SECOND NORMAL FORM as in last table CourseName became just dependant on CourseID.

THIRD NORMAL FORM:
Let us show what the concepts of the THIRD NORMAL FORM

1- Database must be in SECOND NORMAL FORM.
2- The attributes in addition to be dependant on entire primary key it should not be dependant on another attributes.

3- Don't store any calculated or derived values, only store the raw data and use queries to perform calculations and looks up as needed.

Let's execute that code [SELECT * FROM Item] so the output is


Here, all attributes dependant on ItemID as it's the primary key of this table however the problem is in TotalCost as you see TotalCost is a calculated value result FROM multiplication of ItemCost by ItemQuantity and that's a violation of THIRD NORMAL FORM.

Conclusion:
As we saw applying normalization to our design is very easy and logical but also not applying it will cost us a lot and a lot, and while designing large database you will find some difficulty to achieve the equation that you must attain performance, normalization, efficiency, all end-users requirements and so forth so they are cases you should DENORMALIZE your design i.e. what about if end-user asked you to store calculated or derived values in this case if you didn't find solution you would be forced to say "Ok".

In the article I just mentioned 3 forms of NORMALIZATION and there are other forms like Boyce-Codd normal form – BCNF, Forth normal form - 4NF and fifth normal form - 5NF, may next article will mention them in details

No comments: