(De/)Normalization

On tap

The goal of normalization

Loosely speaking:

[http://rovingcrafters.com/]

Goal: reduce redundancies, anomalies

Higher normal forms → cleaner designs

Normalization is a design step

A construction company db

Employees of the construction company work on projects. Each employee has an ID, name, job title and corresponding hourly rate.

Each project has a number, name and assigned employees. An employee can be assigned to more than one project.

The company bills clients for projects, based on hours worked by employees.

Report

The construction company periodically generates a report like so:

Issues with our db

Here is our table again:

There are numerous issues:

the PROJ_NUM attr could be used as a PK (or part of a PK, along with PROJ_NAME) but it contains nulls
possibilities for data inconsistencies exist, eg. if someone's name or title is misspelled
the redundancies that exist, could lead to insertion anomalies (eg. a new employee needs to be assigned to some project, even a fake one), update anomalies (eg. if an employee's JOB_CLASS changes, it has to be modified multiple times), deletion anomalies (eg. if a project has just one employee and that employee leaves, deleting the lone employee record would lead to the project itself getting deleted!)
data redundancy leads to wasted storage space

We have 'repeating groups' (for each project, we list all details about each employee) - our table is un-normalized, ie. is in '0NF' :)

So, we need to clean up the design!

Objectives: what do we want?

Normal forms

Normalization is a systematic process that yields progressively higher 'normal forms' (NFs) for each entity (table) in our db. We want at least 3NF for each table; in RL, we stop at 3NF.

The process

Normalization how-to, in one sentence: work on one relation (table) at a time: identify dependencies, then 'normalize' - progressively break it down into smaller relations (tables), based on the dependencies we identify in the original relation so that "only the PK, the whole PK and nothing but the PK" acts as a determinant! But how?? Details follow..

Functional dependence, determination

Partial dependency, transitive dependency

If (A,B) is a primary key, we have partial dependence if (A,B)->(C,D) and B->C [C is only partially dependent on the PK, ie. we only need B to determine C]. In other words, a part of an existing PK is acting like a PK on its own.

If X is a primary key, we have a transitive dependency if X->Y and Y->Z [Z is transitively dependent on X, not directly so]. In other words, a non-PK (regular attr) is acting like a PK.

0NF->1NF: eliminate repeating groups

In other words, "fill in the blanks" so that there are no nulls. Now we have a relation (table), with a value in each cell.

Further, identify the PK! In our example, it is (PROJ_NUM,EMP_NUM).

0NF->1NF [cont'd]

Create a dependency diagram, showing relationships (dependencies) between the attributes - this will help us systematically normalize the table.

Dependency diagram

Indicate full dependencies on the top, and partial and transitive dependencies on the bottom. "Top good, bottom bad". Also, color the PK components in a different color (and underline them). Result:

PROJ_NAME has only a partial dependency on the PK (since it is only dependent on PROJ_NUM, which is just a part of the PK).

CHG_HOUR is dependent on JOB_CLASS, which is a non-prime attribute that is itself dependent on EMP_NUM. So JOB_CLASS->CHG_HOUR is a signaling dependency, indicating a EMP_NUM -> CHG_HOUR transitive dependency.

0NF->1NF [cont'd]

1NF->2NF: remove partial dependencies

1NF->2NF [cont'd]

We eliminate partial dependencies by creating separate tables of such dependencies, and removing the dependent attributes from the starter table.

2NF->3NF: remove transitive dependencies

We promote the non-prime keys that masquerade as PKs, into actual PKs (give them their own tables).

Whether we eliminate partial dependencies (to create 2NF) or transitive ones (to create 3NF), we follow the same process: create a new relation for each 'problem' dependency!

2NF->3NF [cont'd]

'Good' tables

We can create a better DB by doing the following augmentations, to the 3NF model we just created:

evaluate PKs - create a JOB_CODE
evaluate naming conventions - eg. JOB_CHG_HOUR
refine attr atomicity, eg. EMP_NAME
identify new attrs, eg. EMP_HIREDATE
identify new relationships, PROJECT can have EMP_NUM as FK [to be able to record a project's (always sole) manager]
refine PKs for data granularity, eg. ASSIGN_NUM
maintain historical accuracy [duplicate data], eg. store JOB_CHG_HOUR in ASSIGNMENT
evaluate derived attrs, eg. ASSIGN_CHARGE

Final result

Here is the result of making the "extra" changes to our 3NF form:

Final result [cont'd]

Normalization: summary

* 1NF: eliminate repeating groups (partial:y, transitive:y)

* 2NF: eliminate redundant data (partial:n, transitive:y)

* 3NF: eliminate fields not dependent on key fields (partial:n, transitive:n)

Here is more, on normalization.

(De/)Normalization

On tap

The goal of normalization

Goal: reduce redundancies, anomalies

Higher normal forms → cleaner designs

Normalization is a design step

A construction company db

Report

Issues with our db

Objectives: what do we want?

Normal forms

The process

Functional dependence, determination

Partial dependency, transitive dependency

0NF->1NF: eliminate repeating groups

0NF->1NF [cont'd]

Dependency diagram

0NF->1NF [cont'd]

1NF->2NF: remove partial dependencies

1NF->2NF [cont'd]

2NF->3NF: remove transitive dependencies

2NF->3NF [cont'd]

'Good' tables

Final result

Final result [cont'd]

Final result [cont'd]

Final result [cont'd]

Normalization: summary

Denormalization

Denormalization [cont'd]