Data Normalisation – A Level Software Systems Development CCEA Revision

Understanding Data Normalisation

Data Normalisation is a strategic method for organising data in a database to reduce redundancy and improve data integrity.
Normalisation involves decomposing a table into two or more tables and linking them using relationships.
The main aim of normalisation is to minimise data duplication, eliminate data anomalies, protect data integrity, improve data search performance, and increase scalability.
Data normalisation is carried out in stages known as normal forms (NF). Each NF comes with a set of rules or constraints.

First Normal Form (1NF)

In the First Normal Form, each column is unique and not duplicated. This means each field contains only atomic (indivisible) values.
Each entry or record has a unique identifier known as a primary key.

Second Normal Form (2NF)

The Second Normal Form takes a table that is in 1NF and eliminates partal dependencies. A partial dependency occurs if a non-prime attributes (an attribute that’s not part of the primary key) depends on part of the primary key.
A table is in 2NF if it’s in 1NF and all its non-prime attributes are fully functionally dependent on the primary key.

Third Normal Form (3NF)

The Third Normal Form eliminates transitive dependencies. This is when a non-prime attribute depends on another non-prime attribute for its value.
A table is in 3NF if it’s in 2NF and all its attributes are directly dependent on the primary key.

Boyce-Codd Normal Form (BCNF)

The Boyce-Codd Normal Form is a high level of normalisation. It addresses a flaw in 3NF where a non-prime attribute determines a prime attribute (this is known as a dependency anomaly).
A table is in BCNF if for every non-trivial functional dependency X → Y, X is a superkey.

Denormalisation

Denormalisation is the process of combining tables that have been decomposed during normalisation. It’s used to improve performance by reducing the number of joins needed.
However, denormalisation can lead to data inconsistency and may increase the physical size of the database due to data redundancy.