Data Normalisation
Understanding Data Normalisation
- Data Normalisation is a strategic method for organising data in a database to reduce redundancy and improve data integrity.
- Normalisation involves decomposing a table into two or more tables and linking them using relationships.
- The main aim of normalisation is to minimise data duplication, eliminate data anomalies, protect data integrity, improve data search performance, and increase scalability.
- Data normalisation is carried out in stages known as normal forms (NF). Each NF comes with a set of rules or constraints.
First Normal Form (1NF)
- In the First Normal Form, each column is unique and not duplicated. This means each field contains only atomic (indivisible) values.
- Each entry or record has a unique identifier known as a primary key.
Second Normal Form (2NF)
- The Second Normal Form takes a table that is in 1NF and eliminates partal dependencies. A partial dependency occurs if a non-prime attributes (an attribute that’s not part of the primary key) depends on part of the primary key.
- A table is in 2NF if it’s in 1NF and all its non-prime attributes are fully functionally dependent on the primary key.
Third Normal Form (3NF)
- The Third Normal Form eliminates transitive dependencies. This is when a non-prime attribute depends on another non-prime attribute for its value.
- A table is in 3NF if it’s in 2NF and all its attributes are directly dependent on the primary key.
Boyce-Codd Normal Form (BCNF)
- The Boyce-Codd Normal Form is a high level of normalisation. It addresses a flaw in 3NF where a non-prime attribute determines a prime attribute (this is known as a dependency anomaly).
- A table is in BCNF if for every non-trivial functional dependency X → Y, X is a superkey.
Denormalisation
- Denormalisation is the process of combining tables that have been decomposed during normalisation. It’s used to improve performance by reducing the number of joins needed.
- However, denormalisation can lead to data inconsistency and may increase the physical size of the database due to data redundancy.