A family of Microsoft relational database management systems designed for ease of use.
To build a data base composed of correctly normalized tables you design those tables in accordance with 'Normal Forms'. These are a set of rules, governed by formal principles as set out below.
To keep things simple let's look at the first three normal forms. This is not to say that the higher normal forms are unimportant, and where necessary a database should be normalized to as high form as possible, but in most cases a table which is normalized to Third Normal Form will also be normalized to the higher normal forms.
Before looking at the normal forms themselves, let's look at the terms used in the formal definitions:
Loosely speaking, in the language of the relational model, a relvar (relation variable) equates to a table, a tuple to a row (record) and an attribute to a column (field).
First Normal Form: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute.
This means that in each row in a table there should not be multiple values of an attribute, e.g. in a table of Orders there should not be multiple values in one column for the items ordered, Apples,Bananas,Oranges for instance, or in separate columns named Item1, Item2, Item3. If the table were to have two columns OrderNumber and Item it would be in1NF as each row has one value of each of the two attribyutes.
Each of the two columns is a foreign key referencing the primary keys of Orders and Items tables respectively. In combination the two columns are the key of the table, so can be designated as its composite primary key.
Second Normal Form: A relvar is in second normal form if and only if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key.
This means that, if a table is in 1NF as defined above, each column in it which is not a part of a key must be determined by the whole of the key. Let's assume we have a table or orders with columns OrderNumber, Item and Customer. The table is in 1 NF because in each row we have only one value of each of the three attributes. The tables key, however, is a composite one of OrderNumber and Item as the values in these two columns in combination uniquely identify each row. However, Customer is determined by OrderNumber as each order is by one customer only, so this is not determined by the key, but by only one part of it. If we decompose this table into two tables with columns OrderNumber and Item in one, and OrderNumber and Customer in the other, the two tables would be in 2NF as the first table would be 'all key', i.e. its key is a composite one of its only two columns. In the second there would be only one row per OrderNumber value as this is the key and Customer is determined by its value and nothing else.
Third Normal Form: A relvar is in 3NF if and only if it is in 2NF and every non-key attribute is nontransitively dependent on the primary key.
For this let's assume our second table from the above example has, in addition to the OrderNumber and Customer columns, a CustomerAddress column. We've already seen that Customer is determined by the key, OrderNumber. CustomerAddress is also determined by the key in that, once we know the order number we know the address of the customer placing the order. However, CustomerAddress is also determined by the non-key attribute Customer as, once we know who is the customer we know their address (we are assuming for this example that customers never change their address). So CustomerAddress is transitively determined by the key, OrderNumber, via Customer, so the table is not in 3NF. If we decompose the table as follows:
Orders
….OrderNumber (key)
….Customer
Customers
….Customer (key)
….CustomerAddress
then in each table the non-key column is determined solely by the whole of the key, and each table is consequently in 3NF.
In reality we'd use 'surrogate' numeric keys, so the tables would be:
Orders
….OrderNumber (primary key)
….CustomerID
Customers
….CustomerID (primary key)
….Customer
….CustomerAddress
CustomerID would be an autonumber in Customers, of which it is the primary key, and a long integer number in Orders, in which it is a foreign key referencing the primary key of Customers.
Orders has a many-to-many relationship type with Items, i.e. each order can be for multiple items, and each item can be ordered in multiple orders. A many-to-many table is modelled by a further table which resoles it into two or more one-to-many relationship types, so in this case we'd have the following tables:
Items
….ItemID (primary key)
….Item
….UnitPrice
OrderDetails
….OrderNumber (foreign key)
….ItemID (foreign key)
….UnitPrice
….Quantity
The primary key of OrderDetails is a composite one of the two foreign keys OrderNumber and ItemID. Note that both Items and OrderDetails columns have UnitPrice columns. This might seem to mean that OrderDetails is not in 3NF as it is determined by one part of the key, ItemID rather than solely by the whole of the composite key. This is not the case, however, as prices change over time. The unit price in Items is the current price of the item, and will be updated each time the price changes. The unit price in OrderDetails, on the other hand, is that at the time when the order was made and is obtained from the current price in Items when the row is inserted into the table, but thereafter remains static regardless of any changes to the price of the item. Consequently UnitPrice in OrderDetails is determined by the whole of the key, and the table is in 3NF.
Why does all this matter? Well, if tables are not correctly normalized they will contain redundancies, i.e. each 'fact' might be stated multiple times in different rows. This allows update anomalies where 'facts' in different rows are not consistent with each other, so the data is unreliable.