Developing a Database Plan
The first step in creating a database is creating a plan that serves both as a guide to be used when implementing the database and as a functional specification for the database after it has been implemented. The complexity and detail of a database design is dictated by the complexity and size of the database application and also the user population.
The nature and complexity of a database application, and also the process of planning it, can vary significantly. A database can be relatively simple and designed for use by a single person, or it can be large and complex and designed, for example, to handle all the banking transactions for thousands of clients. In the first case, the database design may be slightly more than a few notes on some scratch paper. In the latter case, the design may be a formal document hundreds of pages long that contains every possible detail about the database.
In planning the database, regardless of its size and complexity, use the following basic steps:
Gather information.
Identify the objects.
Model the objects.
Identify the types of information for each object.
Identify the relationships between objects.
Gathering Information
Before creating a database, you must have a good understanding of the job the database is expected to perform. If the database is to replace a paper-based or manually performed information system, the existing system will give you most of the information that you need. You should interview everyone involved in the system to determine what they do and what they need from the database. It is also important to identify what they want the new system to do, and also to identify the problems, limitations, and bottlenecks of any existing system. Collect copies of customer statements, inventory lists, management reports, and any other documents that are part of the existing system, because these will be useful to you in designing the database and the interfaces.
Identifying the Objects
During the process of gathering information, you must identify the key objects or entities that will be managed by the database. The object can be a tangible thing, such as a person or a product, or it can be a more intangible item, such as a business transaction, a department in a company, or a payroll period. There are generally a few primary objects, and after these are identified, the related items become visible. Each distinct item in your database should have a corresponding table.
The primary object in the AdventureWorks2008R2 sample database included with SQL Server is a bicycle. The objects related to bicycle within this company's business are the employees who manufacture the bicycle, the vendors that sell components used to manufacture the bicycle, the customers who buy them, and the sales transactions performed with the customers. Each of these objects is a table in the database.
Modeling the Objects
As the objects in the system are identified, you should record them in a way that represents the system visually. You can use your database model as a reference during implementation of the database.
For this purpose, database developers use tools that range in technical complexity from pencils and scratch paper to word processing and spreadsheet programs, and even to software programs created specifically for the job of data modeling for database designs. Whatever tool you decide to use, it is important that you keep it up to date.
Identifying the Types of Information for Each Object
After the primary objects in the database have been identified as candidates for tables, the next step is to identify the types of information that must be stored for each object. These are the columns in the table of the object. The columns in a database table contain a few common types of information:
Raw data columns
These columns store tangible pieces of information, such as names, determined by a source external to the database.
Categorical columns
These columns classify or group the data and store a limited selection of data such as true/false, married/single, and VP/Director/Group Manager.
Identifier columns
These columns provide a mechanism to identify each item stored in the table. These columns frequently have an ID or number in their name, for example, employee_id, invoice_number, and publisher_id. The identifier column is the primary component used by both users and internal database processing for gaining access to a row of data in the table. Sometimes the object has a tangible form of ID used in the table, for example, a social security number, but in most situations you can define the table so that a reliable, artificial ID can be created for the row.
Relational or referential columns
These columns establish a link between information in one table and related information in another table. For example, a table that tracks sales transactions will generally have a link to the customers table so that the complete customer information can be associated with the sales transaction.
Identifying the Relationship Between Objects
One of the strengths of a relational database is the ability to relate or associate information about various items in the database. Isolated types of information can be stored separately, but the database engine can combine data when it is required. Identifying the relationship between objects in the design process requires looking at the tables, determining how they are logically related, and adding relational columns that establish a link from one table to another.
For example, the designer of the AdventureWorks2008R2 database has created tables for products and product models in the database. The Production.Product table contains information for each product that includes an identifier column named ProductID; data columns for the product name, the price of the product, and the product color, size, and weight. The table contains categorical columns, such as Class, or Style, that lets the products be grouped by these types. Each product also has a product model, but that information is stored in another table. Therefore, the Production.Product table has a ProductModelID column to store just the ID of the product model. When a row of data is added for a product, the value for ProductModelID must exist in the Production.ProductModel table.