SharePoint Data Models in Broader Context
Before you look at the specifics of working with data in SharePoint 2010, it can be instructive to consider the SharePoint platform in the broader context of data modeling and data storage. This topic discusses common high-level aspects of data modeling in relation to SharePoint 2010.
Green Field and Brown Field Scenarios
When it comes to designing data-driven applications, solution architects can find themselves in one of two situations. In one situation, they don't have to worry about anything that exists already and they can start the design process with a blank piece of paper. In the other situation, they have to design the solution around an existing environment, which could include legacy systems and different types of data repository.
Most architects would agree that not having to deal with legacy systems and preexisting data stores is more fun. This situation is known as green field development. You can design the application the way you want it without working around constraints imposed by other systems.
On the other hand, brown field development describes scenarios in which you are enhancing an operational system, or integrating with existing systems, such as line-of-business applications, proprietary software, and legacy data stores. As the data management requirements of most organizations evolve continuously over time, you're generally more likely to encounter brown field development scenarios.
Whether your application needs to integrate with other systems and data stores will clearly have a major bearing on the design process. SharePoint 2010 excels as a platform for brown field development scenarios because of its ability to connect to and interact with almost any kind of external system. Although most guidance necessarily ignores the sticky details around integrating existing systems into a solution design, this topic devotes much of its content to the opportunities for integrating with external systems and working with external data.
Structured vs. Unstructured Data
When it comes to data models for the SharePoint platform, the discussion often leads quickly to structured and unstructured data. Structured data is typically highly organized and tabular, such as the information in a SharePoint list or a database table. Unstructured data contains information but lacks a rigorous schema, such as the content of a document or an image. For example, the content of this topic is unstructured data. It contains information, but the information doesn't fall into neat categories of rows and columns. Conversely, structured tables may contain columns for notes, videos, or other unstructured information.
At a basic level, you typically use a SharePoint list to store structured data and a SharePoint document library to store unstructured data. However, one of the key benefits of the SharePoint platform is that it allows users to associate some structured data with fundamentally unstructured content. For example, a document will typically have a title and an author, as well as purpose-specific information such as the name of a customer, the due date for an invoice, or the final effective date for a contract. You can add these fields to a SharePoint document library in the same way that you would add them to a SharePoint list, and the SharePoint platform includes features such as property promotion that enable you to automatically extract this information from the document in many cases.
This blurring between traditional definitions of structured and unstructured data occurs because conceptually, a document library is simply a list where each item has one—and only one—attachment. SharePoint document libraries include all the same functionality as SharePoint lists, such as the ability to add columns and create different views. This allows users to sort, query, and filter libraries of files and documents in the same way that they would work with regular structured data.
Database Models vs. SharePoint Data Models
A data model describes the real-world pieces of information that your application will work with, together with the relationship between these pieces of information. In the case of a relational data model, this is often represented as an entity-relationship diagram. The following illustration shows an example of an entity-relationship diagram for a machine parts inventory database.
Entity-relationship diagram for machine parts inventory database
Today, most developers are familiar with relational databases. In a relational database, the data model is realized using the following constructs:
- A database contains one or more tables. Typically, each table models a logical entity, such as a person, an organization, or a manufactured item. For example, the machine parts inventory database contains tables for machines, parts, suppliers, and other entities.
- Each table contains one or more columns, or fields. Each column represents a single item of information about the entity modeled by the table. For example, the table named Machines includes fields named Name, ModelNumber, and ManufacturerId. Each field has a specific type, such as a string or an integer.
- A table row represents a single entry in the table. For example, the Machines table will include a single row for each machine defined in the table.
- Each table includes a primary key. This is a field value, or a combination of field values, that uniquely identifies each entry in the table.
- You can create relationships between tables by linking the primary key of one table to the same field (the foreign key) in another table. This is known as a foreign key relationship. For example, the Id field is the primary key for the Machines table, while MachineId represents the same field in the MachineDepartment table. As such, a foreign key relationship can be defined between the two tables.
Most database engines, such as Microsoft SQL Server, also allow tables to execute some programming logic when certain events, known as triggers, occur. You might invoke your logic when a row is added, updated, or removed from the table. You can use trigger logic to enforce rules that ensure consistency in the data or to drive updates to applications such as cache refreshes. Typically, database engines use a data definition language (DDL) to represent the data model internally. In Microsoft SQL Server, the DDL is a subset of SQL statements that are used to create the tables and relationships. The database engine stores DDL metadata that describes these structures in a system database. When you query a database, the database engine uses this metadata to determine how to interact with the database in question.
SharePoint allows you to construct data models using constructs that are conceptually similar to those found in a SQL database:
- SharePoint lists (and by association, document libraries) are conceptually similar to database tables.
- SharePoint columns are conceptually similar to database table columns.
- SharePoint content types provide an additional layer of abstraction to support reuse and can be compared to the schema for a database table.
- You can create relationships between SharePoint lists using lookup columns that are conceptually similar to the foreign key relationships between database tables.
However, the way in which the data model is stored and implemented differs substantially between SharePoint and a SQL database. Although SharePoint uses SQL Server as its underlying data store, it introduces a level of abstraction between the data structures you define and the data store. One key advantage of this additional abstraction is that SharePoint users with sufficient permissions can define and manage their own data structures without the intervention of a database administrator. SharePoint stores the metadata that defines columns, lists, and content types in its content databases, in much the same way that the SQL Server database engine stores data model metadata in its system databases.