Lesson 4: Designing the Data Flow Architecture

Estimated lesson time: 10 minutes

When you design a database, consider data flow. Prepare your data model and document possible data flow before you start the physical implementation of any databases. Data flow includes flow from the database to the middle tier and clients, flow to copies of data, and flow for archiving purposes.

Data Flow for OLTP Applications

In OLTP applications, the application typically uses data-access components such as ADO.NET to maintain the flow of data from a database to the middle tier or clients. However, as a data modeler, you must be prepared for multiple copies of data that can exist at the same time. For example, an application can implement a disconnected model in which it brings data to the client and then disconnects, only to reconnect later to make changes to that data. However, in the meantime, another user might have accessed that same data, making his or her changes. With multiple copies of data, you can expect update conflicts. Thus, you need to have a solution for resolving conflicts when the same piece of data is updated independently in multiple places.

One solution is to store a copy of the data in application cache. For example, you can use the ADO.NET DataSet object and persist part of the data about a client. An application can cache the data in other objects, such as any of the .NET collections, as well. An application is responsible for resolving update conflicts that stem from its own cache. However, a distributed application could need a copy of a database stored in relational format in a RDBMS on a disconnected client. Merge replication is the mechanism in SQL Server for such needs. Merge replication comes with plenty of built-in conflict-resolving procedures, and you can add your own. In any case, you must document the resolution process thoroughly, including details about the resolving algorithm. In addition, if you use merge replication, you should have a plan for merging data. You perform merging between a single subscriber and a publisher at a time. With hundreds of subscribers, you could get in trouble with your time window for merging if you do not give your merge process plenty of time. Remember that in the same time window, you usually have to accomplish other maintenance tasks, such as backups, as well.

Many OLTP applications perform intensive work on a small part of the data only. Stale data is more or less read-only, used just for queries. This old data has a different maintenance plan; for example, you do not need to check the integrity of your database with DBCC CHECKDB on old data as often as on current data, which undergoes heavy modification activity. You can speed up maintenance of the current data by moving the old data to an archive. You can implement the archive in multiple ways:

  • An application can perform the archiving at the end of a period—such as at the end of a fiscal year.
  • You can use a data warehouse.
  • You can create archive tables in a different filegroup of the OLTP database or even in a different database.
  • In SQL Server 2005, you can partition a table and create archive partitions in separate filegroups. You will learn more about table partitioning in Chapter 3, “Designing a Physical Database.”

Data Flow for Business Intelligence Applications

Data flow for OLTP applications typically does not involve transformations. For BI applications, just copying the data makes less sense. It is more valuable to do some data preparation in advance so that the data can serve better for analysis. For example, in a data warehouse, you use a multidimensional data model that includes merged and cleansed data. In addition, even simple reports on the same structure as it is in the OLTP database can benefit from copies of data. With copies of data, you can improve performance because you implement load balancing and diminish locking contention. Plan the data flow for BI applications in advance. Again, for extracting, transforming, and loading (ETL) data, you typically have a limited time window during off-peak hours.

SQL Server 2005 comes with many methods for data distribution and transformation that you can use for BI applications.

  • Snapshot isolation With this isolation level, SQL Server uses the tempdb system database to maintain the version of a row just before it is updated. SQL Server drops the old rows immediately after no connection uses them anymore. With snapshot-isolation levels, readers do not block writers, so you diminish locking contention. However, you have to devote special attention to tempdb.
  • Database snapshots You can use these read-only snapshots of a source database as the source for reports, reducing locking contention. You must create and drop database snapshots, preferably with the help of SQL Server Agent scheduled jobs.
  • Transactional replication This type of replication is especially useful when you need a copy of the data in a different server with low latency. Although you can improve the replication with your own procedures that can do some data transformation, replication is more suited for maintaining copies of data.
  • SQL Server Reporting Services Reporting Services can keep its own snapshots of data. You can prepare these snapshots in advance, before end users start to execute reports. Note that you must consider Reporting Services security; it has its own security model.
  • SQL Server Integration Services This tool is especially useful for complex transformations and for merging data from multiple sources. SQL Server Integration Services (SSIS) is a perfect tool for maintaining data warehouses. However, you cannot achieve low latency with SSIS; typically, you load a data warehouse in off-peak hours.
  • SQL Server Analysis Services Analysis Services can pull data directly from the source database. For OLAP applications, you typically have to transform and merge data; therefore, in most cases, you would choose to create a data warehouse and use SSIS for the ETL process.
  • Additional tools SQL Server 2005 provides other tools you can use for data flow to BI applications, even though they are not primarily designed for that purpose. For example, to create a copy of data for BI applications, you could use the Copy Database Wizard, snapshot replication, backup and restore, log shipping, and detaching and attaching a database.

No matter which method you select, you must plan your data flow during the data-modeling phase of your project and document it thoroughly.

Quick Check

  1. Which is the best method for maintaining data flow?
  2. What problems can you expect when you have multiple copies of updateable data at a time?

Quick Check Answers

  1. There is no simple answer to this question. When you investigate which method would be best for your environment, consider your latency requirements, your need for transformations, security issues, tool availability, existing knowledge in your company, transactional consistency, and more.
  2. You should be prepared for update conflicts.

< Back      Next >

 

 

© Microsoft. All Rights Reserved.