Data Patterns

patterns & practices Developer Center

Retired Content

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Version 1.0.0


"...since my intention is to write something useful for anyone who understands it, it seemed more suitable to me to search after the effectual truth of the matter, rather than its imagined one." - Niccolo Machiavelli in The Prince, 1532

Although managing data gets less fanfare than other IT disciplines, it is crucial to the well-being of enterprise systems. The architecture, design, and implementation of data management systems are also very complex. The goal of data patterns is to directly address this complexity, and provide solutions to common problems, often using relatively simple mechanisms. These patterns are based on "the effectual truth," as Machiavelli called it, which means that they are based on approaches to solving the problems that have proven successful.

Data professionals have been working with data patterns for many years, but they have probably not explicitly recognized this. Until now, very few data patterns have been formally captured and shared with a wider community. Instead, they continue to be held within organizations as tacit knowledge, or expressed in the form of internal standards or guidelines.

These patterns are about the problems faced by those who build the data services in an enterprise class business solution. They address the need to create the database designs and the data services that exist invisibly to the applications that use the data; in other words, the data and services that exist within the data ecosystem.

Patterns are useful to data professionals because they:

Document simple mechanisms that work.

Provide a common vocabulary and taxonomy for developers and architects.

Enable solutions to be described concisely as combinations of patterns.

Enable reuse of architecture, design, and implementation decisions.

The rest of this chapter introduces the notion of data patterns, explains how a pattern documents simple, proven mechanisms, and shows how collections of patterns provide a common language for developers and architects. To illustrate these concepts, this chapter applies abbreviated versions of actual patterns to real-life data situations.

Patterns Document Simple Mechanisms

A pattern describes a recurring problem that occurs in a given context and, based on a set of guiding forces, recommends a solution. The solution is usually a simple mechanism, a collaboration between two or more data objects, services, processes, threads, components, or nodes that work together to resolve the problem identified in the pattern.

Note: Although the underlying mechanisms described in these patterns are conceptually simple, in practice their implementation can become quite complex. The implementation requires skill and judgment to tailor general patterns to fit specific circumstances. In addition, the pattern examples in this chapter are highly abbreviated for the purpose of introduction; the actual patterns in subsequent chapters are much more detailed.

Consider the following example:

You are building a laptop solution that contains an application that salespeople use to give customers quotations for orders (OrderQuote). It is important that the application can work in a disconnected environment. The salespeople, therefore, require local data services on their laptops (for example, CustomerDetails, Orders, Price, Products, and BusinessRules tables). It is important that the quotations are as accurate as possible. It is also important that a quote given by any particular laptop application is consistent with one that another laptop would produce within a defined period of time. How do you structure your design so that your local data is sufficiently current and any work done is consistent within a defined time period no matter which laptop it is performed on?

A simple solution to the OrderQuote problem is to create a parameterized data replication service that copies only the required data to a particular laptop on a periodic basis. The parameters identify the data requirements of a particular salesperson. The data is copied from a shared database on a server. In cases where the same data is required on more than one laptop, a master copy of that data is taken at a point in time and then it is copied to all laptops to ensure consistency of application results.
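The mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the guide's implementation: the in-memory MASTER dictionary stands in for the shared server database, the region column is a hypothetical replication parameter, and the Laptop class and both function names are invented for the example.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical shared (master) database: table name -> list of rows.
MASTER = {
    "Products": [
        {"id": 1, "region": "EU", "name": "Widget"},
        {"id": 2, "region": "US", "name": "Gadget"},
    ],
    "Price": [
        {"product_id": 1, "region": "EU", "amount": 10.0},
        {"product_id": 2, "region": "US", "amount": 12.5},
    ],
}


@dataclass
class Laptop:
    name: str
    region: str  # assumed replication parameter for this salesperson
    tables: dict = field(default_factory=dict)


def take_snapshot(master):
    """Freeze a point-in-time copy of the master data."""
    return copy.deepcopy(master)


def replicate(snapshot, laptop):
    """Replace the laptop's local data with the rows its parameter selects."""
    laptop.tables = {
        table: [dict(row) for row in rows if row["region"] == laptop.region]
        for table, rows in snapshot.items()
    }


snapshot = take_snapshot(MASTER)
laptops = [Laptop("a", "EU"), Laptop("b", "EU"), Laptop("c", "US")]

# The master changes after the snapshot was taken...
MASTER["Price"][0]["amount"] = 11.0

# ...but every laptop is loaded from the same snapshot, so quotes agree.
for laptop in laptops:
    replicate(snapshot, laptop)
```

Because all laptops are loaded from one snapshot rather than from the live master, two laptops with the same parameters hold identical data until the next replication cycle, which is the consistency property the OrderQuote problem asks for.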

It is likely that you have solved problems like this in a similar manner, as many other designers have. If you have, you were providing data copies in a manner that this guide identifies as the Master-Subordinate Snapshot Replication design pattern.

Patterns as Problem-Solution Pairs

The Master-Subordinate Snapshot Replication pattern does not mention an OrderQuote process, or CustomerDetails tables. Instead, the pattern looks something like the following abbreviated example.


Figure 1: Master-Subordinate Snapshot Replication pattern, abbreviated

Comparing the abbreviated pattern example in Figure 1 with the solution outlined above illustrates the difference between the pattern, which is a generalized problem-solution pair, and the application of the pattern, which is a very specific solution to a very specific problem. The solution, at a pattern level, is a simple, yet elegant, collaboration between data stores. The general collaboration in the pattern applies specifically to a data replication service, which provides the mechanism that controls the copying of the data. Clearly, you can apply the same pattern to countless situations by modifying the pattern slightly to suit specific local requirements.

Written patterns provide an effective way to document such simple and proven mechanisms. Patterns are written in a specific format, which is useful as a container for complex ideas. As Figure 1 shows, a pattern is defined as a three-part relationship between a general problem, its context, and its solution, which is based on real-world experience, and is documented in a consistent, formal structure.

Although pattern writers usually provide implementation examples within these generalized patterns, it is important to understand that there are many other correct ways to implement these patterns. The key here is to understand the guidance within the pattern and then customize it to your particular situation. For example, the implementation examples provided in this guide are based on Microsoft SQL Server. If you need to implement the pattern using a different product, you can do so. However, an implementation that is optimized for another database management system might look quite different, and while these two implementations could differ significantly, both would be correct.

Patterns at Different Levels

Patterns exist at many different levels of abstraction. Consider another example, this time at a higher level of abstraction than design:

You are architecting a common approach to be the basis for how you move copies of data around in your organization. You have to deal with data that is held on many different platforms, is structured in different schemas, has policies and constraints on its relationships, has differing security requirements, has different application uses, and has different operational characteristics. How do you organize your data copying at a high level to be flexible, loosely coupled, and yet sufficiently cohesive?

The Move Copy of Data architecture pattern describes a solution to this problem that is built from one fundamental architectural building block. The block reflects that the solution always consists of a source data store that contains the data to be copied and moved; a link across which the data moves and which always contains the same three basic services; and a target data store where the copy is to be held. The block is expressed as a pair-wise solution, but it can be applied, fractal-like or in a network structure, to solve data copy problems of varying complexity. When you do this, you need to maintain discipline about the knowledge of the resulting copy infrastructure so that you can understand the provenance of copied data and the impact of changing parts of it. Again, the common approach helps you to solve this problem, but it does not do it for you.
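A minimal sketch of the building block may help make the source, link, and target structure concrete. The three service names used here (acquire, manipulate, write) are assumptions chosen for illustration, since this chapter does not name the services, and the Link class and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Link:
    """One source-to-target link made of three services (names assumed)."""
    acquire: Callable      # reads rows from the source store
    manipulate: Callable   # reshapes rows for the target
    write: Callable        # stores rows in the target

    def move(self, source, target):
        rows = self.acquire(source)
        rows = self.manipulate(rows)
        self.write(rows, target)


def copy_all(source):
    return [dict(row) for row in source]


def identity(rows):
    return rows


def upper_names(rows):
    return [{**row, "name": row["name"].upper()} for row in rows]


def replace_target(rows, target):
    target[:] = rows  # overwrite the target's contents in place


source = [{"id": 1, "name": "widget"}]
staging = []
target = []

# The same block applied twice, pair-wise: source -> staging -> target.
Link(copy_all, identity, replace_target).move(source, staging)
Link(copy_all, upper_names, replace_target).move(staging, target)
```

Chaining two links shows how one pair-wise block can be repeated to build a larger copy infrastructure. Keeping a record of which link fed which store is what lets you trace the provenance of a copy later.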


Figure 2: Data Movement Building Block

If you always architect data movement systems this way, then you employ this pattern already. Even so, there are many reasons why you might want to understand the patterns that underpin this architectural approach. You may be curious about why systems are frequently built this way, or you may be looking for better approaches to problems that this pattern does not quite resolve. In either case, it is worth examining the patterns and mechanisms at work here.

The reason that this approach is so commonly used is that it deals with complexity well by using a layered approach to dividing up the problem. In this case, the layers are instances of the source-target pairings (where the pairings are not constrained to 1:1 relationships). This simple strategy of organizing to manage complexity helps to solve two challenges: the management of dependencies and the need for exchangeable parts. Building environments without a well-considered strategy for dependency management leads to brittle and fragile solutions, which are difficult and expensive to maintain, extend, and substitute. Enterprise Solution Patterns Using Microsoft .NET contains an architectural pattern called Layered Application, which contains a more detailed explanation of the benefits of a layered approach.

Simple Refinement

As you will see in later chapters, Master-Subordinate Snapshot Replication is a refinement of Master-Subordinate Replication, which is in turn a refinement of Move Copy of Data. This means that the context, forces, and solution identified in Master-Subordinate Replication still apply to Master-Subordinate Snapshot Replication, but not the other way around. That is, the Master-Subordinate Replication pattern constrains Master-Subordinate Snapshot Replication, and the Master-Subordinate Snapshot Replication pattern refines the Master-Subordinate Replication pattern. This pattern relationship is useful for managing complexity. After you understand one pattern, you need only understand the incremental differences between the initial pattern and the patterns that refine it. Another example should help to illustrate the concept of refinement:

The laptop application that you built has been very successful and its use is expanding. Also, the company is extending its products and services. Now you want to copy data to more laptops and the amount of data required is larger. Currently, you deliver all the data that the laptop application needs every time you replicate. Continuing with your present strategy would put an unacceptable load on the infrastructure. How do you provide the data copies to the expanded customer base within the constraints of your infrastructure?

One solution to this problem is to extend Master-Subordinate Replication by adding a capability: the ability to deliver to the target only the changes that have occurred at the server since the last replication. Any unchanged data is not copied again. If the percentage of data that is changed between replications is relatively low, this solution works well. This approach is captured in Master-Subordinate Transactional Incremental Replication.
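The idea of delivering only the changes can be sketched as follows. This is a simplified illustration, not the pattern's actual design: it assumes the master keeps an ordered change log keyed by a sequence number, it does not model transaction boundaries, and the Master and Subordinate classes are hypothetical.

```python
class Master:
    def __init__(self):
        self.rows = {}     # key -> current value
        self.log = []      # ordered (sequence, key, value); None marks a delete
        self.sequence = 0

    def put(self, key, value):
        self.sequence += 1
        self.rows[key] = value
        self.log.append((self.sequence, key, value))

    def delete(self, key):
        self.sequence += 1
        self.rows.pop(key, None)
        self.log.append((self.sequence, key, None))

    def changes_since(self, sequence):
        """Return only the log entries newer than the given sequence number."""
        return [entry for entry in self.log if entry[0] > sequence]


class Subordinate:
    def __init__(self):
        self.rows = {}
        self.last_sequence = 0  # high-water mark of changes already applied

    def replicate_from(self, master):
        """Apply changes made since the last replication; return their count."""
        changes = master.changes_since(self.last_sequence)
        for sequence, key, value in changes:
            if value is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = value
            self.last_sequence = sequence
        return len(changes)


master = Master()
master.put("widget", 10.0)
master.put("gadget", 12.5)

laptop = Subordinate()
first = laptop.replicate_from(master)   # initial load: both rows

master.put("widget", 11.0)
master.delete("gadget")
second = laptop.replicate_from(master)  # only the two new changes
third = laptop.replicate_from(master)   # nothing changed, nothing copied
```

The amount of data transferred in each cycle depends on how much changed rather than on the size of the data set, which is why this approach pays off when the change rate between replications is low.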

Notice the relationships between these patterns (see Figure 3). Move Copy of Data introduces a fundamental strategy for moving copies of data. Data Replication and Master-Subordinate Snapshot Replication progressively refine this idea and constrain it to a certain type of one-way replication that replaces all of the data at the target. Master-Subordinate Transactional Incremental Replication refines Master-Subordinate Replication to a different type, where only data changes are copied.


Figure 3: Refinement of related patterns

Adding functions to specific layers is not the only way to manage this growing complexity. As complexity warrants, designers often create additional layers to handle this responsibility. For example, some designers would instead choose to adopt a layered approach to the infrastructure problem by adding intermediary copy stores into the infrastructure. This allows the data to be copied out in waves first to the intermediary stores and then to the set of laptops serviced by each intermediary. This solution is described in the Master-Subordinate Cascading Replication pattern.
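The two-wave copy through intermediary stores might look like the following sketch. The topology, the hub and laptop names, and the cascade function are all hypothetical, and plain dictionaries stand in for real data stores.

```python
import copy


def cascade(master, topology):
    """Copy master data out in two waves.

    topology maps a (hypothetical) intermediary name to the laptops it
    serves. Returns the intermediary stores, the laptop stores, and the
    number of copies the master itself had to deliver.
    """
    point_in_time = copy.deepcopy(master)  # one consistent snapshot
    intermediaries = {}
    laptops = {}

    # Wave 1: the master delivers to each intermediary store only.
    for name in topology:
        intermediaries[name] = copy.deepcopy(point_in_time)

    # Wave 2: each intermediary delivers to the laptops it serves.
    for name, served in topology.items():
        for laptop in served:
            laptops[laptop] = copy.deepcopy(intermediaries[name])

    return intermediaries, laptops, len(intermediaries)


master = {"Price": [{"product_id": 1, "amount": 10.0}]}
topology = {
    "emea-hub": ["laptop-1", "laptop-2", "laptop-3"],
    "apac-hub": ["laptop-4", "laptop-5"],
}
intermediaries, laptops, master_deliveries = cascade(master, topology)
```

In this example the master makes two deliveries instead of five. The remaining load is spread across the intermediaries, which is the relief to the infrastructure that the added layer is meant to provide.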

When grouped together, these variations form part of a cluster of patterns (see Figure 4) that visually represents common approaches to copying data. Clustering, used in this context, simply means a logical grouping of some set of similar patterns and their relationships. Usually the relationship is one of refinement, as shown above. Other relationships can be added, however. This guide adds a relaxed relationship, which means "can use." So Master-Master Replication can use Master-Subordinate Snapshot Replication, but there is no refinement between the patterns. This notion of a cluster is quite useful for expanding the view of patterns to encompass an entire solution, and for identifying clusters of patterns that address similar concerns in the solution space. Chapter 2, "Organizing Patterns," discusses clusters in more detail.
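A cluster can be thought of as a small graph with two kinds of edges. The sketch below encodes only the relationships stated in this chapter's text; Figure 4 may show more patterns and edges than are listed here, and the dictionary layout and function name are assumptions made for the example.

```python
# "Refines" edges: pattern -> the pattern it refines (as stated in the text).
REFINES = {
    "Data Replication": "Move Copy of Data",
    "Master-Subordinate Replication": "Move Copy of Data",
    "Master-Subordinate Snapshot Replication":
        "Master-Subordinate Replication",
    "Master-Subordinate Transactional Incremental Replication":
        "Master-Subordinate Replication",
}

# "Can use" edges: a relaxed relationship with no refinement implied.
CAN_USE = {
    "Master-Master Replication": ["Master-Subordinate Snapshot Replication"],
}


def inherited_from(pattern):
    """List the patterns whose context, forces, and solution still apply."""
    chain = []
    while pattern in REFINES:
        pattern = REFINES[pattern]
        chain.append(pattern)
    return chain


snapshot_chain = inherited_from("Master-Subordinate Snapshot Replication")
master_master_chain = inherited_from("Master-Master Replication")
```

Walking the refinement edges upward yields everything a pattern inherits, whereas a "can use" edge contributes nothing to that chain. That difference is why Master-Master Replication can use Master-Subordinate Snapshot Replication without refining it.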


Figure 4: A cluster of data patterns

Common Vocabulary

While considering the Move Copy of Data, Data Replication, Master-Subordinate Replication, Master-Subordinate Snapshot Replication, Master-Subordinate Transactional Incremental Replication, and Master-Subordinate Cascading Replication patterns, you probably noticed that patterns also provide a powerful vocabulary for communicating software architecture and design ideas. Understanding a pattern not only communicates the knowledge and experience embedded within the pattern, but also provides a unique, and hopefully evocative, name that serves as shorthand for evaluating and describing software design choices.

For example, when designing a data copy environment, a developer might say, "I think the pricing information should be copied using Master-Subordinate Snapshot Replication and deployed using Master-Subordinate Cascading Replication." If another developer understands these patterns, he or she would have a very detailed idea of the design implications under discussion. If the developer did not understand the patterns, he or she could look them up in a catalog and learn the mechanisms, and perhaps even learn some additional patterns along the way.

Patterns have a natural taxonomy. If you look at enough patterns and their relationships, you begin to see sets of ordered groups and categories at different levels of abstraction. Chapter 2 further expands and refines this taxonomy.

Over time, developers discover and describe new patterns, thus extending the community body of knowledge in this area. In addition, as you start to understand patterns and the relationships between patterns, you can describe entire solutions in terms of patterns.

Concise Solution Description

In this guide, the term solution has two very distinct meanings: first, to indicate part of a pattern itself, as in a problem-solution pair contained within a context; second, to indicate a business solution. When the term business solution is used, it refers to a software-intensive data processing system that is designed to meet a specific set of functional and operational business requirements. A software-intensive data processing system implies that you are not just concerned with software and data; you must deploy this software and data onto hardware processing nodes to provide a holistic technology solution. Further, the software under consideration includes both custom-developed software and purchased software infrastructure and platform components, both of which have data needs and all of which you integrate together.

Summary

This chapter introduced the concept of a pattern, explained how patterns document simple, proven mechanisms, and showed how patterns provide a common language for designers and architects. Chapter 2 explains how to organize your thinking about patterns, and how to use patterns to describe entire solutions concisely.
