A distributed system's logical data model

There are lots of different ways to describe data. I've seen data models that attempt to describe, conceptually, all of the data relationships for lines of business, marketing programs, fulfillment programs, and so on. Conceptual data models are useful primarily because they give you a starting point for working with the business: first to understand, and then to communicate, how the data can represent the business' requirements.

Normally, when creating a system, we drop down to a logical data model for that system.  We indicate the "data on the inside" and the "data on the outside".  Effectively, the diagram starts with a large 'box'.  Inside the box are entities needed by the application.  Outside are the entities that come from somewhere else but are referenced by the application.

One challenge, however, that seems to be stumping one of my teammates is how to create the logical model when there is not one system but two or three systems that communicate. Effectively, we are talking about a distributed system with distributed data. The data is distributed not because of geography, but in order to foster loose coupling.

This is a different way to look at the design of a system than is typically seen, but I feel pretty strongly that it is an important aspect, and one that we need to be fairly formal about.

I approach the systems from the standpoint of the business processes first, and the use cases second.  For example, if you are creating a system that facilitates the creation of a standard business contract, it is entirely reasonable to break down the process into steps, where each step is performed by different roles. 

The first step would be to define the marketing or fulfillment program that the contract will be tied to. The second would be to create legal clauses that can be fit into the document. The third would be to create a template with rules for how the clauses are to be assembled for a particular contract type, and the fourth would be to create the contract itself. Different people perform each step, and each step has distinct responsibilities. You could, if you wished, create a separate system for each. In a SOA world, I think that you would create a set of services for each.
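To make the "separate system for each step" idea concrete, here is a minimal sketch in which each of the four steps owns its own store and refers to entities owned by other steps only by ID. All class names, store names, and the `assemble_contract` helper are hypothetical illustrations, not part of any real contract system; plain dictionaries stand in for each system's database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Program:  # owned by the program system (step 1)
    program_id: str
    name: str


@dataclass
class Clause:  # owned by the clause library system (step 2)
    clause_id: str
    text: str


@dataclass
class Template:  # owned by the template system (step 3)
    template_id: str
    contract_type: str
    clause_ids: list[str]  # references into the clause system, not copies


@dataclass
class Contract:  # owned by the contract system (step 4)
    contract_id: str
    program_id: str   # reference to a Program owned elsewhere
    template_id: str  # reference to a Template owned elsewhere


# One store per system (hypothetical); no system writes into another's store.
program_store: dict[str, Program] = {}
clause_store: dict[str, Clause] = {}
template_store: dict[str, Template] = {}
contract_store: dict[str, Contract] = {}


def assemble_contract(contract_id: str) -> str:
    """Resolve the contract's references across the other systems' stores."""
    contract = contract_store[contract_id]
    program = program_store[contract.program_id]      # ask the program system
    template = template_store[contract.template_id]   # ask the template system
    clauses = [clause_store[cid].text for cid in template.clause_ids]
    return "\n".join([f"{template.contract_type} for {program.name}", *clauses])
```

The contract record carries only `program_id` and `template_id`; producing the document requires asking the systems that own those entities, which is exactly where service boundaries would sit.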

Each set of services is, in itself, an independent system. To remain decoupled, data in one system may refer to data in another, but the two must not be coupled. For example, you may need to add a customer before you add an invoice, but there is NO reason that adding a customer should create data records directly in the order management database (I'm being a purist... Master Data Management is the 'reality' behind this situation).
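A minimal sketch of "referential, but not coupled", assuming two independent service sets: a hypothetical `CustomerService` and a hypothetical `OrderService`. In-process classes stand in for what would be remote service calls. The order side stores only a customer ID and asks the customer service whether that ID exists; adding a customer creates nothing on the order side.

```python
from __future__ import annotations


class CustomerService:
    """Owns customer data; other systems may only use its interface."""

    def __init__(self) -> None:
        self._customers: dict[str, str] = {}  # customer_id -> name

    def add_customer(self, customer_id: str, name: str) -> None:
        # Writes only to the customer store; no order records are created.
        self._customers[customer_id] = name

    def customer_exists(self, customer_id: str) -> bool:
        return customer_id in self._customers


class OrderService:
    """Owns orders; holds a customer reference, never a customer record."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers
        self._orders: dict[str, str] = {}  # order_id -> customer_id

    def add_order(self, order_id: str, customer_id: str) -> None:
        # The customer must exist first, but we ask the owning service
        # instead of reading or writing its database directly.
        if not self._customers.customer_exists(customer_id):
            raise ValueError(f"unknown customer {customer_id!r}")
        self._orders[order_id] = customer_id

    def customer_for(self, order_id: str) -> str:
        return self._orders[order_id]

    def order_count(self) -> int:
        return len(self._orders)
```

The ordering dependency (customer before order) is enforced through an interface call, so either side can change its storage without the other noticing.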

So, if you are a developer who is used to creating a database with every bit of data that you think you will need in it, it can be quite a change to create not one, but many databases, bound together by master data that is copied locally on demand, and kept up to date by a cache engine (MDM).
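Master data management products typically do far more than caching (matching, stewardship, distribution), but the "copied locally on demand, and kept up to date" part can be sketched as a small time-to-live cache. This is an illustration under assumptions: `fetch` is a hypothetical callable standing in for a call to the master data source, and `invalidate` stands in for whatever change notification that source provides.

```python
from __future__ import annotations

import time
from typing import Callable


class MasterDataCache:
    """Local copy of master data, fetched on demand and refreshed after a TTL."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical call to the master data source
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries: dict[str, tuple[float, dict]] = {}  # key -> (fetched_at, record)

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            # Missing or stale: copy the record locally from the master source.
            record = self._fetch(key)
            self._entries[key] = (now, record)
            return record
        return entry[1]

    def invalidate(self, key: str) -> None:
        """Drop a local copy, e.g. when the master source reports a change."""
        self._entries.pop(key, None)
```

A TTL bounds how stale a local copy can get; invalidation on change events narrows that window further when the master source can publish them.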

Now, take one of those developers and ask him or her to create a data model that illustrates not "data on the inside" but "data in each room".  That requires a different kind of thinking... because now, the problem of 'master data' becomes visible (and a little painful).

In this model, the Product data is brought across both for invoices and for shipments, but is it really Product data that is in the shipments, or is it product and lot data? In other words, it is one thing to ask "who did we ship soap to?" and another thing altogether to ask "who did we ship Lot 41 of tainted beef to?"
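The product/lot distinction shows up directly in what each system has to store. In this hypothetical sketch an invoice line needs only the product, while a shipment line must also carry the lot; the second question can only be answered from the latter. Field and function names are illustrative, and lot IDs are assumed to be unique only within a product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceLine:
    customer: str
    product_id: str  # billing only needs to know what was sold


@dataclass(frozen=True)
class ShipmentLine:
    customer: str
    product_id: str
    lot_id: str      # shipping must also record which lot went out


def customers_shipped_product(lines: list[ShipmentLine], product_id: str) -> list[str]:
    """'Who did we ship soap to?': answerable from product data alone."""
    return sorted({line.customer for line in lines if line.product_id == product_id})


def customers_shipped_lot(lines: list[ShipmentLine], product_id: str, lot_id: str) -> list[str]:
    """'Who did we ship Lot 41 of beef to?': requires lot data on each shipment."""
    return sorted({line.customer for line in lines
                   if line.product_id == product_id and line.lot_id == lot_id})
```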

This distinction between product and lot becomes particularly visible when you model your systems this way. More importantly, you can see the lines that cross the boundaries between systems, and you can place a service on each line: get product, get lot, get invoice, get shipment.
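The "data in each room" view can itself be treated as data: list what each system owns and what it only references, and the boundary-crossing lines (and a candidate service for each) fall out mechanically. The rooms below, including the `Recall` room that asks the Lot 41 question, are hypothetical, since the original diagram is not reproduced here; the `get <entity>` naming simply mirrors the service list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Room:
    """One system ('room'): entities it owns and entities it only references."""
    name: str
    owns: set[str]
    references: set[str] = field(default_factory=set)


def boundary_lines(rooms: list[Room]) -> list[tuple[str, str, str]]:
    """Return (owner, consumer, entity) for every reference that crosses a room boundary."""
    owner = {entity: room.name for room in rooms for entity in room.owns}
    lines = []
    for room in rooms:
        for entity in sorted(room.references):
            if entity not in owner:
                raise ValueError(f"{room.name} references {entity}, which no room owns")
            lines.append((owner[entity], room.name, entity))
    return lines


def candidate_services(lines: list[tuple[str, str, str]]) -> list[str]:
    """One read service per entity that crosses a boundary."""
    return sorted({f"get {entity.lower()}" for _, _, entity in lines})


# Hypothetical rooms for illustration only.
rooms = [
    Room("Catalog", owns={"Product", "Lot"}),
    Room("Billing", owns={"Invoice"}, references={"Product"}),
    Room("Shipping", owns={"Shipment"}, references={"Product", "Lot", "Invoice"}),
    Room("Recall", owns={"RecallNotice"}, references={"Lot", "Shipment"}),
]
```

Running `candidate_services(boundary_lines(rooms))` on these rooms yields one service per crossing entity; a reference to an entity that no room owns is reported as an error, which is often the first sign of missing master data.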

When designing the database, you will need to use replication, a cache, or a transactional store to ensure referential integrity.
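One way to get local referential integrity with replication is sketched below using Python's built-in `sqlite3`: the shipping system keeps a replicated copy of product master data and declares an ordinary foreign key against it. The table names and the `apply_product_change` / `add_shipment` helpers are hypothetical; how change events actually arrive from the product system is left out.

```python
import sqlite3


def open_shipping_db() -> sqlite3.Connection:
    """Shipping system's local database with a replicated copy of product master data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE product_replica (
            product_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE shipment (
            shipment_id TEXT PRIMARY KEY,
            product_id  TEXT NOT NULL REFERENCES product_replica(product_id)
        );
    """)
    return conn


def apply_product_change(conn: sqlite3.Connection, product_id: str, name: str) -> None:
    """Apply a change received from the product system to the local replica (upsert)."""
    cur = conn.execute(
        "UPDATE product_replica SET name = ? WHERE product_id = ?", (name, product_id))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO product_replica (product_id, name) VALUES (?, ?)", (product_id, name))
    conn.commit()


def add_shipment(conn: sqlite3.Connection, shipment_id: str, product_id: str) -> None:
    """Insert a shipment; the foreign key rejects products missing from the replica."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO shipment (shipment_id, product_id) VALUES (?, ?)",
            (shipment_id, product_id))
```

The foreign key protects only against products the replica has not yet received; keeping the replica current is the job of the replication or cache mechanism.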