SOA and database coupling

What do you do if two enterprise services share the same database?

I am running into this all the time. As we work to break apart legacy applications, we need to recognize that 'stovepipe' applications are written from the perspective of 'put everything we need in the local database.' That lets us get really lazy: we ask the database to do things it does well, like referential integrity, but those same things complicate integration.

I'll walk through the logic of breaking this apart by way of an example.

Example: our company (Contoso) stores a list of all customers in Dynamics CRM, but we also have an order entry system: a custom app running on our extranet. The app is used by our field sales team to enter customer orders. Let's say that we successfully work out the problems of getting basic domain data to the extranet, so we share the same lists of countries, states/provinces/locales... data that changes rarely.

However, our legacy app was used to not only writing customers to the local database, but also reading customers from it. So we built in operations like the following:

Joe is a sales rep for Oregon and Idaho.  When Joe logs in to the legacy order entry system, we look up his region and display a list of customers in the region. 

That assumes that we HAVE a list of customers for his region. 

Clearly, the list of customers belongs to another service.  We have written a 'customer lookup' service and a 'get-customer-details' service that connect to Dynamics CRM.  So how, when changing our legacy app, do we break these up... or do we?

There are two design choices to consider here: 

1) How much data from another domain is REQUIRED to maintain local functionality? (Corollary: is local functionality actually required or helpful?)

2) What is the cost and complexity of acquiring dynamically-changing non-local domain data?

Question 1 is really a business requirements question, but it leads straight to a technical challenge. Let's say that the business wants to keep the following function: Joe logs in and sees customers from his territory.

Let's say that they also want to add the following capability: Joe can look up a customer from anyone else's territory and either create a local subsidiary or book a sale to the "non-local" company (commission rules will apply later). He CANNOT create a new customer (business rule).

So, in order to look up companies local to Joe, we have to know a couple of things:

a) For every salesman, what territories do they have? [[note: This information is probably also managed in the CRM solution, or perhaps in another system. It shouldn't be managed in the order entry system, but it is possible. Let's say that in the past it was managed locally, but we want to move to using the data out of CRM.]]

b) For every customer, what territory are they in?

c) What customers are subsidiaries of other customers?

d) We want to find any customer by name (text search).

e) What customer data needs to appear on the actual order itself?

f) What customer data is needed to allow or disallow specific products, or marketing programs, from the order?

I would suggest that the local data that we need to accomplish this is a subset of customer information.  Not all the customer data is really needed to find a customer.  I would break up the list above into two use cases: find-customer and enter-order.

For find-customer, I'd want local data that includes the names of customers, their customer id, their territory id, a parent customer id (for subsidiaries) and very little else. 
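To make that concrete, here is a minimal sketch of that local record in Python (used here purely for illustration; the class and field names are my assumptions, not Contoso's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomerSummary:
    """Local copy of just enough CRM data to find a customer."""
    customer_id: str          # mastered in Dynamics CRM, not locally
    name: str                 # supports text search by customer name
    territory_id: str         # supports filtering by the rep's region
    parent_customer_id: Optional[str] = None  # set only for subsidiaries
```

Everything else about the customer stays in CRM, where it is mastered.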

In order to get this information, we could set up an event-driven master data management pattern. When the CRM system updates a customer, it sends an event to an event handler running in BizTalk or some other free-standing component. (If your CRM system cannot send an event, have your CRM system update a separate table in a SQL Server database, and then wire up SQL Server Notification Services to detect the change and send the event... about 200 lines of XML.)
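The notification itself can stay thin: it announces *that* a customer changed and lets the subscriber decide what to fetch. A sketch of what the publisher might emit (the event name and fields are assumptions; in practice this would be a BizTalk message or a Notification Services payload):

```python
import json
from datetime import datetime, timezone

def make_customer_changed_event(customer_id: str, change_type: str) -> str:
    """Build a thin 'customer changed' notification.

    The event deliberately carries no customer details: subscribers
    call back to the CRM service for only the fields they care about.
    """
    return json.dumps({
        "event": "customer.changed",
        "customer_id": customer_id,
        "change_type": change_type,  # e.g. "created" or "updated"
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    })
```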

Once the event handler gets the notification that data has changed, it passes the notification along to the subscribers; in this case, a web service running on top of our order entry system. That web service asks the CRM system for details about the customer: customer id, territory id, parent id, and customer name. Once it gets this data, it stores it locally. Nothing else. We have our data feed.
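A hedged sketch of that subscriber, assuming a `crm_client` proxy that wraps our 'get-customer-details' service and a `local_store` data layer (both names and signatures are illustrative):

```python
def handle_customer_changed(event: dict, crm_client, local_store) -> None:
    """Subscriber side: pull the minimal fields and cache them locally.

    crm_client and local_store are stand-ins for the real CRM web
    service proxy and the order entry system's database layer.
    """
    summary = crm_client.get_customer_summary(event["customer_id"])
    local_store.upsert_customer_summary(
        customer_id=summary["customer_id"],
        name=summary["name"],
        territory_id=summary["territory_id"],
        parent_customer_id=summary.get("parent_customer_id"),
    )
```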

When Joe logs in, he can see a list of all customers for his territory because we can look locally for the list.  He can even see other customers, subsidiaries, etc.  Now, if he wants to search for companies by a field that is not in our local database, say number of employees, then our local app would call a service on the CRM system.  That system would return a list containing (you guessed it) customer id, territory id, parent id, and customer name.
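One way to sketch that routing decision (the helper names are assumptions): query locally when every filter field lives in the local summary, and fall through to the CRM search service otherwise. Either path returns the same four-field summary shape:

```python
# Fields we mirror locally; anything else forces a remote search.
LOCAL_FIELDS = {"customer_id", "name", "territory_id", "parent_customer_id"}

def find_customers(filters: dict, local_store, crm_client) -> list:
    """Search locally if we can, remotely if we must.

    Both paths return the same shape: customer id, territory id,
    parent id, and customer name. Nothing more.
    """
    if set(filters) <= LOCAL_FIELDS:
        return local_store.search_customer_summaries(filters)
    # A non-local field such as number of employees goes to CRM.
    return crm_client.search_customers(filters)
```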

When Joe selects a company from the list, the app looks to the CRM system for actual customer details needed for the order header and for rules enforcement.  There is no point in storing this data locally for every customer, although it may be OK to store it locally for this customer now that we have it, along with a 'cache date' so that the local system can use local data when it is not too old, and look up remote data when the local data is 'old enough' (configurable).
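A sketch of that 'cache date' check, with the staleness threshold exposed as a configurable parameter (the store and client methods are hypothetical):

```python
from datetime import datetime, timedelta, timezone

def get_customer_details(customer_id: str, local_store, crm_client,
                         max_age: timedelta = timedelta(hours=24)) -> dict:
    """Return order-header details, preferring a fresh local copy.

    Uses the cached row when its cache date is newer than max_age;
    otherwise re-fetches from CRM and refreshes the local cache.
    """
    cached = local_store.get_customer_details(customer_id)
    now = datetime.now(timezone.utc)
    if cached and now - cached["cached_at"] < max_age:
        return cached
    details = crm_client.get_customer_details(customer_id)
    details["cached_at"] = now
    local_store.save_customer_details(customer_id, details)
    return details
```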

What about creating a subsidiary?  The order entry system will need to allow Joe to enter data about the subsidiary, and to pick the parent company from the list of customers (as above).  Then, it can either use a synchronous call to the CRM solution to create the subsidiary first, copying the customer id (created remotely) to the local database, or the app can create the subsidiary locally (using a temporary primary key) and pass an async call to an event handler to add the subsidiary to CRM.  That event handler can then call back to the local system with the official customer id.
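A sketch of the asynchronous option (the event bus and callback names are assumptions): create the subsidiary locally under a temporary key, publish an event, and let the handler's callback swap in the CRM-issued customer id:

```python
import uuid

def create_subsidiary_async(data: dict, parent_customer_id: str,
                            local_store, event_bus) -> str:
    """Create the subsidiary locally first, then tell CRM about it."""
    temp_id = f"TEMP-{uuid.uuid4()}"  # temporary primary key
    local_store.insert_subsidiary(temp_id, parent_customer_id, data)
    event_bus.publish("subsidiary.created", {
        "temp_id": temp_id,
        "parent_customer_id": parent_customer_id,
        "data": data,
    })
    return temp_id

def on_crm_assigned_id(temp_id: str, official_id: str, local_store) -> None:
    """Callback from the event handler once CRM mints the official id."""
    local_store.replace_customer_id(temp_id, official_id)
```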

The cost of decoupling the databases and removing a data feed from one system to another is the cost of setting up this Master Data Management mechanism that informs subscriber systems when a publisher system has changed data.

Note that once you set it up, it can be reused by any number of publishers and subscribers. 

The key here is to get the remotely mastered data out of your local database.  Let the services bring data to you.  Don't go get it with SQL jobs.  This allows you tremendous flexibility.  Services can keep a solid interface without regard to underlying changes.  You can move your CRM data from one CRM system to another without breaking connections to order entry, call center, billing, and many other integrated functions. 

The downstream savings in integration rewrites completely pays for keeping these services decoupled and for using events and messages to move data around, as opposed to direct database feeds.

I hope this discussion helps.  Any questions?