Share via


Monitor Data Exchange between DQS and Reference Data Service Providers

One of the key features in Data Quality Services (DQS) is to use knowledge from a cloud-based reference data service (Windows Azure Marketplace) to cleanse your data. However, in Data Quality Client, you cannot monitor information about the data exchanged between DQS and the reference data service providers during the cleansing process. This article provides information about why and how you can view/monitor this information.

Why to monitor data exchange?

Well, primarily for the following two reasons:

  • Financial auditing: You are charged/billed for data exchange between DQS and reference data service providers in the cloud. It is good to have the information handy in case you find any discrepancy in your billing, and want to verify actual usage of the reference data service.
  • Troubleshooting: Provides various information about the data exchange such as total time taken for the cleansing of a batch of records, how many records were cleansed, response from the reference data service provider (HTTP status), and so on. This information enables you to efficiently track issues with reference data-based cleansing, and quickly answer support calls from customer.

How to monitor data exchange?

The V_A_REFERENCE_DATA_AUDIT view in the DQS_MAIN database contains information about the data exchange between DQS and reference data service. Each batch data request is stored as a row in the view. This V_A_REFERENCE_DATA_AUDIT view contains the following columns:

  • ID: Unique ID of the data exchange record in this view.
  • Provider Name: Reference data service provider name used for cleansing.
  • Project: Data Quality Project name used for cleansing.
  • Knowledge Base: DQS Knowledge base used for cleansing.
  • Batch ID: Unique ID of the data batch sent to the reference data provider.

IMPORTANT: When a domain value is successfully queried once from the reference data service provider, it is cached locally to prevent the same domain values from being queried multiple times from the reference data service provider. You are not billed for queries against the cached data. In this case, ##Cache Lookup## is displayed in the Batch ID column instead of a GUID to signify that the data has been queried against the local cache.


  • Number of records sent: Number of records sent to the reference data provider.
  • Number of records received: Number of records received from the reference data provider.
  • Time sent: Time when the batch request was sent to the reference data provider for data cleansing. (Note: this time stays constant across multiple batches within the same activity, because there is a bug where the time is not refreshed for each individual batch in SQL Server 2012 RTM)
  • Time received: Time when the reference date service responded with the cleansed data.
  • Duration: Time taken by the reference data provider to respond with the cleansed data. This is calculated as the difference between the request and response time. (Note: this duration is cumulative is since the beginning of the activity, since the Time sent column is not incremented per individual batch in SQL Server 2012 RTM)
  • Response Code: The HTTP status code for the operation. For example, 200 OK, 504 - Gateway Timeout, and so on.

To view data in the V_A_REFERENCE_DATA_AUDIT view in the DQS_MAIN database:

  1. Start Microsoft SQL Server Management Studio
  2. In Microsoft SQL Server Management Studio, right-click the SQL Server instance where DQS is installed, and then click New Query in the shortcut menu.
  3. In the Query Editor window, copy the following SQL statement:
    select * from DQS_MAIN.dbo.V_A_REFERENCE_DATA_AUDIT
  4. Press F5 to run the statement. The Results pane will display the data in the V_A_REFERENCE_DATA_AUDIT view.

Example Output:

ID

PROVIDER NAME

PROJECT

KNOWLEDGEBASE

BATCH ID

NUMBER OF RECORDS SENT

NUMBER OF RECORDS RECEIVED

TIME SENT

TIME RECEIVED

DURATION

RESPONSE CODE

1000000

Provider 1

Project1

KB1

##Cache Lookup##

38

0

2011-12-12 14:49:40.973

2011-12-12 14:49:41.063

00:00:00.0900000

OK

1000001

Provider 1

Project1

KB1

354aae07-bb1e-47ea-a9f8-43afdcbacefb

38

0

2011-12-12 14:49:41.127

2011-12-12 14:50:27.067

00:00:45.9400000

504 - Gateway Timeout

1000002

Provider 1

Project1

KB1

191c6a4e-af18-4117-a273-53d9e80988c8

20

20

2011-12-12 14:50:27.097

2011-12-12 14:50:35.153

00:00:08.0570000

OK

1000003

Provider 2

Project2

KB2

##Cache Lookup##

288

0

2012-02-07 11:38:40.163

2012-02-07 11:38:40.560

00:00:00.3970000

OK

1000004

Provider 2

Project2

KB2

82db1c29-b77b-4452-a837-d4e1744b655e

100

100

2012-02-07 11:38:40.630

2012-02-07 11:38:46.057

00:00:05.4270000

OK

1000005

Provider 2

Project2

KB2

5d0e526d-56db-47bc-bebe-caaa43123665

100

100

2012-02-07 11:38:40.630

2012-02-07 11:38:50.447

00:00:09.8170000

OK

1000006

Provider 2

Project2

KB2

847ba394-92aa-4a55-b882-940834838d69

88

88

2012-02-07 11:38:40.630

2012-02-07 11:38:54.453

00:00:13.8230000

OK

If you have enabled logging for reference data service (RDS) in Data Quality Client (see Configure Severity Levels for DQS Log Files), each batch request is also logged in the DQS Server log file (DQServerLog.DQS_MAIN.log). A sample entry in the DQS Server log file looks like the following:

12/14/2011 9:44:55 AM|[]|15|INFO|f592c205-3a81-40d3-8fe8-08a511df6762|Microsoft.Ssdqs.ReferenceData.Auditing.ReferenceDataAuditor| Batch Job: PROVIDER [1002], PROJECT [1000078], PROCESS [1079], BATCH ID [f6388357-a209-413f-b2df-499c3ea8b12e], NUMBER OF RECORDS SENT [10], NUMBER OF RECORDS RECEIVED [10], TIME SENT = [12/14/2011 09:44:03], TIME RECEIVED [12/14/2011 09:44:55], DURATION [00:00:52.2082203], RESPONSE CODE [OK]


See Also