Data Collector Architecture and Processing
The data collector is integrated with SQL Server Agent and Integration Services, and uses both extensively. Before you work with the data collector, you should therefore understand certain concepts related to each of these SQL Server components.
SQL Server Agent is used to schedule and run collection jobs. You should understand the following concepts:
Job
Job step
Job schedule
Subsystem
Proxy accounts
For more information, see Automating Administrative Tasks (SQL Server Agent).
Integration Services (SSIS) is used to execute packages that collect data from individual data providers. You should be familiar with the following SSIS tools and concepts:
SSIS package
SSIS package configuration
For more information, see Integration Services Packages.
Data Collector Architecture
The following illustration shows the architectural model for data collection and storage.
As shown in the illustration, the components in the model are grouped into the following categories:
Storage components are the databases that contain configuration information and the collected data.
Execution components are used for data collection and storage.
API components are used to enable interaction between the user interfaces and the data collector.
Client components are the user interfaces for the data collector.
The following sections provide more detail about these categories and components.
Storage
The data collector stores data in two places: the msdb database and the management data warehouse.
Note
The data collector implements user-schema separation, which was introduced in SQL Server 2005. For more information, see User-Schema Separation.
msdb System Database
The data collector uses the msdb database for storing configuration information, run-time information, auditing, and collection history information. This database must be present on the instance of SQL Server that runs the data collector. Using msdb provides the following benefits:
All the data needed to configure and run the data collector is in one location.
Data collection configuration can be deployed across several servers without having to use the file system.
The data collector can use existing SQL Server security mechanisms to protect the data. In addition, database roles can provide granular security, and there is no need to implement cross-database chaining.
Since msdb is a relational database, it is possible to ensure the referential integrity of the configuration and run-time data.
In addition to storing collector-specific information, msdb is also used to store SQL Server Agent job information and SSIS package information.
Data Collection Configuration
Data collection configuration-related tables and views include information about collection sets, collector types, and collection items. The tables and views also contain global data collector parameters, such as the location of the management data warehouse. Database roles for the data collector and management data warehouse are also stored here. For more information about the columns for each data collection configuration view, see Data Collector Views (Transact-SQL).
Important
Do not modify data in the configuration tables. The data in these tables must be inserted, updated, and deleted in specific ways that are enforced by stored procedures. Table names and definitions can change when you update the application, and might change in future releases. Instead of directly using the tables, use the documented stored procedures and views provided with the data collector to access instance and application data.
SQL Server Agent Jobs
SQL Server Agent job information is stored in msdb using a separate schema. This job information is created or updated whenever collection sets are added, removed, or changed. When a collection set is started, SQL Server Agent uses this information to execute data collector jobs.
SQL Server Integration Services Packages
SSIS packages are stored in msdb using existing SSIS storage tables. The data collector uses these packages to collect data and upload it to the management data warehouse.
Auditing and History
Dedicated storage is provided for saving detailed audit and historical records of data collection runs. The data collector run-time component and the SSIS packages generate events during data collection. This information is used to monitor and troubleshoot the collection process.
Management Data Warehouse
The management data warehouse is a relational database that contains all the data that is retained. This database can be on the same system as the data collector, or it can be on another computer.
Execution
Components in the execution category run in-memory and are responsible for collecting and storing data.
SQL Server Agent
SQL Server Agent hosts the data collector run-time component. Data collection is scheduled as a SQL Server Agent job. When the job is run, it invokes the data collector run-time component using the parameters required to create and execute a collection set.
Data Collector Run-Time Component
The data collector run-time component is hosted by a stand-alone process called Dcexec.exe. This component manages data collection based on the definitions provided in a collection set, and can accept any collection set as input. The data collector run-time component is responsible for loading and executing the SSIS packages that are part of a collection set.
A collection set is run in one of the following collection and upload modes:
Non-cached mode. Data collection and upload are on the same schedule. The packages start, collect and upload data at their configured frequency, and run until they are finished. After the packages finish, they are unloaded from memory.
Cached mode. Data collection and upload are on different schedules. The packages collect and cache data until they receive a signal to exit from a loop control-flow task. This ensures that the data flow can be executed repeatedly, which enables continuous data collection.
Note
The data collector run-time component can only run data collection or data upload. It cannot run these tasks concurrently.
SSIS Run-Time Component
The data collector run-time component invokes the SSIS packages that the data collector uses. The packages are run in-process.
The two most important tasks for the SSIS packages are data collection and data upload, which are carried out by separate packages.
The collection package gathers data from a data provider and keeps it in temporary storage. If configured to do so, this package also appends additional information to the data it collects, such as the snapshot time or information about the source of the data.
The upload package reads the data in temporary storage, processes the data as required (for example, removing unnecessary data points, normalizing the data, and data aggregation), and then uploads the data to the management data warehouse. The upload is done as a bulk insert to minimize the impact on server performance.
Separating data collection and data upload provides more flexibility and efficiency. This design supports scenarios where snapshots of the data are captured at frequent intervals (for example, every 15 seconds), but the collected data only needs to be uploaded every hour. Data collection and upload frequency should be determined by the monitoring requirements of a particular SQL Server installation.
Temporary Storage
Temporary storage is used during data collection to store property values exposed by collection sets or calculated dynamically. These values can be raw data obtained during the collection process, or processed data, such as aggregated data.
API
The API category includes the class library and object model used to configure, maintain, and extend the data collector. Client access to the system database must come through the API.
Class Library
The data collector has its own namespace and is independent of SQL Server Management Objects (SMO).
Stored Procedures and Views
The stored procedures and views provide Create, Read, Update, and Delete access to data collector configuration data and the management data warehouse.
Client
The data collector uses SQL Server Management Studio and a Windows Powershell™ command-line interface as clients for configuring and using the data collector. You can use SQL Server Management Studio to do the following:
Configure data collection using the Configure Data Collection Wizard in Object Explorer.
Enable or disable data collection by using Object Explorer.
Start or stop a specific collection set by using Object Explorer.
View or configure data collection or data collection set properties by using Object Explorer.
View reports based on historical data stored in the management data warehouse. This database can be located on the local instance or on another server. The latter is recommended.
You can use the Windows Powershell command-line interface to do the following:
Start or stop a collection set.
Add or remove a collection set.
Change the editable properties of a collection set.
Component Interaction and Processing
The following table describes the interaction between the various components in the data collector architecture.
Component |
Component |
Description |
---|---|---|
SQL Server Management Studio |
Data collector API |
The client communicates with the API to configure the data collector. |
SQL Server Management Studio |
Management data warehouse |
The client has Read access to the management data warehouse and can obtain the data that is collected. |
SQL Server Management Studio, console client |
Data collector API |
Both clients can configure the data collector programmatically by using the API. |
Data collector API |
Stored procedures and views |
The API uses stores procedures, which have Read and Write access to the data collector configuration information and the management data warehouse. |
msdb - Job definition |
SQL Server Agent |
SQL Server Agent can read the job definition. The job definition contains job schedule and job configuration information. |
SQL Server Agent |
Data collector run-time component |
SQL Server Agent invokes the data collector in response to the job definition. |
Data collector run-time component |
msdb - SSIS packages |
The data collector loads the SSIS packages and can use SSIS to execute these packages. |
Data collector run-time component |
msdb - auditing and history |
The data collector has Write access to this part of msdb so it can record the events generated during data collection. |
SSIS |
Management data warehouse |
The SSIS package has Write access to the management data warehouse so it can upload the data that is collected. |
SSIS |
Temporary storage |
During the collection process, SSIS can create, use, and delete any temporary storage that it requires. |
msdb - Data collection configuration |
msdb - Job definition |
Information in the data collection configuration is mapped to the job definition. For more information, see "Collection Set and Job Mapping" below. |
Collection Set and Job Mapping
A key element of the data collector architecture is the mapping between the collection sets defined in the data collector configuration, and the jobs in the job definition.
A collection set is a group of items that are instances of the collector types, which are defined by using SSIS packages. As an instance in memory, every collection set has a lifetime, which is described in more detail later in this topic. Each collection set is run by the data collector run-time component when it is invoked by a SQL Server job. The following examples illustrate how collection sets map to jobs.
Collection Sets
Example 1
This example collection set, "Base Performance Data," contains separate collection and upload packages to support different schedules for the collection and upload packages.
Collection Set: Name = "Base Performance Data"
Collection Item: Name = "SQL_RE_Active_Requests_History"
Collection package: ARH_Collect.dtsx
Upload package: ARH_Upload.dtsx
Collection Frequency: 1 second
Collection Item: Name = "SQL_RE_Wait_Statistics"
Collection package: WS_Collect.dtsx
Upload Package: WS_Upload.dtsx
Collection Frequency: 15 seconds
Collection Item: Name = "SQL_RE_Performance_Counters"
Collection package: PC_Collect.dtsx
Upload package: PC_Upload.dtsx
Input: Counter – Locks: Current Latch Waits
Collection Frequency: 15 seconds
Collection Item: Name = "WIN_OS_Performance_Counters"
Collection package: OS_PC_Collect.dtsx
Upload package: OS_PC_Upload.dts
Input: Counter – Process: % Processor Time
Collection Frequency: 15 seconds
Upload Schedule: Every 30 minutes
Example 2
This example collection set, "Server Configuration Data," contains a single package that gets Transact-SQLbased snapshots and uploads them to the management data warehouse as they are obtained.
Collection Set: Name = "Server Configuration Data"
Collection Item: Name = "SQL_RE_TSQL"
Collection and upload package: TSQL_Upload.dtsx
Input: Transact-SQL query for database sizes
Collection Frequency: <none>
Collection Item: Name = "SQL_RE_TSQL"
Collection and upload package: TSQL_Upload.dtsx
Input: Transact-SQL query for configuration options
Collection Frequency: <none>
Upload Schedule: Every 24 hours
Jobs
When the collection sets shown in the preceding examples are created, SQL Server Agent jobs are created to perform the data collection. The following rules govern job creation:
Data collection by different collection sets are executed as separate jobs.
Collection sets that use cached collection mode are scheduled as jobs that start when the SQL Server Agent starts. These jobs run continuously and are controlled by the data collector run-time component.
Collection sets that use non-cached collection mode are scheduled as normal SQL Server Agent jobs. These jobs use a SQL Server Agent schedule that matches the schedule defined in the collection set. The actual duration of execution can be customized by the user.
The following examples illustrate jobs that use the preceding rules. These jobs can run the collection set examples provided.
Jobs for Collection Set Example 1
Two jobs are used to handle this collection set: one for the collection and one for the upload.
Job 1,Name: "Base Performance Data Collection"
Category: "Data Collector"
Job Step 1:
Collection Set: Name = "Base Performance Data"
Collection Mode: Collection Only, Continuous Run
Packages run: ARH_Collect, WS_Collect, PC_Collect, OS_PC_Collect
Schedule: At Agent start-up
Job 2,Name: "Base Performance Data Upload"
Category: "Data Collector"
Job Step 1:
Collection Set: Name = "Base Performance Data"
Collection Mode: Upload Only, Single Run
Packages run: ARH_Upload, WS_Upload, PC_Upload, OS_PC_Upload
Schedule: Every 30 minutes, Starting at 12:00 AM PST
Job for Collection Set Example 2
One job is created to handle the collection set.
Job 3,Name = "Server Configuration Data"
Category = "Data Collector"
Job Step 1:
Collection Set, Name = "Server Configuration Data"
Collection Mode: Collection and Upload, Single Run
Packages run: TSQL_Upload, TSQL_Upload
Schedule: Every Day at 12:00 AM PST
Note
Each of the above examples specifies the collection mode for the job. This information is used to manage collection and upload processing when the collection set is running.
The Data Collector and Collection Sets
Because a data collector and a collection set are the elements of data collection that you interact with, you should understand their valid states and the concept of a collection set lifetime.
Valid States
To turn data collection on and off, you can enable or disable data collection. If you want to make any change that affects collector-wide parameters, such as changes to the management data warehouse connection string, you must first disable data collection.
If you want to make any collection set configuration changes, you must first stop the collection set.
If you disable data collection when collection sets are running, these collections sets enter a paused state until you re-enable data collection. At that point, the collection sets resume execution. Any collection set that was not running when you disabled data collection will remain stopped. You can start any collection set while data collection is disabled. However, the collection set will not start execution until you enable data collection.
Collection Set Lifetime
After a collection set starts, it can undergo several transitions in its lifetime. These possible transitions, and the actions that occur during them, include the following:
Construct
A collection set object is created in memory.
All properties can be freely set and modified.
Add collector type
A new collection item is added to the collection set.
The collection set verifies that all collection items reference the same target root.
Remove collector type
- An existing collection item is removed from the collection set.
Create
The collection set verifies that it contains at least one collection item.
Information about all collection items and their properties, collection frequencies, and all the properties of the collection set is inserted into data collector configuration storage.
The initial state of the collection set is set to inactive.
The SQL Server Agent job or jobs that will run the collection set are created and linked to the collection set.
The initial state of the SQL Server Agent jobs is set to inactive.
Start
The state of the collection set changes to running.
Associated SQL Server Agent jobs are enabled.
If a job is flagged to run on Agent startup, the job starts immediately.
Stop
The state of the collection set changes to stopped.
All running jobs for the collection set are stopped and disabled.
All the jobs associated with the collection set are set to inactive.
Delete
All the jobs associated with the collection set are deleted from memory.
All the objects (collection items, collection set) in data collector configuration that are associated with the collection set are deleted.
Run-Time Processing
The following illustration provides a high-level and simplified view of run-time processing when a collection set is started and data is collected and uploaded. Although this illustration shows data collection and data upload as two separate job processes, they can also be a single job, as shown in the preceding examples.
Data Collection
In the preceding illustration, data collection process steps are prefixed with "C". The processing steps are as follows:
A stored procedure starts the collection set. The state of the collection set in msdb is changed to Running. This state change enables the SQL Server Agent jobs that are associated with the collection set.
SQL Server Agent starts the job as follows:
If the collection set mode is set to cached, the collection job is started immediately.
If the collection set mode is set to non-cached, the collection job is started according to the schedule that is configured.
When a job starts, it spawns a process to execute the data collector run-time component (Dcexec.exe) and tells it to run the collection package associated with the collection set.
When Dcexec.exe starts, it reads the data collector configuration in msdb to find out which collection mode, cached or non-cached, to use.
The data collector run-time component uses the SSIS run-time component to run the collection packages that are defined for the collection set.
The collection packages run until they receive a signal to stop or until they reach a predefined endpoint. The collection packages save the data that is collected in a staging area until it is uploaded.
Note
The design of the collector type for the collection set determines what kind of staging area is used. The staging area could be a shared memory area, temporary files, or a message queue.
Data Upload
In the preceding illustration, the data upload process steps are prefixed with "U". The processing steps are as follows:
When an upload is required, a stored procedure starts the job, which is handled by SQL Server Agent.
SQL Server Agent starts the job as follows:
If an on-demand upload request is submitted by the user, the upload job is started immediately.
If there is no on-demand upload request, the upload job is started according to the schedule that is configured for the collection set.
When a job starts, it spawns a process to execute the data collector run-time component (Dcexec.exe) and tells it to run the upload package associated with the collection set.
When Dcexec.exe starts, it reads the data collector configuration in msdb to find out how to handle the upload (for example, the connection string to the management data warehouse, or any data processing that is required).
The data collector run-time component uses the SSIS run-time component to run the upload packages that are defined for the collection set.
The upload package reads the data that is held in the staging area and performs the required transformations, such as normalization or sampling. When the transformations are finished, the upload package connects to the management data warehouse and inserts the data.
After the upload is finished, the process ends, but the collection set state is unchanged. The next upload starts according to the collection set schedule.