Artikel
10/01/2010

Troubleshooting the Data Collector

This topic addresses the following categories of troubleshooting issues:

Error conditions. This category covers object model and errors at run time.
Performance issues. This category covers general and specific performance scenarios.
System hangs. This category covers subcomponent hangs during data collection.

Error Conditions

Errors can be raised from the object model or at run time.

Object Model Errors

The data collector object model is a managed API that provides a programmatic way to manage data collector properties and data collection sets. The object model is a thin wrapper around a set of stored procedures and views that provide a configuration mechanism for the data collector. For more information, see Data Collector Programming.

Object model errors can come from one of the following components of the object model:

A Transact-SQL error is raised from a stored procedure or Transact-SQL code that is called from one of data collector stored procedures.
A Transact-SQL error is raised directly from a data collector stored procedure.
A managed exception is thrown directly from the object model.

The following table describes the errors that can be raised from the object model.

Error message	Error number	Description
Cannot update the name, target, proxy_id, or collection_mode of the active collection set '%s'. Stop the collection set and then try to update again.	14669	An attempt was made to update an active collection set. A collection set must be stopped before any update of that type can be done. Only the upload schedule can be changed while the collection set is active.
Cannot delete the active collection set '%s'. Stop the collection set and then try to delete it again.	14670	An attempt to delete a collection set that is running.
Cannot update the name or the parameters of the collection item '%s' in the active collection set '%s'. Stop the collection set and then try to update the collection item again.	14671	An attempt to update a collection item in a collection set that is running.
Cannot delete the collection item '%s' in the active collection set '%s'. Stop the collection set and then try to delete the collection item again.	14672	An attempt to delete a collection item in a collection set that is running.
Cannot delete the collector type '%s'. Delete all collection items associated with this collector type and then try to delete it again.	14673	An attempt to delete a collector type that has collection items associated with it.
Cannot upload data for the inactive collection set '%s'. Start the collection set and then try to upload the data again.	14674	An attempt to upload data collected by a collection set that is not running.
Cannot update the name, target, proxy_id, logging_level, or collection_mode, or add collection item to the active collection set '%s'. Stop the collection set and then try to update again.	14675	An attempt to update a collection set that is running.
The user does not have permission to change '%s'. The user should be member of data collector role '%s'.	14676	User attempted to update a property that can be changed only by a specific data collector role.
The user does not have permission to perform this operation. The user must be a member of data collector role '%s'.	14677	User attempted to perform an operation without being part of the required data collector role.
SQL Server trace with id %d has been stopped and closed by external user. SQL Server Trace collector will attempt to re-create the trace.	14678	A trace that was created and used by the data collector has been stopped and closed outside of the collector runtime.
The specified %s (%s) is not valid in this data warehouse.	14679	A parameter passed in to one of the stored procedures in the management data warehouse has a value that does not match other entries in the warehouse.
This version of instmdw.sql should only be executed against server running SQL Server 2005 or later.	14680	An attempt to install a management data warehouse on a server running SQL Server 2000 or earlier.
Cannot perform this procedure when the collector is disabled. Enable the collector and then try again.	14681	An attempt to perform an operation that conflicts with the state of the collector.
The state of the collection set has changed, but it will not start or stop until the collector is enabled.	14682	An attempt to start or stop the collection set when the collector is not enabled.
A collection set in snapshot or continuous mode requires a schedule.	14683	Create or update a collection set in snapshot or continuous mode without providing a schedule.
Caught error#: %d, Level: %d, State: %d, in Procedure: %s, Line: %d, with Message: %s	14684	A generic error occurred in the data collector component; the error was caught in a catch block and rethrown.
Invalid operation. The status of collection set with id = %d is currently "not running".	14685	A call to sp_syscollector_create_set_queue_and_service for a set with an is_running status of 0.
The MDWInstance and MDWDatabase parameters of the configuration store cannot be null.	14686	A connection string to the management data warehouse is null for either the MDWInstance or the MDWDatabase parameters.
Invalid value (%d) of the @cache_window parameter. Allowable values are: -1 (cache all upload data from previous upload failures), 0 (cache no upload data), N (cache data from N previous upload failures, where N >= 1)	14687	An attempt was made to set the value of the CacheWindow parameter of the collector configuration store to a value less than -1.
A collection set cannot start when SQL Server Agent is stopped. Start SQL Server Agent.	14688	An attempt was made to start a collection set when SQL Server Agent is not enabled.
A collection set cannot start if the management data warehouse is not configured. Run the instmdw.sql script to create and configure the management data warehouse.	14689	An attempt was made to start a collection set when management data warehouse is not set up.
Cannot perform this procedure when the collector is enabled. Disable the collector and then try again.	14690	An attempt to perform an operation that conflicts with the state of the collector.
The status of the collector cannot be null. This may indicate an internal corruption in the collector configuration data.	14691	A call to sp_syscollector_verify_collector_state found the CollectorEnabled parameter with a null value. This may indicate an internal corruption of the configuration data of the collector.

Errors at Run Time

Run-time errors can happen when a collection package or upload package runs and encounters a problem. The errors can come from one of the following components:

The data flow of a SQL Server 2008 Integration Services (SSIS) package. These errors can be the result of a failed data conversion or data truncation. The data collector records the number of rows that were affected by the error, and logs that number in the data collector log tables.
The control flow of an SSIS package. These errors are logged in the SSIS log tables in the msdb database (msdb.dbo.sysssislog) and are bubbled out to the data collector log tables.
The data collector run-time component (dcexec.exe). These errors are logged directly in the data collector log tables.

For more information, see Data Collector Logging.

We recommend one of the following methods for getting status information about run-time errors.

Transact-SQL stored procedures and views

To view the status of all currently running and finished collection sets or packages, run the following query:

use msdb
select * from syscollector_execution_log_full

The preceding query returns the following result set.

Column name	Description
log_id	A unique ID for each collection set execution. This is used to join this view with other detailed logs.
parent_log_id	The ID of the parent package or collection set. This is NULL for the collection set. The IDs are chained in a parent-child relationship to make it easy to identify which collection set started which package. In addition, this view groups the log entries by their parent-child linkage and indents the names of the packages so the call chain is clearly visible.
name	The name of the collection set or package that this log entry represents.
collection_mode	Collection set activity when the entry was made, either collection or upload.
start_time	When the collection set or the package was started.
last_iteration_time	For continuously running packages, the last time that the package captured a snapshot.
finish_time	For packages and collection sets that finished, the time the run finished.
duration	The time, in milliseconds, that the package or collection set has been running.
operator	The operator who started the collection set or package.
status	The status of the collection set or package. This will be: 0 - running 1 - finished 2 - failed
failure_task	If collection set or package failed, the name of the task in the SSIS package that caused the failure.
package_execution_id	A link to the SSIS log table.
collection_set_id	A link to the data collector configuration table. Note You can use the collection_set_id as a filter to focus on a specific collection set in the log.

For more information, see syscollector_execution_log_full (Transact-SQL).

You can obtain additional information about collection set and package execution by executing one of the functions that the data collector provides.

The following function returns detailed statistics about the collection set or package, including the number of error rows logged by the package.

select * from fn_syscollector_get_execution_stats(@log_id)

The next function returns a portion of the SSIS log (sysdtslog90) that matches the package_execution_id for a package. If the package failed, this is the best way to find out what the root error was.

select * from fn_syscollector_get_execution_details(@log_id)

Data collector status reports

You can obtain the same information returned by the preceding Transact-SQL queries by viewing the logs provided in SQL Server Management Studio. For more information, see How to: View Collection Set Logs.

Performance Issues

There are three primary sources of data that you can use for reviewing and diagnosing performance.

First, the log tables described in the previous section also provide useful information that can be used to address performance issues.

The fn_syscollector_get_execution_stats function returns the following information.

Column name	Description
avg_row_count_in	The average number of rows that entered the data flow tasks of the package.
min_row_count_in	The minimum number of rows that entered the data flow tasks of the package.
max_row_count_in	The maximum number of rows that entered the data flow tasks of the package.
avg_row_count_out	The average number of rows that exited the data flow tasks of the package.
min_row_count_out	The minimum number of rows that left the data flow tasks of the package.
max_row_count_out	The maximum number of rows that left the data flow tasks of the package.
avg_duration	The average time (in milliseconds) spent in the data flow component of the package.
min_duration	The minimum time (in milliseconds) spent in the data flow component of the package.
max_duration	The maximum time (in milliseconds) spent in the data flow component of the package.

A second source of performance data is the syscollector_execution_log_full table, which provides information about how long a collection set ran or is running.

Finally, performance counters can be used to help assess performance issues. The standard process counters for the data collector process (dcexec.exe) instances in particular provide good indicators of how much system resources the data collector run-time component is using.

Performance Issue Scenarios

Two scenarios are most likely to show up as performance issues when running the data collector:

The data collector is consuming too many system resources.
The data collector can not keep up with the collection load.

Excessive consumption of system resources

If an analysis of process performance counters indicates that the dcexec.exe processes are using too many system resources, the following investigation needs to be done.

First, determine if there is a single collection set that is using the most resources.

To identify the collection set, map the process ID to the collection set ID in syscollector_execution_log_full, and then find the collection set in the syscollector_collection_sets table.
Determine what the collection set is collecting. Use the following query to list all the collection items grouped in the set:
```
select * from syscollector_collection_set_items where collection_set_id = <id>
```
Using the information from the preceding query, consider the following questions:
- Are there too many collection items?
- Is a single collection item causing most of the problems?
- Is too much data being collected?
- If the answer is yes to any of the preceding questions, consider modifying the collection or collection items to reduce the volume of data that is collected. This will reduce resource consumption.

Second, determine if the number of active collection sets is causing the problem.

Find out how many collection sets are defined on the system by using the following query:
```
select count(*) from syscollector_collection_sets
```

Find out how many collection sets are currently running by using the following query:

select count(*) from syscollector_execution_log_full where parent_log_id is null and status = 1

If the performance problem is intermittent, see if the problem maps to any collection or upload activity. If the schedules are all the same, this could be causing the problem. Adjusting a collection or upload schedule might be a simple solution.

Unable to keep up with the load

This condition only occurs with continuously running collection sets. If the collection frequency is high and there is a large volume of data to collect, the collection package might not be able to process the data in the time allocated for single snapshot iteration. You can detect this condition by comparing the avg_duration and max_duration columns in the log table to the collection frequency defined for a specific collection item.

If the max_duration value is larger than the frequency value, the collection package might not always be able to keep up with the configured frequency. If the avg_duration value is higher than the frequency, the collection package has a consistent problem. In the latter case, the frequency should be reduced or the collection item should be modified to limit the amount of data collected.

System Hangs

A system can hang if a package running as part of data collector stops processing but does not exit and stays in this state. Most system hangs can be resolved by stopping and restarting a collection set.

It is important to distinguish a true hang from intended behavior.

A continuously running collection package spends most of the time in a wait state, waking up periodically to collect a data snapshot. After collecting the data, the package goes back into the wait state. This wait state may appear to be system hang, but it is not. To verify, check the syscollector_execution_log_full table for the suspect package. If the last_iteration_time was no later than the current time, the condition is not a hang.
A package might be designed to wait for an event that will trigger a collection action. In this case the package will wait for that event. This is not a hang.

To verify that there is a system hang related to the data collector, perform the following checks:

First, identify the dcexec.exe process ID corresponding to the collection set that you want to investigate.
Next, check to see if the process is running and if it is using any resources. Any hung process usually uses 0% CPU and does not allocate more memory. The process can also use a high percentage of the CPU. If this is the case it might be looping and not exiting memory.
Finally, check the log table for the process to see when it was last updated. If the update time is more than the frequency for the collection item, the process could be hung.

There are several reasons why a data collector process can hang. The following are the most common:

A package waits for the next iteration to be signaled, but this does not happen.
A package waits for a shared lock that is held by another package, but the lock is not released.
An error happened during package execution that was not handled properly and the control flow was disrupted, but the package did not fail completely.

In any of the preceding cases there are specific entries in the log that relate to the system hang. See if there are any messages that indicate a cause. In the event of a system hang, create a dump of the dcexec.exe process and investigate further.

Dela via