Monitor Azure SQL Database with metrics and alerts
Applies to: Azure SQL Database
You can use Azure Monitor metrics to monitor database and elastic pool resource consumption and health. You can use alerts to send notifications when metric values indicate a potential problem.
Metrics
A metric is a series of numeric value measured at regular time intervals, often using units such as count
, percent
, bytes
, etc. Depending on the nature of the metric, you can use aggregations such as total
, count
, average
, minimum
, maximum
to calculate metric values over a duration of time. You can split some metrics by dimensions. Each dimension provides an additional context to the numeric values.
Examples of available Azure SQL Database metrics are: CPU percentage
, Data space used
, Deadlocks
, Tempdb Percent Log Used
.
See database metrics and elastic pool metrics for all available metrics in Azure SQL Database.
Note
Some metrics apply only to specific types of databases or elastic pools. The description of each metric mentions if its use it limited to a specific database or elastic pool type, such as vCore, Hyperscale, serverless, etc.
In Azure SQL Database portal, several commonly used metrics are charted on the Monitoring tab of the Overview page. The metrics let you assess resource consumption and health of a database or an elastic pool at a glance.
Under Key metrics, select See all metrics or anywhere inside the chart to open metrics explorer. In the Metrics page, you can view all other available metrics for the database or elastic pool resource. In metrics explorer, you can change the time range, granularity, and aggregation type for the chart, change the type of chart, expand the scope to include metrics from other Azure resources, create alert rules, etc. You can also open metrics explorer by selecting Metrics menu item, under Monitoring in the resource menu.
Use metrics to monitor databases and elastic pools
You can use metrics to monitor database and elastic pool resource consumption and health. For example, you can:
- Right-size the database or elastic pool to your application workload
- Detect a gradual increase in resource consumption, and proactively scale up the database or elastic pool
- Detect and troubleshoot a performance problem
The following table describes commonly used metrics in Azure SQL Database.
Metric name | Metric ID | Description |
---|---|---|
CPU percentage | cpu_percent |
This metric shows CPU consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. For more information, see Resource consumption by user workloads and internal processes. |
SQL instance CPU percent | sql_instance_cpu_percent |
This metric shows the total CPU consumption by the user and system workloads, expressed as a percentage. Because this metric and the CPU percentage metric are measured on different scales, they are not directly comparable with each other. For more information, see Resource consumption by user workloads and internal processes. |
Data IO percentage | physical_data_read_percent |
This metric shows the data file IO consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. For more information, see Data IO governance. |
Log IO percentage | log_write_percent |
This metric shows the transaction log write throughput consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. For more information, see Transaction log rate governance. |
Workers percentage | workers_percent |
This metric shows the consumption of worker threads toward the user workload limit of a database or an elastic pool, expressed as a percentage. |
DTU percentage | dtu_consumption_percent |
This metric shows DTU consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. DTU percentage is derived from three other metrics: CPU percentage, Data IO percentage, and Log IO percentage. At any point in time, DTU percentage matches the highest value among these three metrics. |
CPU used | cpu_used |
This metric shows CPU consumption toward the user workload limit of a database or an elastic pool, expressed as the number of vCores. For more information, see Diagnose and troubleshoot high CPU on Azure SQL Database. |
DTU used | dtu_used |
This metric shows the number of DTUs used by a database or an elastic pool. |
App CPU billed | app_cpu_billed |
For serverless databases, this metric shows the amount of compute (CPU and memory) billed, expressed in vCore seconds. For more information, see Billing in the serverless compute tier. |
App CPU percentage | app_cpu_percent |
For serverless databases, this metric shows CPU consumption toward the app package maximum vCore limit, expressed as a percentage. For more information, see Monitoring in the serverless compute tier. |
App memory percentage | app_memory_percent |
For serverless databases, this metric shows memory consumption toward the app package maximum memory limit, expressed as a percentage. For more information, see Monitoring in the serverless compute tier. |
Sessions count | sessions_count |
This metric shows the number of established user sessions for a database or an elastic pool. |
Data space used | storage |
For databases, this metric shows the amount of storage space used in the data files of a database. |
Data space used | storage_used |
For elastic pools, this metric shows the amount of storage space used in the data files of all databases in an elastic pool. |
Data space allocated | allocated_data_storage |
This metric shows the amount of storage space occupied by the data files of a database, or by the data files of all databases in an elastic pool. Data files might contain empty space. Because of this, Data space allocated if often higher than Data space used for the same database or elastic pool. For more information, see Manage file space for databases in Azure SQL Database. |
Data space used percent | storage_percent |
For databases, this metric shows the amount of storage space used in the data files of a database toward the data size limit of a database. For elastic pools, it shows the amount of storage space used in the data files of all databases in an elastic pool, expressed as a percentage toward the data size limit of an elastic pool. The data size limit for a database or an elastic pool might be configured lower than the maximum data size limit. To find the maximum data size limit, see resource limits for vCore databases, vCore elastic pools, DTU databases, and DTU elastic pools. |
Data space allocated percent | allocated_data_storage_percent |
For elastic pools, this metric shows the amount of storage space occupied by the data files of all databases in an elastic pool toward the data size limit of the pool, expressed as a percentage. |
Tempdb Percent Log Used | tempdb_log_used_percent |
This metric shows the consumption of the transaction log space in the tempdb database toward the maximum log size, expressed as a percentage. For more information, see tempdb in Azure SQL Database. |
Successful Connections | connection_successful |
This metric shows the number of successfully established connections to a database. This metric can be split by two dimensions, SslProtocol and ValidatedDriverNameAndVersion , to see the number of connections using a specific encryption protocol version, or using a specific client driver. |
Failed Connections : System Errors | connection_failed |
This metric shows the number of connection attempts to a database that failed because of internal service errors. Most commonly, such errors are transient. This metric can be split by two dimensions, Error and ValidatedDriverNameAndVersion , to see the number of failed connection attempts due to a specific error, or from a specific client driver. |
Failed Connections : User Errors | connection_failed_user_error |
This metric shows the number of connection attempts to a database that failed because of user-correctable errors, such as an incorrect password or connection being blocked by firewall. This metric can be split by two dimensions, Error and ValidatedDriverNameAndVersion , to see the number of failed connection attempts due to a specific error, or from a specific client driver. |
Deadlocks | deadlock |
This metric shows the number of deadlocks in a database. |
Availability | availability |
Availability is determined based on the database being operational for connections. For each one-minute data point, the possible values are either 100% or 0% . For more information, see Availability metric. |
Availability metric
The Availability metric tracks availability at individual Azure SQL Database level. This feature is currently in preview.
Availability is granular to one minute of connection outage. Availability is determined based on the database being operational for connections. A minute is considered as downtime or unavailable if all continuous attempts by users to establish connection to the database within the minute fail due to a service issue. If there is intermittent unavailability, the duration of continuous unavailability must cross the minute boundary to be considered as downtime. Typically, the latency to display availability is less than three minutes.
Here's the logic used for calculating Availability for every one-minute interval:
- If there is at least one successful connection, then availability is 100%.
- If all connections fail due to user errors, availability is 100%.
- If there are no connection attempts, availability is 100%.
- If all connections fail due to system errors, availability is 0%.
- Currently, availability metric data is not yet supported for the serverless compute tier and will be displayed as 100%.
Availability metric is therefore a composite metric derived from following existing metrics:
- Successful Connections
- Failed Connections : User Errors
- Blocked by Firewall
- Failed Connections : System Errors
User errors include all connections that fail due to user configuration, workload, or management. System errors include all the connections that fail due to transient issues related to Azure SQL Database service.
Examples of errors caused by user configuration:
Examples of errors caused by user workload:
Examples of errors caused by user management:
- Scaling up or down the database or elastic pool
- Geo replication planned or unplanned failover
- Failover group planned or unplanned failover
- Geo secondary database in seeding state
- Database that is in restoring state due to Point In Time Restore (PITR), Long Term Restore (LTR), or restore from a deleted database
- Database that is not yet finished being copied (Database Copy)
Alerts
You can create alert rules to notify you that the value of one metric or multiple metrics is outside of an expected range.
You can set the scope of an alert rule in multiple ways to suit your needs. For example, alert rule scope can be set to:
- A single database
- An elastic pool
- All databases or elastic pools in a resource group
- All databases or elastic pools in a subscription within an Azure region
- All databases or elastic pools in a subscription within all regions
Alert rules periodically evaluate aggregated metric values over a lookback period, comparing them to a threshold value. You can configure the threshold value, evaluation frequency, and lookback period.
If an alert rule is triggered, you are notified according to your notification preferences, which you specify in the action group linked to the alert rule. For example, you can receive an email, an SMS, or a voice notification. An alert rule can also trigger actions such as webhooks, automation runbooks, functions, logic apps, etc. You can integrate alerts with supported IT service management products.
To learn more about Azure Monitor alerts, see Azure Monitor alerts overview. To get familiar with metric alerts, review Metric alerts, Manage alert rules, and Action groups.
Recommended alert rules
The metrics and optimal thresholds to use in alert rules vary across the wide spectrum of customer workloads in Azure SQL Database.
The recommended alerts in the following table are a starting point to help you define the optimal alerting configuration for your Azure SQL Database resources. Depending on your requirements, your configuration might differ from this example. You might use different thresholds, evaluation frequencies, or lookback periods. You might choose to create additional alerts, or use different alert rule configurations for different applications and environments.
Here are examples of typical alert rule configurations.
Alert rule name | Metric (signal) | Alert logic | When to evaluate | Suggested severity |
---|---|---|---|---|
High user CPU usage | CPU percentage | Threshold: Static Aggregation: Average Operator: Greater than Threshold value: 90 |
Check every: 1 minute Lookback period: 10 minutes |
2 - Warning |
High total CPU usage | SQL instance CPU percent | Threshold: Static Aggregation: Average Operator: Greater than Threshold value: 90 |
Check every: 1 minute Lookback period: 10 minutes |
2 - Warning |
High worker usage | Workers percentage | Threshold: Static Aggregation: Minimum Operator: Greater than Threshold value: 60 |
Check every: 1 minute Lookback period: 5 minutes |
1 - Error |
High data IO usage | Data IO percentage | Threshold: Static Aggregation: Average Operator: Greater than Threshold value: 90 |
Check every: 1 minute Lookback period: 15 minutes |
3 - Informational |
Low data space | Data space used percent | Threshold: Static Aggregation: Minimum Operator: Greater than Threshold value: 95 |
Check every: 15 minute Lookback period: 15 minutes |
1 - Error |
Low tempdb log space |
Tempdb Percent Log Used | Threshold: Static Aggregation: Minimum Operator: Greater than Threshold value: 60 |
Check every: 1 minute Lookback period: 5 minutes |
1 - Error |
Deadlocks | Deadlocks | Threshold: Dynamic Aggregation: Total Operator: Greater than Threshold sensitivity: Medium |
Check every: 15 minutes Lookback period: 1 hour |
3 - Informational |
Failed connections (user errors) | Failed Connections : User Errors | Threshold: Dynamic Aggregation: Total Operator: Greater than Threshold sensitivity: Medium |
Check every: 5 minutes Lookback period: 15 minutes |
2 - Warning |
Failed connections (system errors) | Failed Connections : System Errors | Threshold: Static Aggregation: Total Operator: Greater than Unit: Count Threshold value: 10 |
Check every: 1 minute Lookback period: 5 minutes |
2 - Warning |
Anomalous connection rate | Successful Connections | Threshold: Dynamic Aggregation: Total Operator: Greater or Less than Threshold sensitivity: Low |
Check every: 5 minutes Lookback period: 15 minutes |
2 - Warning |
Some of the recommended alert rules use dynamic thresholds to detect anomalous metric patterns that might require attention. Alert rules based on dynamic thresholds do not trigger until sufficient historical data has been collected to establish normal patterns. For more information, see Dynamic thresholds in metric alerts.
By default, metric alerts are stateful. This means that once an alert rule is triggered, the alert is fired only once. The alert remains in the fired
state until it is resolved, at which point a resolved
notification is sent. An alert rule triggers a new alert only once the previous alert is resolved. Stateful alerts avoid frequent notifications about an ongoing condition. For more information about stateful and stateless alerts, see Alerts and state.
Related content
- Azure Monitor metrics overview
- Monitor Azure SQL workloads with database watcher (preview)
- Analyze metrics with Azure Monitor metrics explorer
- Azure Monitor metrics aggregation and display explained
- Azure Monitor alerts overview
- Tutorial: Create a metric alert for an Azure resource
- Best practices for Azure Monitor alerts
- Troubleshooting problems in Azure Monitor alerts
- Monitoring and performance tuning in Azure SQL Database and Azure SQL Managed Instance
- Configure streaming export of Azure SQL Database and SQL Managed Instance diagnostic telemetry