Training
Module
Implement error handling with Transact-SQL - Training
Implement error handling with Transact-SQL
This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to: Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics
This article describes how to prevent, troubleshoot, diagnose, and mitigate connection errors and transient errors that your client application encounters when it interacts with Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. Learn how to configure retry logic, build the connection string, and adjust other connection settings.
A transient error, also known as a transient fault, has an underlying cause that soon resolves itself. An occasional cause of transient errors is when the Azure system quickly shifts hardware resources to better load-balance various workloads. Most of these reconfiguration events finish in less than 60 seconds. During this reconfiguration time span, you might have issues with connecting to your database in SQL Database. Applications that connect to your database should be built to expect these transient errors. To handle them, implement retry logic in their code instead of surfacing them to users as application errors.
If your client program uses ADO.NET, your program is told about the transient error by the throw of SqlException.
Retry the SQL Database and SQL Managed Instance connection or establish it again, depending on the following:
After a delay of several seconds, retry the connection.
Do not immediately retry the command. Instead, after a delay, freshly establish the connection. Then retry the command.
Client programs that occasionally encounter a transient error are more robust when they contain retry logic. When your program communicates with your database in SQL Database through third-party middleware, ask the vendor whether the middleware contains retry logic for transient errors.
SELECT
statement that failed with a transient error. Instead, establish a fresh connection, and then retry the SELECT
.UPDATE
statement fails with a transient error, establish a fresh connection before you retry the UPDATE. The retry logic must ensure that either the entire database transaction finished or that the entire transaction is rolled back.We recommend that you wait for 5 seconds before your first retry. Retrying after a delay shorter than 5 seconds risks overwhelming the cloud service. For each subsequent retry, the delay should grow exponentially, up to a maximum of 60 seconds.
For a discussion of the blocking period for clients that use ADO.NET, see Connection pooling (ADO.NET).
You also might want to set a maximum number of retries before the program self-terminates.
Code examples with retry logic are available at:
To test your retry logic, you must simulate or cause an error that can be corrected while your program is still running.
One way you can test your retry logic is to disconnect your client computer from the network while the program is running. The error is:
As part of the first retry attempt, you can reconnect your client computer to the network and then attempt to connect.
To make this test practical, unplug your computer from the network before you start your program. Then your program recognizes a runtime parameter that causes the program to:
Your program can purposely misspell the user name before the first connection attempt. The error is:
As part of the first retry attempt, your program can correct the misspelling and then attempt to connect.
To make this test practical, your program recognizes a runtime parameter that causes the program to:
If your client program connects to your database in Azure SQL Database by using the .NET Framework class System.Data.SqlClient.SqlConnection, use .NET 4.6.1 or a later version (or .NET Core) so that you can use its connection retry feature. For more information about this feature, see SqlConnection.ConnectionString Property.
When you build the connection string for your SqlConnection object, coordinate the values among the following parameters:
The connection retry settings (ConnectRetryCount and ConnectRetryInterval) apply to connection resiliency. Connection resiliency includes the following distinct types:
Open connection resiliency refers to the initial SqlConnection.Open or OpenAsync() method. The first connection attempt is counted as try zero. ConnectRetryCount applies to subsequent retries. Therefore, if connection zero fails (this might not occur immediately), ConnectRetryInterval is applied first followed by subsequent ConnectRetryCount (and ConnectRetryInterval) attempts. To take advantage of all retry attempts, the Connection Timeout property must provide time for all attempts.
Idle connection resiliency refers to the automatic detection and reconnection of existing idle connections that were broken. The first attempt to reconnect a broken idle connection is counted as the first retry attempt. To take advantage of all retry attempts, the Command Timeout must provide time for all attempts.
Example: Assume the following values for the ConnectRetryCount and ConnectRetryInterval parameters:
ConnectRetryCount: 3 ConnectRetryInterval: 10 seconds
See how these values are used in the following scenarios:
Scenario: New connection
4:10:00 - Connection.Open() - zero attempt
4:10:01 - Connection failure detected
4:10:11 - Retry 1 --> First retry occurs after ConnectRetryInterval
4:10:21 - Retry 2
4:10:31 - Retry 3
For this scenario your chosen values should satisfy the following condition:
Connection Timeout > = ConnectRetryCount * ConnectionRetryInterval
For example, if the count is 3 and the interval is 10 seconds, a timeout of only 29 seconds doesn't provide enough time for the system's third and final retry to connect:
29 < 3 * 10
Scenario: Idle connection
ConnectRetryCount: 3 ConnectRetryInterval: 10 seconds
4:10:00 - Broken connection detected on command execution
4:10:00 - Retry 1 -->First retry occurs immediately
4:10:10 - Retry 2
4:10:20 - Retry 3
This isn't the initial connection. Therefore, Connection Timeout doesn't apply. However, because the connection recovery occurs during command execution, the Command Timeout setting does apply. The Command Timeout default is 30 seconds. Although, connection recovery is fast in typical circumstances, an intermittent outage, could cause the recovery to take some of the command execution time.
For this scenario, if you want to take full advantage of idle connection recovery retries, your chosen values should satisfy the following condition:
Command Timeout > (ConnectRetryCount - 1) * ConnectionRetryInterval
For example, if the count is 3 and the interval is 10 seconds, a command timeout value lower than 20 seconds wouldn't give enough time for the third and final retry to connect: (3 - 1) * 10 = 20`
Also, consider that the command itself requires time to execute after the connection is recovered.
Note
The duration values that are provided in these scenarios are for demonstration only. The actual detection times in both scenarios depend on the underlying infrastructure.
The ConnectRetryCount and ConnectRetryInterval parameters let your SqlConnection object retry the connect operation without telling or bothering your program, such as returning control to your program. The retries can occur in the following situations:
There is a subtlety. If a transient error occurs while your query is being executed, your SqlConnection object doesn't retry the connect operation. It certainly doesn't retry your query. However, SqlConnection very quickly checks the connection before sending your query for execution. If the quick check detects a connection problem, SqlConnection retries the connect operation. If the retry succeeds, your query is sent for execution.
Suppose your application has robust custom retry logic. It might retry the connect operation four times. If you add ConnectRetryInterval and ConnectRetryCount =3 to your connection string, you will increase the retry count to 4 * 3 = 12 retries. You might not intend such a high number of retries.
The connection string that's necessary to connect to your database is slightly different from the string used to connect to SQL Server. You can copy the connection string for your database from the Azure portal.
Use the Azure portal to obtain the connection string that's necessary for your client program to interact with Azure SQL Database.
Select All services > SQL databases.
Enter the name of your database into the filter text box near the upper left of the SQL databases pane.
Select the row for your database.
After the pane appears for your database, for visual convenience select the Minimize buttons to collapse the blades you used for browsing and database filtering.
On the pane for your database, select Show database connection strings.
Copy the appropriate connection string. i.e. If you intend to use the ADO.NET connection library, copy the appropriate string from the ADO.NET tab.
Edit the connection string as needed. i.e. Insert your password into the connection string, or remove "@<servername>" from the username if the username or server name are too long.
In one format or another, paste the connection string information into your client program code.
For more information, see Connection strings and configuration files.
You must configure SQL Database to accept communication from the IP address of the computer that hosts your client program. To set up this configuration, edit the firewall settings through the Azure portal.
If you forget to configure the IP address, your program fails with a handy error message that states the necessary IP address.
Sign in to the Azure portal.
In the list on the left, select All services.
Scroll and select SQL servers.
In the filter text box, start typing the name of your server. Your row is displayed.
Select the row for your server. A pane for your server is displayed.
On your server pane, select Settings.
Select Firewall.
Select Add Client IP. Type a name for your new rule in the first text box.
Type in the low and high IP address values for the range you want to enable.
Select Save.
For more information, see Configure firewall settings in SQL Database.
Typically, you need to ensure that only port 1433 is open for outbound communication on the computer that hosts your client program.
For example, when your client program is hosted on a Windows computer, you can use Windows Firewall on the host to open port 1433.
If your client program is hosted on an Azure virtual machine (VM), read Ports beyond 1433 for ADO.NET 4.5 and SQL Database.
For background information about configuration of ports and IP addresses in your database, see Azure SQL Database firewall.
If your program uses ADO.NET classes like System.Data.SqlClient.SqlConnection to connect to SQL Database, we recommend that you use .NET Framework version 4.6.2 or later.
When you use a connection object from a connection pool, we recommend that your program temporarily closes the connection when it's not immediately in use. It's not expensive to reopen a connection, but it is to create a new connection.
If you use ADO.NET 4.0 or earlier, we recommend that you upgrade to the latest ADO.NET. As of August 2018, you can download ADO.NET 4.6.2.
If your program fails to connect to your database in SQL Database, one diagnostic option is to try to connect with a utility program. Ideally, the utility connects by using the same library that your program uses.
On any Windows computer, you can try these utilities:
sqlcmd.exe
, which connects by using ODBCAfter your program is connected, test whether a short SQL SELECT query works.
If you suspect that connection attempts fail due to port issues, you can run a utility on your computer that reports on the port configurations.
On Linux, the following utilities might be helpful:
netstat -nap
nmap -sS -O 127.0.0.1
: Change the example value to be your IP address.On Windows, the PortQry.exe utility might be helpful. Here's an example execution that queried the port situation on a database in SQL Database and that was run on a laptop computer:
[C:\Users\johndoe\]
>> portqry.exe -n johndoesvr9.database.windows.net -p tcp -e 1433
Querying target system called: johndoesvr9.database.windows.net
Attempting to resolve name to IP address...
Name resolved to 23.100.117.95
querying...
TCP port 1433 (ms-sql-s service): LISTENING
[C:\Users\johndoe\]
>>
An intermittent problem is sometimes best diagnosed by detection of a general pattern over days or weeks.
Your client can assist in a diagnosis by logging all errors it encounters. You might be able to correlate the log entries with error data that SQL Database logs itself internally.
Enterprise Library 6 (EntLib60) offers .NET managed classes to assist with logging. For more information, see 5 - As easy as falling off a log: Use the Logging Application Block.
Here are some Transact-SQL SELECT statements that query error logs and other information.
Query of log | Description |
---|---|
SELECT e.* FROM sys.event_log AS e WHERE e.database_name = 'myDbName' AND e.event_category = 'connectivity' AND 2 >= DateDiff (hour, e.end_time, GetUtcDate()) ORDER BY e.event_category, e.event_type, e.end_time; |
The sys.event_log view offers information about individual events, which includes some that can cause transient errors or connectivity failures. Ideally, you can correlate the start_time or end_time values with information about when your client program experienced problems. You must connect to the master database to run this query. |
SELECT c.* FROM sys.database_connection_stats AS c WHERE c.database_name = 'myDbName' AND 24 >= DateDiff (hour, c.end_time, GetUtcDate()) ORDER BY c.end_time; |
The sys.database_connection_stats view offers aggregated counts of event types for additional diagnostics. You must connect to the master database to run this query. |
You can search for entries about problem events in the SQL Database log. Try the following Transact-SQL SELECT statement in the master database:
SELECT
object_name
,CAST(f.event_data as XML).value
('(/event/@timestamp)[1]', 'datetime2') AS [timestamp]
,CAST(f.event_data as XML).value
('(/event/data[@name="error"]/value)[1]', 'int') AS [error]
,CAST(f.event_data as XML).value
('(/event/data[@name="state"]/value)[1]', 'int') AS [state]
,CAST(f.event_data as XML).value
('(/event/data[@name="is_success"]/value)[1]', 'bit') AS [is_success]
,CAST(f.event_data as XML).value
('(/event/data[@name="database_name"]/value)[1]', 'sysname') AS [database_name]
FROM
sys.fn_xe_telemetry_blob_target_read_file('el', null, null, null) AS f
WHERE
object_name != 'login_event' -- Login events are numerous.
and
'2015-06-21' < CAST(f.event_data as XML).value
('(/event/@timestamp)[1]', 'datetime2')
ORDER BY
[timestamp] DESC
;
The following example shows what a returned row might look like. The null values shown are often not null in other rows.
object_name timestamp error state is_success database_name
database_xml_deadlock_report 2015-10-16 20:28:01.0090000 NULL NULL NULL AdventureWorks
Enterprise Library 6 (EntLib60) is a framework of .NET classes that helps you implement robust clients of cloud services, one of which is SQL Database. To locate topics dedicated to each area in which EntLib60 can assist, see Enterprise Library 6.
Retry logic for handling transient errors is one area in which EntLib60 can assist. For more information, see 4 - Perseverance, secret of all triumphs: Use the Transient Fault Handling Application Block.
Note
The source code for EntLib60 is available for public download from the Download Center. Microsoft has no plans to make further feature updates or maintenance updates to EntLib.
The following EntLib60 classes are particularly useful for retry logic. All these classes are found in or under the namespace Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling.
In the namespace Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling:
In the namespace Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling.TestSupport:
Here are some links to information about EntLib60:
For more information, see 5 - As easy as falling off a log: Use the Logging Application Block.
Next, from the SqlDatabaseTransientErrorDetectionStrategy class, is the C# source code for the IsTransient method. The source code clarifies which errors were considered transient and worthy of retry.
public bool IsTransient(Exception ex)
{
if (ex != null)
{
SqlException sqlException;
if ((sqlException = ex as SqlException) != null)
{
// Enumerate through all errors found in the exception.
foreach (SqlError err in sqlException.Errors)
{
switch (err.Number)
{
// SQL Error Code: 40501
// The service is currently busy. Retry the request after 10 seconds.
// Code: (reason code to be decoded).
case ThrottlingCondition.ThrottlingErrorNumber:
// Decode the reason code from the error message to
// determine the grounds for throttling.
var condition = ThrottlingCondition.FromError(err);
// Attach the decoded values as additional attributes to
// the original SQL exception.
sqlException.Data[condition.ThrottlingMode.GetType().Name] =
condition.ThrottlingMode.ToString();
sqlException.Data[condition.GetType().Name] = condition;
return true;
case 10928:
case 10929:
case 10053:
case 10054:
case 10060:
case 40197:
case 40540:
case 40613:
case 40143:
case 233:
case 64:
// DBNETLIB Error Code: 20
// The instance of SQL Server you attempted to connect to
// does not support encryption.
case (int)ProcessNetLibErrorCode.EncryptionNotSupported:
return true;
}
}
}
else if (ex is TimeoutException)
{
return true;
}
else
{
EntityException entityException;
if ((entityException = ex as EntityException) != null)
{
return this.IsTransient(entityException.InnerException);
}
}
}
return false;
}
Training
Module
Implement error handling with Transact-SQL - Training
Implement error handling with Transact-SQL