Random problems connecting to on-premises SQL 2008 R2

Prado, Michael 1 Reputation point
2021-04-30T14:00:19.947+00:00

A little context first: we have three containers running 24/7, each subscribed to a topic in our Azure Service Bus. During the day they handle the work easily because the load is small.

At 2 AM, a nightly process pushes thousands of messages to the Service Bus, and the three containers start calling one of our APIs (let's call it Routing API). This API queries our on-premises SQL Server 2008 R2 database (no VPN) and also calls a third-party API through a singleton HttpClient. Here's the issue:

After a few minutes of processing the first messages, Routing API can no longer connect to our SQL Server. This is the error message:

    2021-04-29 07:09:02.6438|ERROR|Routing.API.Startup|[PUT] https://XXXX.azurewebsites.net/orders/0238451618 Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
     ---> System.ComponentModel.Win32Exception (5): Access is denied.
       at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
       at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
       at Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, Boolean withFailover, SqlAuthenticationMethod authType)
       at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
       at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
       at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
       at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
       at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
       at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
       at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
       at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
       at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
       at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
       at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
       at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
       at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
       at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
       at Routing.Data.MarthaRepository.GetScalarT in D:\a\1\s\Routing.Data\MarthaRepository.cs:line 45
       at Routing.Business.Services.OrderService.SetProfit(Order order) in D:\a\1\s\Routing.Business\Services\OrderService.cs:line 123
       at Routing.Business.Services.OrderService.GetOrder(String orderid) in D:\a\1\s\Routing.Business\Services\OrderService.cs:line 85
       at Routing.Business.Services.OrderService.UpdateOrderAsync(String orderid) in D:\a\1\s\Routing.Business\Services\OrderService.cs:line 66
       at Routing.API.Controllers.OrdersController.Updated(String orderid) in D:\a\1\s\Routing.API\Controllers\OrdersController.cs:line 48
       at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Logged|12_1(ControllerActionInvoker invoker)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|19_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
       at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Logged|17_1(ResourceInvoker invoker)
       at Microsoft.AspNetCore.Routing.EndpointMiddleware.g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
       at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
       at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
       at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
       at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
    ClientConnectionId:00000000-0000-0000-0000-000000000000
    Error Number:5,State:0,Class:20

The odd part is that the problem goes away by itself after a few hours, or apparently when I restart the API (it happened once during the day, and restarting the app fixed it).

I suspect it is related either to our connection pool or to a network/firewall issue.
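
To make the connection-pool hypothesis concrete, here is a minimal, language-neutral sketch in Python (the API itself is .NET) of what a capped pool does. `BoundedPool` and its `connect` factory are hypothetical names for illustration only; this is not how Microsoft.Data.SqlClient is implemented. Idle connections are reused, new ones are opened only while the pool is under `max_size`, and once the cap is reached callers wait and then time out instead of opening more sockets.

```python
import itertools
import threading
import time


class BoundedPool:
    """Toy connection pool capped at max_size physical connections.

    Hypothetical illustration of the idea behind a "Max Pool Size" setting,
    not the SqlClient implementation.
    """

    def __init__(self, max_size, connect):
        self._max_size = max_size
        self._connect = connect      # factory that opens a new "connection"
        self._idle = []              # returned connections ready for reuse
        self._created = 0            # physical connections opened so far
        self._cond = threading.Condition()

    @property
    def created(self):
        return self._created

    def acquire(self, timeout):
        deadline = time.monotonic() + timeout
        with self._cond:
            while True:
                if self._idle:
                    return self._idle.pop()          # reuse an idle connection
                if self._created < self._max_size:
                    self._created += 1
                    return self._connect()           # open a new one, under the cap
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("pool exhausted: all connections in use")
                self._cond.wait(remaining)           # wait for a release

    def release(self, conn):
        with self._cond:
            self._idle.append(conn)
            self._cond.notify()


# Usage: a pool capped at 2 "connections" (integers stand in for sockets).
ids = itertools.count(1)
pool = BoundedPool(max_size=2, connect=lambda: next(ids))
first = pool.acquire(timeout=1)
second = pool.acquire(timeout=1)
try:
    pool.acquire(timeout=0.05)  # cap reached: waits, then gives up
except TimeoutError as exc:
    print("third caller:", exc)
pool.release(first)
print(pool.acquire(timeout=1) == first, pool.created)  # True 2
```

One diagnostic consequence: when SqlClient runs out of pooled connections it typically reports a "Timeout expired" error about obtaining a connection from the pool, whereas the trace above shows a network-level failure (error 40) raised while opening a new physical connection.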

Here's a link to someone with a similar issue:
https://serverfault.com/questions/1022414/azure-web-app-suddenly-stops-communicating-with-external-sql-server?newreg=f5d5e5722a43486e92f6efa49d8096e6

Any help is appreciated, and I can provide metrics if needed. CPU and memory were stable while the issue was happening.

Azure App Services
SQL Server Reporting Services

2 answers

  1. ajkuma 13,476 Reputation points Microsoft Employee
    2021-05-05T08:49:29.52+00:00

    @Prado, Michael, thanks for posting a detailed description of the issue, and apologies for the inconvenience.

    As you have rightly narrowed down, the issue could be due to the network/firewall or to the connection pool. I'm not sure which App Service plan and instance size/count you're using, but outbound connections are limited based on instance size, and hitting that limit under the high load at 2 AM is a likely cause:

    Kindly review the outbound connection limits here: https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#cross-vm-numerical-limits (see “Cross VM Numerical Limits” and “Number of named pipes”).

    I understand you have already checked CPU and memory; you could test this theory by scaling up to a larger instance size or scaling out to more instances.
    To isolate the issue further, open your App Service, click Diagnose and solve problems in the left navigation, and check the “Diagnostic Tools” > “Availability and Performance” tile.
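
The scale-out suggestion can be sanity-checked with simple arithmetic. The sketch below is a hypothetical model: it assumes the outbound limit applies per instance and that App Service spreads requests (and therefore connections) evenly across instances, which real traffic only approximates. The numbers in the usage lines are illustrative, not measurements from this thread.

```python
import math


def instances_needed(peak_connections, per_instance_limit):
    """Smallest instance count keeping each instance within its outbound limit.

    Hypothetical model: assumes the limit is per instance and that load is
    spread evenly across instances.
    """
    if per_instance_limit <= 0:
        raise ValueError("per_instance_limit must be positive")
    return max(1, math.ceil(peak_connections / per_instance_limit))


def per_instance_load(peak_connections, instances):
    """Connections each instance holds when the peak is spread evenly."""
    return math.ceil(peak_connections / instances)


# Illustrative numbers only: a peak of 75 connections, a limit of 30 per instance.
n = instances_needed(75, 30)
print(n, per_instance_load(75, n))  # 3 25
```

Scaling up (a larger size) raises the per-instance limit, while scaling out divides the same peak across more instances; the function only models the second case.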

    [Screenshot: Diagnose and solve problems]


  2. Prado, Michael 1 Reputation point
    2021-05-10T21:22:38.347+00:00

    I think I've made some progress here. With your help, I found this:

    [Screenshot: outbound connections by destination]

    The top two items are our database (port 1433) and an external API service we call.
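
The view in the screenshot is essentially a count of open outbound connections grouped by destination. A minimal Python sketch of that tally, assuming you have a snapshot of one instance's connections as `(remote_host, remote_port)` pairs (for example from a netstat-style listing); `summarize_outbound` is a hypothetical helper, and the host names and counts in the usage lines are made up for illustration:

```python
from collections import Counter


def summarize_outbound(connections, limit, top=2):
    """Tally one instance's outbound connections by (host, port).

    Returns the busiest `top` destinations, the total count, and whether
    the total has reached the per-instance `limit`.
    """
    by_destination = Counter(connections)
    total = sum(by_destination.values())
    return by_destination.most_common(top), total, total >= limit


# Hypothetical snapshot: host names and counts are illustrative only.
snapshot = (
    [("sql.example.internal", 1433)] * 18
    + [("api.thirdparty.example", 443)] * 9
    + [("other.example", 443)] * 3
)
busiest, total, at_limit = summarize_outbound(snapshot, limit=30)
print(busiest)          # [(('sql.example.internal', 1433), 18), (('api.thirdparty.example', 443), 9)]
print(total, at_limit)  # 30 True
```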

    So apparently our plan (S1) is limited to 30 outbound connections at a time. That seems really low, and I couldn't find this number in the link you provided. Am I missing something?

    Until I find a way to raise this limit, I'm working on two fronts:

    1 - I have two singleton HttpClient objects (per instance), and I'm limiting the number of connections in their pools to 5 per domain.
    2 - I'm also setting Max Pool Size to 10 in our database connection string.
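
The two caps can be checked against the 30-connection figure with a quick worst-case sum. The sketch assumes (hypothetically) that the API uses a single SQL connection string, since SqlClient keeps a separate pool per distinct connection string, and that each HttpClient talks to one host, so each holds at most its per-domain limit. It ignores any other outbound traffic such as telemetry. In .NET, a per-host cap like this is typically configured through `MaxConnectionsPerServer` on `SocketsHttpHandler` or `HttpClientHandler`, and the SQL cap is the `Max Pool Size=10` keyword in the connection string.

```python
def worst_case_outbound(sql_max_pool_size, http_clients, per_host_limit, sql_pools=1):
    """Upper bound on outbound connections one instance can hold after capping.

    Assumptions (hypothetical, adjust to the real app):
    - sql_pools = number of distinct SQL connection strings (one pool each);
    - each HttpClient talks to a single host, so it holds <= per_host_limit.
    """
    return sql_pools * sql_max_pool_size + http_clients * per_host_limit


OUTBOUND_LIMIT = 30  # figure reported for the S1 plan in this thread

budget = worst_case_outbound(sql_max_pool_size=10, http_clients=2, per_host_limit=5)
print(budget, budget <= OUTBOUND_LIMIT)  # 20 True
```

With these settings the worst case is 20 connections per instance, leaving headroom under 30; each additional distinct connection string would add another pool of up to 10.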

    This will probably have a performance impact, but over the next few days I'll be able to check whether we still reach the TCP limit at night.
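
The expected performance cost of a per-host cap can be illustrated with a small asyncio simulation. This is a hypothetical Python model, not .NET's HttpClient: `PerHostLimiter` uses one semaphore per host, so at most `limit` requests to a host are in flight and the rest queue. The host name and the 50 ms latency are assumptions for illustration.

```python
import asyncio
from collections import defaultdict


class PerHostLimiter:
    """Allows at most `limit` concurrent requests per host; extra callers queue."""

    def __init__(self, limit):
        self._sems = defaultdict(lambda: asyncio.Semaphore(limit))
        self._in_flight = defaultdict(int)
        self.peak = defaultdict(int)  # highest concurrency observed per host

    async def call(self, host, work):
        async with self._sems[host]:
            self._in_flight[host] += 1
            self.peak[host] = max(self.peak[host], self._in_flight[host])
            try:
                return await work()
            finally:
                self._in_flight[host] -= 1


async def simulate(requests, limit, latency):
    host = "api.thirdparty.example"  # hypothetical host name
    limiter = PerHostLimiter(limit)

    async def fake_request():
        await asyncio.sleep(latency)  # stands in for one HTTP round trip
        return "ok"

    loop = asyncio.get_running_loop()
    start = loop.time()
    results = await asyncio.gather(
        *(limiter.call(host, fake_request) for _ in range(requests))
    )
    return limiter.peak[host], loop.time() - start, len(results)


peak, elapsed, done = asyncio.run(simulate(requests=20, limit=5, latency=0.05))
print(peak, done)  # 5 20
print(f"{elapsed:.2f}s")  # about four 50 ms waves, i.e. roughly 0.2 s
```

As a rule of thumb (Little's law), throughput to one host is capped at about `limit / latency`: 5 connections at 50 ms per request is roughly 100 requests per second, so the cap only hurts if the nightly burst needs more than that.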