SQL Server test databases fail when multiple databases open

Paul Tobey 1 Reputation point
2022-03-22T20:53:30.793+00:00

I'm working on an intermittent issue experienced with:

  • Entity Framework Core 3.1, C#
  • SQL Server 2019
  • Unit tests (xUnit)

Tests:
* About 2000 tests total
* About 150 tests which exercise system elements using SQL Server database, each test creating a new database using a GUID-based database name for uniqueness. When a given test starts, it creates the database it needs and then runs some operations against it, deleting the database when complete
* Test failures appear random. Total failure set is between 1-10 of 150 and the same tests are not reliably failing, nor is any single test immune

Failure:
The failure is always the same, although the name of the database (of course) varies, and the test triggering the failure is different each time as noted above. The failure appears to occur either during creation or migration of the database. The attempt to open/create the database seems to fail after trying multiple times (as shown in the SQL Server log further below).

[xUnit.net 00:05:35.37]       System.InvalidOperationException : An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding 'EnableRetryOnFailure()' to the 'UseSqlServer' call.
[xUnit.net 00:05:35.37]       ---- Microsoft.Data.SqlClient.SqlException : A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)

The SQL Server log (SQL Server Express 2019 in this case), shows this each time (other I/O for simultaneously-running tests removed):

03/22/2022 13:27:50,Logon,Unknown,Login failed for user 'DOMAIN\ptobey'. Reason: Failed to open the explicitly specified database 'zFlowLogsRepositoryTest_Sqld00bb232661c400991e4397fda214c1e'. [CLIENT: <local machine>]
03/22/2022 13:27:50,Logon,Unknown,Error: 18456<c/> Severity: 14<c/> State: 38.
03/22/2022 13:27:49,Logon,Unknown,Login failed for user 'DOMAIN\ptobey'. Reason: Failed to open the explicitly specified database 'zSensorRepositoryTest_Sql7d2bacd79f3c4a57b02f4f4aedc9aea9'. [CLIENT: <local machine>]
03/22/2022 13:27:49,Logon,Unknown,Error: 18456<c/> Severity: 14<c/> State: 38.
03/22/2022 13:27:49,Logon,Unknown,Login failed for user 'DOMAIN\ptobey'. Reason: Failed to open the explicitly specified database 'zSensorRepositoryTest_Sql7d2bacd79f3c4a57b02f4f4aedc9aea9'. [CLIENT: <local machine>]
03/22/2022 13:27:49,Logon,Unknown,Error: 18456<c/> Severity: 14<c/> State: 38.
03/22/2022 13:27:47,spid52,Unknown,Parallel redo is shutdown for database 'zFlowLogsRepositoryTest_Sqlec352b5994904f9bb2cb7f563ad1f26b' with worker pool size [4].
03/22/2022 13:27:47,spid52,Unknown,Parallel redo is started for database 'zFlowLogsRepositoryTest_Sqlec352b5994904f9bb2cb7f563ad1f26b' with worker pool size [4].
03/22/2022 13:27:47,spid52,Unknown,Starting up database 'zFlowLogsRepositoryTest_Sqlec352b5994904f9bb2cb7f563ad1f26b'.
03/22/2022 13:27:46,spid65,Unknown,Parallel redo is shutdown for database 'zSatelliteRepositoryTests_Delete267dd7a8e81240f39de88da975d152f2' with worker pool size [4].
03/22/2022 13:27:46,spid65,Unknown,Parallel redo is started for database 'zSatelliteRepositoryTests_Delete267dd7a8e81240f39de88da975d152f2' with worker pool size [4].
03/22/2022 13:27:45,spid65,Unknown,Starting up database 'zSatelliteRepositoryTests_Delete267dd7a8e81240f39de88da975d152f2'.
...
03/22/2022 13:27:45,Logon,Unknown,Login failed for user 'DOMAIN\ptobey'. Reason: Failed to open the explicitly specified database 'zFlowElementRepositoryTest_Sql8999d59d52534d65a85054dac805af14'. [CLIENT: <local machine>]
03/22/2022 13:27:45,Logon,Unknown,Error: 18456<c/> Severity: 14<c/> State: 38.

So Error: 18456, Severity: 14, State: 38 basically seems to indicate a login failure. However, the same user is opening all of the databases (all 150) and only 1-10 are failing in this way, so it's not a real login issue. As you can see Windows identity is being used, if that helps identify the issue. There can be anywhere from 4-5 to 30-40 databases open at the same time, depending on the state of the corresponding tests and which tests are active. xUnit is managing when tests run so it might run a bunch of non-SQL tests, then mix in a few SQL tests, then mostly SQL tests (as the faster non-SQL tests complete).

I've limited which pieces of which tests can be running at the same time without impact on the result; I've locked access to certain parts of the code so that only one or only a few simultaneous operations can occur without impact on the result. The only reliable improvement is running only one database test at a time, generally completing all tests successfully. I'm looking at possible issues with the domain controller identity checking but this seems a stretch. Help!

Entity Framework Core
Entity Framework Core
A lightweight, extensible, open-source, and cross-platform version of the Entity Framework data access technology.
696 questions
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,742 questions
{count} votes

7 answers

Sort by: Most helpful
  1. Mark Radcliffe 0 Reputation points
    2023-08-29T22:48:51.49+00:00

    I have this same problem with EFCore 7,

    It occurs randomly and breaks the process of unit testing, if I run a 1000 tests, with a parallelism of 10 meaning 10 tests can run at any one time, each test will create it's own database from scratch do the test, then delete the database.

    It'll get through a random number of tests then just fail, leaving the unfinished set of tests in the Not Run state. I then have to run the "Not Run" tests multiple times till it can finally get through all of the tests.

    The SQL Express server is local so there should be no transient error for networking since it is using named pipes.

    This is quite problematic for running automated builds that rely on the tests completing.

    MSTest Executor: Test Parallelization enabled for D:\~~~~~.dll (Workers: 30, Scope: ClassLevel).
    The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.InvalidOperationException: An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding 'EnableRetryOnFailure' to the 'UseSqlServer' call.
     ---> Microsoft.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)
     ---> System.ComponentModel.Win32Exception (233): No process is on the other end of the pipe.
       at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
       at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
       at Microsoft.Data.SqlClient.TdsParserStateObject.SNIWritePacket(PacketHandle packet, UInt32& sniError, Boolean canAccumulate, Boolean callerHasConnectionLock, Boolean asyncClose)
       at Microsoft.Data.SqlClient.TdsParserStateObject.WriteSni(Boolean canAccumulate)
       at Microsoft.Data.SqlClient.TdsParserStateObject.WritePacket(Byte flushMode, Boolean canAccumulate)
       at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
       at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction2005(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
       at Microsoft.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
       at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
       at Microsoft.Data.SqlClient.SqlConnection.BeginDbTransaction(IsolationLevel isolationLevel)
       at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransaction(IsolationLevel isolationLevel)
       at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.Execute(IEnumerable`1 commandBatches, IRelationalConnection connection)
       at Microsoft.EntityFrameworkCore.Storage.RelationalDatabase.SaveChanges(IList`1 entries)
       at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(IList`1 entriesToSave)
       at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(StateManager stateManager, Boolean acceptAllChangesOnSuccess)
       at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.<>c.<SaveChanges>b__107_0(DbContext _, ValueTuple`2 t)
       at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
    

  2. Bruce (SqlWork.com) 56,531 Reputation points
    2023-08-30T15:41:58.09+00:00

    Your database seems to be in recovery mode when your test tries to connect. The first step after creating the database, should poll and check the database status. Calling sp sys.sp_readerrorlog and looking for startup status is the typical approach. Google for examples.