Ekinlikler
31 Mar 23 - 2 Nis 23
En büyük SQL, Fabric ve Power BI öğrenme etkinliği. 31 Mart – 2 Nisan. 400 ABD doları tasarruf etmek için FABINSIDER kodunu kullanın.
Bugün kaydolunBu tarayıcı artık desteklenmiyor.
En son özelliklerden, güvenlik güncelleştirmelerinden ve teknik destekten faydalanmak için Microsoft Edge’e yükseltin.
Applies to:
SQL Server
Attribute | Value |
---|---|
Product Name | SQL Server |
Event ID | 35267 |
Event Source | MSSQLSERVER |
Component | SQLEngine |
Symbolic Name | HADR_DISCONNECTED_DB |
Message Text | Always On Availability Groups connection with %S_MSG database terminated for %S_MSG database '%.*ls' on the availability replica '%.*ls' with Replica ID: {%.8x-%.4x-%.4x-%.2x%.2x-%.2x%.2x%.2x%.2x%.2x%.2x}. This is an informational message only. No user action is required. |
This message occurs when an availability group replica loses its connection to the remote replicas on the database mirroring endpoint. Here are examples of how you can see this error:
Always On Availability Groups connection with secondary database terminated for primary database 'ContosoDb' on the availability replica 'PRODSQL' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
Always On Availability Groups connection with primary database terminated for secondary database 'ContosoDb' on the availability replica 'PRODSQL' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
As you can see the error can appear on the Primary replica indicating that it lost communication with the Secondary replica, or vice versa.
Error 35267 is typically intermittent and might resolve itself the moment the underlying cause resolves itself. For example, an intermittent network issue might resolve itself and the connection might re-establish itself.
In many cases, the remote node to which the local node is trying to connect might not even be aware of the connection failure. Therefore, you might only see this error raised on one of the replicas, not both.
Error 35267 can sometimes occur together with error 35206, which is raised when a significant period has elapsed without a successful connection (for example, more than 10 seconds).
A connection timeout has occurred on a previously established connection to availability replica 'PRODSQL' with id [xxxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxx]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
Always On Availability Groups connection with primary database terminated for secondary database 'ContosoHRDb' on the availability replica 'PRODSQL' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
Always On Availability Groups connection with primary database terminated for secondary database 'ContosoFinDb' on the availability replica 'PRODSQL' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
Always On Availability Groups connection with primary database terminated for secondary database 'ContosoMktngDb' on the availability replica 'PRODSQL' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
The AG connection termination with the remote replica can lead to various issues local replica. For example, if the AG uses SYNCHRONOUS mode and the connection is lost, the local replica might end up waiting for confirmation from the remote. As a result, transaction log isn't truncated and the transaction log to run out of space (error MSSQLSERVER_9002) and later to become unavailable (error MSSQLSERVER_9001). Here's an example of group of errors where this occurred. The reason for transaction log being full is 'AVAILABILITY_REPLICA', which means this replica is waiting for the remote one to acknowledge it's applied log records.
Error: 9002, Severity: 17, State: 9.
The transaction log for database 'ContosoAnalyticsDb' is full due to 'AVAILABILITY_REPLICA'.
Error: 3314, Severity: 21, State: 3.
During undoing of a logged operation in database 'ContosoAnalyticsDb' (page (1:32573799) if any), an error occurred at log record ID (7672713:36228:159). Typically, the specific failure is logged previously as an error in the operating system error log. Restore the database or file from a backup, or repair the database.
State information for database 'ContosoAnalyticsDb' - Hardened Lsn: '(7672713:38265:1)' Commit LSN: '(7672712:1683087:46)' Commit Time: 'JuN 10 2022 5:51AM'
Always On Availability Groups connection with secondary database terminated for primary database 'ContosoAnalyticsDb' on the availability replica 'SQL2019DB' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
Database ContosoAnalyticsDb was shutdown due to error 3314 in routine 'XdesRMReadWrite::RollbackToLsn'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.
Error during rollback. shutting down database (location: 1).
Error: 9001, Severity: 21, State: 5.
The log for database 'ContosoAnalyticsDb' is not available. Check the operating system error log for related error messages. Resolve any errors and restart the database.
Recovery of database 'ContosoAnalyticsDb' (6) is 0% complete (approximately 60177 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
The information below outlines the more common scenarios but isn't an exhaustive list of troubleshooting steps. The specific reasons for the occurrence of this problem can include a long list of possibilities.
To check for connection issues from the SQL Server where the error is raised to the remote SQL Server, you can consider the following steps:
Run the following query to discover the endpoint
SELECT
tep.name as EndPointName,
sp.name As CreatedBy,
tep.type_desc,
tep.state_desc,
tep.port
FROM
sys.tcp_endpoints tep
INNER JOIN sys.server_principals sp ON tep.principal_id = sp.principal_id
WHERE tep.type = 4
Use Test-NetConnection to validate connectivity. If the Endpoint is listening and connection is successful, look for the TcpTestSucceeded : True
. Replace ServerName or IP_Address with remote SQL Server and the port number with that of the database mirroring endpoint.
Test-NetConnection -ComputerName <ServerName> -Port <port_number>
Test-NetConnection -ComputerName <IP_address> -Port <port_number>
Intermittent network errors are often difficult to track down unless you capture a network trace, which shows network resets (dropped packets) or similar issues. For more information, see 0300 Intermittent or Periodic Network Issue
If the SQL Server worker threads are running into scheduler problems for various reasons, then the threads that service incoming requests can stop responding temporarily while the scheduler issues last.
A typical non-yielding scheduler issue is recorded in SQL Server error log after 70 seconds of non-yield state. However, SQL Server checks the state of schedulers more frequently than that and reports those intermediate non-yielding states in Extended events. If you uncover scheduler issues on the remote node that correspond to the time of error 35267, focus on resolving those first. Here's how you can check for short-lived occurrences of scheduler issues that don't reach the 70-second threshold, but occur for say 10 or 20 seconds.
Use the System Health extended event file
system_health_0_xxxxxxxxxxxxxxxxxx.xel
to open it in SQL Server Management Studio (SSMS). Alternatively, you can use sys.fn_xe_file_target_read_file
to view or import the file as a table for easier filtering.Use the Diagnostics Log
Locate the Diagnostics Log in the \Log directory from the time of the event (applicable to Windows Cluster systems). The file name format is like this SERVERNAME_MSSQLSERVER_SQLDIAG_x_xxxxxxxxxxxxxxxxxx.xel
.
Double-click to open the file in SQL Server Management Studio (SSMS). Alternatively, you can use sys.fn_xe_file_target_read_file
to view or import the file as a table for easier filtering.
Once opened in SSMS, locate an instance of component_health_result event and right-click on the following and choose Show Column in Table: component, state_desc
Then right-click on each column and choose Filter by this value to apply the following filters:
Then double-click on the data column to open the XML data and look trackingNonYieldingScheduler
value in the first row.
If the value is different from 0x0
that means SQL Server has detected early signs of a non-yielding scheduler and reporting it here.
Here's an example where SQL Server has detected a non-yielding condition with a scheduler address "0x4fedb840040":
<queryProcessing maxWorkers="9600" workersCreated="2574" workersIdle="1883" tasksCompletedWithinInterval="175591" pendingTasks="3" ... trackingNonYieldingScheduler="0x4fedb840040">
There could be various issues at the operating system (OS) level that trigger such intermittent lack of response. A common one is low memory. On the remote AG node where the suspected issue is occurring, do the following steps:
Check the Windows System event log for any errors indicating low physical or virtual memory.
Check for error 17890 in the SQL Server error log or the Windows Application event log to see if low memory on the machine is leading to Working set trimming of all processes on the system including SQL Server. The error looks like this:
A significant part of SQL Server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 3383250, committed (KB): 9112480, memory utilization: 37%.
For detailed t-shooting steps, see MSSQLSERVER_17890
In some cases excessively slow I/O can cause the SQL Server threads to stop responding temporarily, which might cause the other AG replica to disconnect.
If you encounter errors that indicate slow I/O, troubleshoot the underlying reasons for slow I/O.
SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\TLOG\ContosoDb.ldf] in database id 9. The OS file handle is 0x00000000000003BC. The offset of the latest long I/O is: 0x0000003d26f600
SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\DATA\t38data\ContosoDb2.mdf] in database id 7. The OS file handle is 0x000000000000118C. The offset of the latest long I/O is: 0x00000000012000
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\DATA\t38data\ContosoDb.mdf] in database id 9. The OS file handle is 0x000000000000134C. The offset of the latest long I/O is: 0x00000000012000
Always On Availability Groups connection with primary database terminated for secondary database 'ContosoDb2' on the availability replica 'SQLNODE1\INSTANCE19' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
Always On Availability Groups connection with primary database terminated for secondary database 'ContosoDb' on the availability replica 'SQLNODE1\INSTANCE19' with Replica ID: {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}. This is an informational message only. No user action is required.
For detailed instructions, see Troubleshoot slow SQL Server performance caused by I/O issues and MSSQLSERVER_833.
Ekinlikler
31 Mar 23 - 2 Nis 23
En büyük SQL, Fabric ve Power BI öğrenme etkinliği. 31 Mart – 2 Nisan. 400 ABD doları tasarruf etmek için FABINSIDER kodunu kullanın.
Bugün kaydolun