MSSQLSERVER_35250
Applies to: SQL Server
Details
Attribute | Value |
---|---|
Product Name | SQL Server |
Event ID | 35250 |
Event Source | MSSQLSERVER |
Component | SQLEngine |
Symbolic Name | HADR_PRIMARYNOTACTIVE |
Message Text | The connection to the primary replica is not active. The command cannot be processed. |
Explanation
This message occurs when attempting to join secondary databases to an Always On availability group. Inability to connect to the endpoint can typically cause this error.
User Action
Option 1: Execute the steps directly in a notebook via Azure Data Studio
Learn how to install Azure Data Studio
Option 2: Follow the step manually**
Note
All the following steps must be run on both the Primary replica and the problematic Secondary replica(s).
1. Ensure the endpoint is created and started.
Run the following query to discover the endpoint
SELECT tep.name as EndPointName, sp.name As CreatedBy, tep.type_desc, tep.state_desc, tep.port FROM sys.tcp_endpoints tep INNER JOIN sys.server_principals sp ON tep.principal_id = sp.principal_id WHERE tep.type = 4
Warning
Use caution when executing the next command as it can cause a momentary downtime for the replica.
You can use these commands to restart the endpoint you discovered
ALTER ENDPOINT hadr_endpoint STATE = STOPPED ALTER ENDPOINT hadr_endpoint STATE = STARTED
2. Check if you can connect to the endpoint
Use telnet or Test-NetConnection to validate connectivity. If the Endpoint is listening and connection is successful, then telnet will show you a blank screen with a blinking cursor. If not, you will receive a connection error from telnet. To exit a successful telnet connection, press CTRL+]. If you use Test-NetConnection look for the
TcpTestSucceeded : True
orTcpTestSucceeded : False
.telnet ServerName <port_number> telnet IP_Address <port_number>
Test-NetConnection -ComputerName <ServerName> -Port <port_number> Test-NetConnection -ComputerName <IP_address> -Port <port_number>
DNS issues:
- If telnet/Test-NetConnection succeeds to the IP address but fails to the ServerName, there is likely a DNS or name resolution issue. See Check for name resolution issues
Multiple processes listening on the same port
If telnet/Test-NetConnection connection works using ServerName but fails using IP address, then there could be more than one endpoint defined on that server (another SQL instance perhaps) that is configured to listen on that port. Though the status of the endpoint on the instance in question shows "STARTED" another instance may actually have the port binding and prevent the correct instance from listening and establishing TCP connections. To find the owning process of port 5022 for example, run this command:
$port = "5022" Get-Process -Id (Get-NetTCPConnection -LocalPort $port).OwningProcess |Select-Object Name, ProductVersion, Path, Id
Blocked endpoint (firewall, anti-virus)
If telnet or Test-NetConnection fails to connect, look for Firewall and/or Anti-virus software that may be blocking the endpoint port in question. Check the firewall setting to see if it allows the endpoint port communication between the server instances that host primary replica and the secondary replica (port 5022 by default). If you are running SQL Server on Azure VM, additionally you would need to ensure Network Security Group (NSG) allows the traffic to endpoint port. Check the firewall (and NSG, for Azure VM) setting to see if it allows the endpoint port communication between the server instances that host primary replica and the secondary replica (port 5022 by default)
Run the following PowerShell script to check for disabled inbound traffic rules
Get-NetFirewallRule -Action Block -Enabled True -Direction Inbound |Format-Table
Capture a netstat or Get-NetTCPConnection output and verify the status is a LISTENING or ESTABLISHED on the IP:Port for the endpoint specified
netstat -a
Get-NetTCPConnection -LocalPort <port_number>
You can also find the port-owning process: run a command like this (e.g. using port 5022)
$port = "5022" Get-Process -Id (Get-NetTCPConnection -LocalPort $port).OwningProcess |Select-Object Name, ProductVersion, Path, Id
3. Check for errors in the system
You can query the sys.dm_hadr_availability_replica_states for the last_connect_error_number that may help you diagnose the join issue. Depending on which replica was having difficulty communicating, you can query both the primary and secondary:
select r.replica_server_name, r.endpoint_url, rs.connected_state_desc, rs.last_connect_error_description, rs.last_connect_error_number, rs.last_connect_error_timestamp from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r on rs.replica_id = r.replica_id where rs.is_local = 1
If the secondary was unable to communicate with the DNS server, for example, or if a replica's endpoint_url was configured incorrectly when creating the availability group, you may get the following results in the last_connect_error_description:
DNS Lookup failed with error '11001(No such host is known)'
4. Ensure the endpoint is configured for the correct IP/port that AG is defined for
Run the following query on the Primary and then each Secondary replica that is failing to connect. This will help you find the endpoint URL and port
select endpoint_url from sys.availability_replicas
Run the following query to find the endpoints and ports
SELECT tep.name as EndPointName, sp.name As CreatedBy, tep.type_desc, tep.state_desc, tep.port FROM sys.tcp_endpoints tep INNER JOIN sys.server_principals sp ON tep.principal_id = sp.principal_id WHERE tep.type = 4
Compare endpoint_url and port from each query and ensure the port from the endpoint_url matches the port defined for the endpoint on each respective replica
Note
If you are using specific IP addresses for the endpoint to listen on, versus the default of "listen all", then you may have to define URLs that use the specific IP address rather than the FQDN.
5. Check whether the network service account has CONNECT permission to the endpoint
Run the following queries to list the accounts that have connect permission to the endpoint on the server(s) in question, and to show the permission assigned to each relevant endpoint.
SELECT perm.class_desc, prin.name, perm.permission_name, perm.state_desc, prin.type_desc as PrincipalType, prin.is_disabled FROM sys.server_permissions perm LEFT JOIN sys.server_principals prin ON perm.grantee_principal_id = prin.principal_id LEFT JOIN sys.tcp_endpoints tep ON perm.major_id = tep.endpoint_id WHERE perm.class_desc = 'ENDPOINT' AND perm.permission_name = 'CONNECT' AND tep.type = 4; SELECT ep.name, sp.state, CONVERT(nvarchar(38), suser_name(sp.grantor_principal_id)) AS grantor, sp.TYPE AS permission, CONVERT(nvarchar(46),suser_name(sp.grantee_principal_id)) AS grantee FROM sys.server_permissions SP INNER JOIN sys.endpoints ep ON sp.major_id = ep.endpoint_id AND EP.type = 4 ORDER BY Permission,grantor, grantee;
6. Check for name resolution issues
Validate DNS resolution by using nslookup or Resolve-DnsName on the IP address and the name:
nslookup <IP_Address> nslookup <ServerName>
Resolve-DnsName -Name <ServerName> Resolve-DnsName -Name <IP_address>
Does the name resolve to the correct IP address? Does the IP address resolve to the correct name?
Check for local HOSTS file entries on each node that may be pointing to an incorrect server. From Command Prompt print the HOSTS file using this:
type C:\WINDOWS\system32\drivers\etc\hosts
Get-Content 'C:\WINDOWS\system32\drivers\etc\hosts'
Check if there are Server Aliases for Use by a Client defined on the replicas
7. Ensure your SQL Server is running a recent build (preferably the latest build)
- Update SQL Server versions to protect from running into issues like KB3213703.
For more information, refer to Create Availability Group Fails With Error 35250 'Failed to join the database'