# SQL Server 2022 Availability Group works in Workgroup but fails in Active Directory environment Hello everyone, I am currently learning SQL Server administration and I am doing a practical lab as part of my internship. I would appreciate some guidance

Question

# SQL Server 2022 Availability Group works in Workgroup but fails in Active Directory environment Hello everyone, I am currently learning SQL Server administration and I am doing a practical lab as part of my internship. I would appreciate some guidance

Eurly Garcia LOUP LOCKO 0

Hello everyone,

This is my first time posting on this forum, so please excuse me if I miss any forum conventions or provide too much or too little information.

I am currently learning SQL Server administration and I am doing a practical lab as part of my internship. I would appreciate some guidance regarding an issue I am facing with SQL Server 2022 Availability Groups.

Lab Context

I have built two separate labs to compare the behavior of SQL Server 2022 Availability Groups in different environments.

Lab 1 - Workgroup Environment

Two Windows Server 2022 servers.
One Primary replica.
One Secondary replica.
WSFC cluster created.
Availability Group configured.

Lab 2 - Active Directory Environment

Two Windows Server 2022 servers joined to an existing Active Directory domain.
One Primary replica.
One Secondary replica.
WSFC cluster created.
Availability Group configured.

In both labs, the primary server hosts the same SQL Server 2022 database.

Actions Performed

The following steps were successfully completed in both environments:

Full backup of the database.
Transaction Log backup.
Restore of the database on the secondary replica using NORECOVERY.

The secondary database is currently in the RESTORING state, which is required before joining it to an Availability Group.

Results

Workgroup Lab

Everything works correctly:

WSFC cluster created successfully.
Availability Group created successfully.
Secondary database joined the AG without issues.
Data synchronization started successfully.

Active Directory Lab

The situation is different.

The following components appear to be working correctly:

DNS resolution.
Network communication.
WSFC cluster.
Availability Group creation.
Communication between replicas.
Cluster Name Object (CNO) exists in Active Directory.
Required permissions appear to be configured.
Availability Group Listener was created successfully and is visible in Active Directory Users and Computers (ADUC).

However, when I try to join the secondary database to the Availability Group, the operation fails.

The database remains in the RESTORING state and does not start synchronization.

My Question

What I find surprising is that the exact same procedure works perfectly in the Workgroup environment but fails in the Active Directory environment.

Since the following components seem to be functioning properly:

Network
DNS
WSFC Cluster
Availability Group
Listener
Replica communication
Backup and restore process

What Active Directory-related factors could prevent the secondary database from joining the Availability Group?

I would also like to better understand how the SQL Server HADR (High Availability and Disaster Recovery) layer works in an Active Directory environment.

For example:

What role do Kerberos authentication and SPNs play in HADR communication?
Could SQL Server service account permissions be responsible for this behavior?
What troubleshooting steps would you recommend when the Availability Group is created successfully, but the secondary database fails to join and synchronize?

Thank you in advance for your help and recommendations.

0 comments

1 answer

Your answer

Answer 1

In an Active Directory–joined Always On Availability Group, the main additional moving parts compared to a workgroup configuration are:

Authentication model (Kerberos vs certificates)
Service accounts and endpoint permissions
Name resolution and SPNs
Cluster/AG metadata stored in AD

Any misconfiguration in these areas can cause the AG to be created successfully but block the secondary database from joining and synchronizing.

Below are the key AD-related factors and concrete troubleshooting steps, based on the documented behavior of AGs and their mirroring-style endpoints.

1. Endpoint configuration and connectivity

The join operation uses the HADR endpoint (DATABASE_MIRRORING endpoint) between replicas. If the secondary cannot establish an active connection to the primary’s endpoint, joining the database fails with errors like 35250.

From the documented troubleshooting steps for Msg 35250 The connection to the primary replica is not active. The command cannot be processed.:

Ensure the endpoint is created and started on both replicas.
Verify connectivity to the endpoint port (for example with Telnet) and confirm that no firewall rules block it.
Confirm the endpoint definition matches the IP/port actually used by the AG.
Check for name resolution issues.

In an AD environment, endpoints typically use FQDNs and Kerberos; in a workgroup lab, certificates and direct IPs are often used. A mismatch between endpoint URL/IP and how replicas resolve each other’s names can cause the secondary to fail to connect.

Concrete checks:

On both replicas, query the endpoint definition and confirm:
- Same port on both sides
- STATE = STARTED
- LISTENER_IP and LISTENER_PORT are reachable from the partner
Use telnet <partnerName> <endpointPort> (or equivalent) from each node to the other.

2. Service accounts and CONNECT permission on the endpoint

In an AD environment, SQL Server services usually run under domain accounts and use Windows authentication to the endpoint. The documented requirements are:

If both instances run under the same domain account, security is simpler: the login exists automatically in both master databases.
If instances run under different accounts, each account must be created in master on the remote instance and granted CONNECT permission on the mirroring endpoint.
If using built-in or non-domain accounts, certificates must be used for endpoint authentication.

If the workgroup lab uses certificates correctly but the AD lab relies on domain accounts without proper endpoint permissions, the secondary will not be able to connect to the primary’s endpoint even though the AG object and cluster look healthy.

Concrete checks:

Confirm what accounts the SQL Server services run under on both nodes.
On each instance, in master, verify that the partner’s service account exists as a login and has CONNECT permission on the endpoint.
If using built-in or non-domain accounts in AD, ensure certificates are configured as in the workgroup scenario.

3. Name resolution, SPNs, and Kerberos

In AD, HADR communication typically uses Kerberos over TCP. Kerberos requires:

Correct DNS resolution of server names
Correct Service Principal Names (SPNs) for the SQL Server service accounts
No duplicate or missing SPNs

If Kerberos fails, connections may fall back to NTLM or fail, depending on policy. For the AG endpoint, this can manifest as the secondary being unable to establish the required session to the primary.

The documented guidance for name resolution includes:

Use nslookup <ServerName> and nslookup <IP_Address> on each node.
Use Resolve-DnsName -Name <ServerName> and Resolve-DnsName -Name <IP_Address>.
Check the local HOSTS file for incorrect entries.

Concrete checks:

Ensure that the names used in the endpoint URLs resolve to the correct IPs on both nodes.
Confirm there are no stale or conflicting entries in C:\Windows\System32\drivers\etc\hosts.
Verify SPNs for the SQL Server service accounts (for example using setspn or Kerberos tools) and correct any duplicates or missing entries.

4. Error details from DMVs and logs

The documented troubleshooting path for join failures recommends checking the dynamic management view sys.dm_hadr_availability_replica_states:

Query last_connect_error_number and related columns on the secondary to see the exact error code when it attempts to connect to the primary.
Use that error number to focus on the specific cause (network, authentication, endpoint, etc.).

Also review:

SQL Server error log on both replicas at the time of the join attempt
Windows Event Logs (FailoverClustering and System) for related errors

5. Version/build issues

The documentation explicitly recommends ensuring SQL Server is on a recent build to avoid known bugs that can affect AG configuration and join operations.

Concrete step:

Confirm both replicas run the same, up-to-date SQL Server 2022 build and apply the latest cumulative update if needed.

6. Why it can work in workgroup but fail in AD

In the workgroup lab, the configuration likely uses:

Certificates for endpoint authentication
Direct IPs or simple names in endpoints and hosts files

This avoids Kerberos, SPNs, and domain account complexities.

In the AD lab, the same high-level steps (backup/restore, AG creation) can succeed while the underlying HADR connection from secondary to primary fails due to:

Missing or incorrect endpoint CONNECT permissions for the domain service accounts
Name resolution differences (FQDN vs short name) causing SPN mismatches
Kerberos/SPN misconfiguration preventing successful authentication to the endpoint

The AG object and listener can be created by the cluster and SQL Server, but the database-level join requires a working, authenticated endpoint session, which is where AD-related issues surface.

Recommended troubleshooting sequence for the AD lab

Attempt to join the secondary database and note the exact error message/number.
On the secondary, query sys.dm_hadr_availability_replica_states for last_connect_error_number and related fields.
Verify endpoint configuration and state on both replicas (port, IP, STATE = STARTED).
Test network connectivity to the endpoint port between nodes.
Validate DNS and hosts file entries using nslookup, Resolve-DnsName, and type/Get-Content on the hosts file.
Confirm SQL Server service accounts and ensure each has CONNECT permission on the partner’s endpoint if using different accounts.
Verify SPNs and Kerberos configuration for the SQL Server service accounts.
Ensure both replicas are on a current SQL Server 2022 build.

These checks target the AD-specific layers (Kerberos, SPNs, domain accounts, name resolution) that do not exist or are handled differently in the workgroup lab and are the most likely reasons the secondary database cannot join and start synchronization.

References:

Share via

# SQL Server 2022 Availability Group works in Workgroup but fails in Active Directory environment Hello everyone, I am currently learning SQL Server administration and I am doing a practical lab as part of my internship. I would appreciate some guidance

Hello everyone,

Lab Context

Lab 1 - Workgroup Environment

Lab 2 - Active Directory Environment

Actions Performed

Results

Workgroup Lab

Active Directory Lab

My Question

1 answer

1. Endpoint configuration and connectivity

2. Service accounts and CONNECT permission on the endpoint

3. Name resolution, SPNs, and Kerberos

4. Error details from DMVs and logs

5. Version/build issues

6. Why it can work in workgroup but fail in AD

Recommended troubleshooting sequence for the AD lab

Your answer