Both our development and staging environments are completely down. All applications dependent on these databases are non-functional. No deployments or testing work is possible until this is resolved.
AFFECTED RESOURCES
Two Azure Database for PostgreSQL Flexible Server instances, in the same subscription but separate resource groups, exhibiting the same symptom simultaneously.
Both servers were started this morning as part of our normal start-of-day routine. Connection failures have existed since startup.
CLIENT-FACING SYMPTOM
All client connections rejected with:
FATAL: pg_hba.conf rejects connection for host "<ip>", user "<admin>", database "<db>", SSL encryption
Rejection occurs for:
- Connections from our office IP (allowlisted)
- Connections from AKS pods in the same subscription
- Connections from Azure Cloud Shell (i.e., this is not an IP-list issue)
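To show that all three sources hit the same rejection rather than three different errors, the captured messages can be parsed and compared field by field. The following is a minimal sketch, assuming the message layout quoted above; the function name `parse_hba_rejection` and the field names are our own and not part of any library.

```python
import re

# Layout taken from the FATAL line quoted above. Other PostgreSQL versions
# may word the trailing encryption field differently, so it is captured loosely.
PG_HBA_REJECT = re.compile(
    r'pg_hba\.conf rejects connection for host "(?P<host>[^"]*)", '
    r'user "(?P<user>[^"]*)", database "(?P<database>[^"]*)", '
    r'(?P<encryption>.+)'
)


def parse_hba_rejection(message):
    """Return host, user, database and encryption fields as a dict,
    or None if the message is not a pg_hba.conf rejection."""
    match = PG_HBA_REJECT.search(message)
    return match.groupdict() if match else None
```

If the parsed `user`, `database` and `encryption` fields are identical across the office, AKS and Cloud Shell attempts and only `host` differs, that supports the reading that the rejection does not depend on the source IP.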
CONFIGURATION (verified correct on both servers)
The configuration of both servers is identical. We did not change anything in the configuration that might have caused this issue.
The PostgreSQL admin user used in the connection attempts has not changed and was working prior to today.
CONTROL PLANE BEHAVIOR
Server reports state: Ready, but the control plane is non-responsive to management operations:
- Stop operations: rejected instantly with "ServerBusyWithOtherOperation"
- Restart operations: rejected instantly with same error
- Disable public access: accepted but hung at "Accepted" status for 60+ minutes without progressing
- Firewall rule create/delete operations: do complete successfully, but have no effect on the rejection behavior — strongly suggesting the control plane is no longer syncing pg_hba.conf to the data plane
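As evidence for the stuck operations listed above, the resource group's activity log can be exported and reduced to the latest status per operation. This is a rough sketch, not a verified procedure: it assumes the JSON shape returned by `az monitor activity-log list` (the `eventTimestamp`, `operationName.value` and `status.value` fields), and the function names and the 6h default window are our own choices.

```python
def activity_log_command(resource_group, offset="6h"):
    """Build a read-only az CLI call that lists recent activity-log events."""
    return ["az", "monitor", "activity-log", "list",
            "--resource-group", resource_group,
            "--offset", offset,
            "--output", "json"]


def latest_status_per_operation(events):
    """Map each operation name to the status of its most recent event.

    Assumes timestamps are ISO 8601 strings in one consistent format,
    so that plain string ordering matches chronological ordering.
    """
    latest = {}
    for event in sorted(events, key=lambda e: e["eventTimestamp"]):
        latest[event["operationName"]["value"]] = event["status"]["value"]
    return latest
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and its output loaded with `json.loads` before calling `latest_status_per_operation`. An operation whose latest status is still "Accepted" after an hour would correspond to the hung public-access change described above.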
DIAGNOSTICS PERFORMED
- Verified no resource locks (az resource lock list returns [])
- Verified no read replicas, no HA configuration
- Verified Service Health: no acknowledged incidents in the region
- Verified Resource Health on both servers
- Verified that no metrics (CPU, memory, storage, ...) indicate a problem
- Confirmed both servers have identical configuration
- Confirmed the failure is not IP-specific (Cloud Shell also rejected)
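For reproducibility, several of the checks above can be expressed as read-only az CLI calls and run against both servers. This is a minimal sketch under assumptions: the function name and dictionary keys are ours, the resource group and server names are placeholders, and the lock check uses the resource-group-scoped `az lock list` form, which may differ from the exact invocation we ran.

```python
def diagnostic_commands(resource_group, server_name):
    """Build read-only az CLI calls for one server; nothing is executed here."""
    target = ["--resource-group", resource_group, "--name", server_name]
    return {
        # Full server resource as JSON; the reported state is part of this output.
        "server": ["az", "postgres", "flexible-server", "show",
                   *target, "--output", "json"],
        # Management locks in the resource group (expected: empty list).
        "locks": ["az", "lock", "list",
                  "--resource-group", resource_group, "--output", "json"],
        # Read replicas of this server (expected: empty list).
        "replicas": ["az", "postgres", "flexible-server", "replica", "list",
                     *target, "--output", "json"],
        # Firewall rules currently registered on the control plane.
        "firewall_rules": ["az", "postgres", "flexible-server", "firewall-rule",
                           "list", *target, "--output", "json"],
    }
```

Each value can be passed to `subprocess.run` as-is. Running the same set for the dev and staging servers and diffing the JSON output is one way to back up the claim that both configurations are identical.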
KNOWN ISSUE MATCH
The symptom matches Microsoft Q&A case 5732231 ("B1ms Server Stuck:
FATAL pg_hba.conf rejects connection despite active Firewall Rules"),
diagnosed in that case as a control plane → data plane pg_hba.conf
sync failure. The recommended workaround in that case (toggle public
network access) was attempted on our dev server and is the operation
that has now been hung at "Accepted" status for 60+ minutes.
We have no idea how to debug this further, since both servers reject any interaction and appear to be completely hung.