Azure PostgreSQL Flexible Server PITR restore stuck - port 5432 refusing connections after 7+ hours

Question

paradigmshift 0

Background

Production database had data accidentally deleted via a bulk operation at 2026-03-11 05:23 UTC. Attempting PITR to recover the deleted data.

Server Specs

Region: Korea Central
SKU: Standard_E8ds_v5, 2TB storage, PostgreSQL 16
Restore target time: 2026-03-11T05:17:00Z

Attempt History (all failed)

Attempt 1 - Portal

Method: Azure Portal PITR
Started: ~2026-03-11T06:00Z
Result: Stuck in "Running" state for 4+ hours
ARM deployment provisioningState: Running (never changed)
Port 5432: Connection refused throughout
Server visible in portal: NO
Root cause found: network parameter was empty {} in ARM template
Action: Cancelled and deleted after 4 hours

Attempt 2 - Portal

Method: Azure Portal PITR (re-attempted)
Started: 2026-03-11T10:37:43Z
az postgres flexible-server show returned state: Ready after ~30 min
ARM deployment provisioningState: Running (stayed Running)
Port 5432: Connection refused throughout
Server visible in portal: NO
network.publicNetworkAccess: Enabled (confirmed via CLI)
Firewall rule AllowAll added: connection still refused
Action: Cancelled after 73 minutes

Attempt 3 - CLI (CURRENT)

Method: az postgres flexible-server restore
Command:
az postgres flexible-server restore \
  --resource-group <rg> \
  --name <restored-server> \
  --source-server <source-server> \
  --restore-time "2026-03-11T05:17:00Z" \
  --zone 1 \
  --no-wait
Started: 2026-03-11T11:53:00Z
az postgres flexible-server show state: Ready (within ~5 minutes)
Port 5432: Connection refused for 7+ hours
Server visible in portal: NO
Firewall rule AllowAll (0.0.0.0-255.255.255.255): configured
Activity Log: Shows "List/Get PostgreSQL server" Running repeatedly, no completion event
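To confirm which failure a client is actually hitting, a small probe can separate "refused" (something at that address actively rejected the TCP handshake, which generally means nothing is accepting on that port) from "timeout" (packets silently dropped, which is what a blocking firewall often produces). This is a minimal sketch using only the Python standard library. `probe_tcp` is a hypothetical helper, not an Azure tool, and the host you pass would be the restored server's FQDN.

```python
import socket


def probe_tcp(host: str, port: int = 5432, timeout: float = 5.0) -> str:
    """Attempt one TCP handshake and classify the outcome.

    Returns "open" if the handshake completes, "refused" if the peer
    actively rejects it, "timeout" if nothing answers within `timeout`
    seconds, or "error: <text>" for anything else (e.g. DNS failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

A result of "open" only shows that something completed a TCP handshake; it does not prove PostgreSQL will accept a login.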

Consistent Symptoms Across All Attempts

CLI returns state: Ready but port 5432 never opens
Server never appears in Azure Portal server list
No progress indicator available anywhere
Activity log shows only repeated polling, no WAL replay completion event
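The control-plane signals tracked in the attempt history (server `state` and, for portal restores, the ARM deployment's `provisioningState`) can be read in one pass. This is a sketch assuming the Azure CLI is installed and logged in. The helper names are hypothetical, the deployment name is a placeholder you would look up yourself, and the runner is injectable so the logic can be exercised without calling Azure.

```python
import subprocess
from typing import Callable, Dict, List, Optional

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[List[str]], str]


def run_az(argv: List[str]) -> str:
    """Run a CLI command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def control_plane_status(rg: str, server: str, deployment: Optional[str] = None,
                         run: Runner = run_az) -> Dict[str, str]:
    """Read the server state and, optionally, an ARM deployment's state.

    Pass deployment=None when no ARM deployment is being tracked.
    """
    status = {
        "server_state": run([
            "az", "postgres", "flexible-server", "show",
            "--resource-group", rg, "--name", server,
            "--query", "state", "--output", "tsv",
        ]).strip()
    }
    if deployment is not None:
        status["deployment_state"] = run([
            "az", "deployment", "group", "show",
            "--resource-group", rg, "--name", deployment,
            "--query", "properties.provisioningState", "--output", "tsv",
        ]).strip()
    return status
```

Neither value says anything about whether the database engine is accepting connections, which is exactly the mismatch described above. Pairing this with a port probe keeps the two layers separate.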

Questions

Is WAL replay actually progressing for attempt 3?
Is 7+ hours expected for a 2TB database with ~21 hours of WAL to replay?
Why does CLI show Ready while port 5432 is still refusing connections?
Is there a way to monitor actual WAL replay progress?
Could there be a backend provisioning lock on our subscription/region?

Pilladi Padma Sai Manisha 7,305 Reputation points Microsoft External Staff Moderator

2026-03-13T20:13:24.18+00:00

Hi paradigmshift ,
I hope you had a chance to review the information shared earlier, and I hope this information has been helpful! If you still have questions, please let us know what is needed in the comments so the question can be answered.

Answer 1

Hi paradigmshift ,

When a server is deleted, database server backups may be retained within the service for up to 5 days.

Please follow this document to restore your server: https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-restore-dropped-server

The pgaudit extension provides detailed audit logging, but it is not enabled by default.

Server log capture (standard PostgreSQL logs) must also be enabled explicitly before logs are available for troubleshooting, monitoring, and audit-level detail.

Azure monitoring and PostgreSQL logging features are available independently of pgaudit, but they require server log capture and diagnostic settings to be enabled.

Without pgaudit and server log capture enabled, there are no logs in which to look up the details of the server deletion.

https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/security-audit?tabs=portal

Answer 2

Hi paradigmshift ,
Thank you for reaching out regarding the PITR restore on Azure Database for PostgreSQL Flexible Server.

During Point-in-Time Restore, Azure first provisions the server and then performs WAL (Write-Ahead Log) replay to reach the selected restore timestamp. For large databases (~2 TB) with many hours of WAL, this recovery phase can take several hours. During this stage:

The CLI may show state = Ready because infrastructure provisioning has completed.

Port 5432 can still refuse connections while the database engine finishes recovery.

WAL replay progress is not currently exposed in the portal, CLI, or activity logs.

Once the WAL replay completes, the server becomes fully available and port 5432 will start accepting connections automatically.
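If the port is expected to open on its own once recovery finishes, the waiting can be scripted instead of checked by hand. This is a minimal sketch assuming a plain TCP handshake is an adequate "probably up" signal (it does not prove PostgreSQL will accept a login). `wait_for_port` and its default intervals are illustrative choices, not values from Azure.

```python
import socket
import time


def wait_for_port(host: str, port: int = 5432, deadline_s: float = 3600.0,
                  initial_delay_s: float = 5.0, max_delay_s: float = 300.0,
                  connect_timeout_s: float = 5.0) -> bool:
    """Poll until a TCP handshake to host:port succeeds or the deadline passes.

    Sleeps with exponential backoff (capped at max_delay_s) between attempts.
    Returns True as soon as a connection succeeds, False once the deadline hits.
    """
    give_up_at = time.monotonic() + deadline_s
    delay = initial_delay_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            pass  # refused, timed out, or unresolvable: all treated as "not yet"
        remaining = give_up_at - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

Backoff keeps the probe cheap during a multi-hour recovery while still noticing the port opening within a few minutes of it happening.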

Answer 3

WAL replay progress for attempt 3

The documentation does not expose any internal WAL replay status for Azure Database for PostgreSQL flexible server. The documented PITR behavior is:

A new server is created in the same region.
The snapshot backup is restored.
WAL files are replayed up to the requested point in time.

This entire process is treated as a single restore operation that completes before the server becomes usable. There is no documented API or portal surface that shows intermediate WAL replay progress or confirms that replay is actively progressing.

Expected duration for 2 TB + ~21 hours of WAL

The documentation states that restore time depends on:

The size of the backup to restore.
The volume of transaction logs to recover.

For availability-zone or region recovery, it explicitly notes that “The time taken to restore depends on the previous backup and the volume of transaction logs to recover.” For large databases and long WAL windows, multi-hour restores are expected, but there is no documented SLA or maximum duration. The documentation gives no numeric expectation for 2 TB with ~21 hours of WAL; it supports only the general statement that restore time scales with data size and WAL volume.
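One way to see why "hours of WAL" alone cannot be turned into a duration is a rough additive model. This is an assumption for illustration, not a formula from the documentation: $S$ is the snapshot size (~2 TB here), $\Delta t$ is the span between that snapshot and the restore target (the "~21 hours"), $\bar{w}$ is the source's average WAL generation rate over that span, and $r_{\text{replay}}$ is the replay throughput on the restored server.

```latex
T_{\text{restore}} \approx T_{\text{snap}}(S) + T_{\text{replay}},
\qquad
T_{\text{replay}} \approx \frac{V_{\text{WAL}}}{r_{\text{replay}}}
\approx \frac{\bar{w}\,\Delta t}{r_{\text{replay}}}
```

Under this model, 21 hours of light traffic and 21 hours of heavy writes can differ enormously in replay time, since $\Delta t$ only matters through $\bar{w}\,\Delta t$. None of $\bar{w}$, $r_{\text{replay}}$, or the snapshot restore rate is exposed by the service, so the model explains the shape of the problem but cannot produce a number for this case.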

Why state: Ready while port 5432 refuses connections

For PITR on flexible server:

A restore operation “always creates a new database server with the name that you provide. It doesn't overwrite the existing database server.”
The portal/CLI “Create Azure Database for PostgreSQL flexible server – Restore server” deployment completes the server resource creation and then restores data.

The documentation does not describe a split state model in which state: Ready is guaranteed to mean that PostgreSQL is already accepting connections. It only states that a new server is deployed and that the backup and WAL are then applied. Therefore, state: Ready can reflect successful ARM resource creation while the underlying data restore and WAL recovery are still in progress and the engine is not yet accepting connections on port 5432. The documentation does not provide a more granular server-level state such as "restoring" or "WAL replay in progress".

Monitoring WAL replay progress

The documentation does not describe any way to:

View WAL replay percentage.
See a “restore in progress” status beyond the initial deployment.
Query a restore job or operation ID for PITR on flexible server.

For PITR, the guidance is limited to:

Initiate the restore via Portal/CLI/API.
Wait for the new server deployment to complete.
Then connect to the new server.

There is no documented method to monitor internal WAL replay progress.
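The only control-plane trail mentioned in this thread is the activity log, where the asker saw repeated "List/Get" entries and no completion event. This sketch pulls the log with `az monitor activity-log list` and tallies each operation by count and newest status, so a polling-only pattern is easy to spot. The JSON field names (`operationName.value`, `status.value`, `eventTimestamp`) are assumptions about the output shape, and whether a restore emits a distinct completion event at all is not documented.

```python
import json
import subprocess
from collections import Counter
from typing import Dict, List, Tuple


def fetch_activity_log(rg: str, offset: str = "12h") -> List[dict]:
    """Return activity-log events for a resource group as parsed JSON."""
    out = subprocess.run(
        ["az", "monitor", "activity-log", "list",
         "--resource-group", rg, "--offset", offset, "--output", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def summarize(events: List[dict]) -> Dict[str, Tuple[int, str]]:
    """Map each operation name to (event count, status of its newest event).

    Assumes every eventTimestamp uses the same ISO-8601 format, so plain
    string comparison orders them chronologically.
    """
    counts: Counter = Counter()
    latest: Dict[str, Tuple[str, str]] = {}
    for ev in events:
        op = ev.get("operationName", {}).get("value", "?")
        status = ev.get("status", {}).get("value", "?")
        ts = ev.get("eventTimestamp", "")
        counts[op] += 1
        if op not in latest or ts > latest[op][0]:
            latest[op] = (ts, status)
    return {op: (counts[op], latest[op][1]) for op in counts}
```

Typical use would be `summarize(fetch_activity_log("<rg>"))`. This only reorganizes what is already logged; it cannot reveal WAL replay progress that the service does not record there.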

Possible backend provisioning lock on subscription/region

The documentation does not mention any concept of a backend provisioning lock at the subscription or region level for Azure Database for PostgreSQL flexible server, nor any diagnostics for such a condition. It only notes that:

PITR always creates a new server.
Restore time depends on backup size and WAL volume.
For region failures, geo-restore or read replicas can be used, with RPO expectations but no mention of provisioning locks.

The documentation supports no further conclusions about subscription- or region-level locks.

In summary, within the limits of the documentation:

The observed behavior (a multi-hour restore, a resource that appears created but is not yet usable, no WAL progress visibility) is consistent with the documented description: restore time depends on data size and WAL volume, and PITR is an opaque operation that yields a new server once complete.
The documentation offers no supported mechanism to confirm or rule out an internal lock, or to inspect WAL replay progress directly.
