Azure PostgreSQL Flexible Server PITR restore stuck - port 5432 refusing connections after 7+ hours

paradigmshift 0 Reputation points
2026-03-11T19:21:12.5933333+00:00

Background

Production database had data accidentally deleted via a bulk operation at 2026-03-11 05:23 UTC. Attempting PITR to recover the deleted data.

Server Specs

  • Region: Korea Central
  • SKU: Standard_E8ds_v5, 2TB storage, PostgreSQL 16
  • Restore target time: 2026-03-11T05:17:00Z

Attempt History (all failed)

Attempt 1 - Portal

  • Method: Azure Portal PITR
  • Started: ~2026-03-11T06:00Z
  • Result: Stuck in "Running" state for 4+ hours
  • ARM deployment provisioningState: Running (never changed)
  • Port 5432: Connection refused throughout
  • Server visible in portal: NO
  • Root cause found: network parameter was empty {} in ARM template
  • Action: Cancelled and deleted after 4 hours

Attempt 2 - Portal

  • Method: Azure Portal PITR (re-attempted)
  • Started: 2026-03-11T10:37:43Z
  • az postgres flexible-server show returned state: Ready after ~30 min
  • ARM deployment provisioningState: Running (stayed Running)
  • Port 5432: Connection refused throughout
  • Server visible in portal: NO
  • network.publicNetworkAccess: Enabled (confirmed via CLI)
  • Firewall rule AllowAll added: still connection refused
  • Action: Cancelled after 73 minutes

Attempt 3 - CLI (CURRENT)

  • Method: az postgres flexible-server restore
  • Command:
      az postgres flexible-server restore \
        --resource-group <rg> \
        --name <restored-server> \
        --source-server <source-server> \
        --restore-time "2026-03-11T05:17:00Z" \
        --zone 1 \
        --no-wait
  • Started: 2026-03-11T11:53:00Z
  • az postgres flexible-server show state: Ready (within ~5 minutes)
  • Port 5432: Connection refused for 7+ hours
  • Server visible in portal: NO
  • Firewall rule AllowAll (0.0.0.0-255.255.255.255): configured
  • Activity Log: Shows "List/Get PostgreSQL server" Running repeatedly, no completion event

Consistent Symptoms Across All Attempts

  1. CLI returns state: Ready but port 5432 never opens
  2. Server never appears in Azure Portal server list
  3. No progress indicator available anywhere
  4. Activity log shows only repeated polling, no WAL replay completion event
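
For anyone reproducing the port check, a plain TCP probe can separate "connection refused" (the host answers but nothing is listening) from a timeout (packets dropped, for example by a firewall). The following is a minimal Python sketch under that assumption; `probe_tcp` is a hypothetical helper name, not part of any Azure tooling, and the host in the usage comment is a placeholder.

```python
import socket


def probe_tcp(host: str, port: int = 5432, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # A completed TCP handshake only proves something is listening;
        # it does not prove PostgreSQL is ready to authenticate clients.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host actively rejected the connection: nothing is listening.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: packets are probably being dropped.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"


# Usage (placeholder host name):
# print(probe_tcp("<restored-server>.postgres.database.azure.com"))
```

A "refused" result matches the symptom reported in all three attempts, while a "timeout" would point more toward network or firewall configuration.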

Questions

  1. Is WAL replay actually progressing for attempt 3?
  2. Is 7+ hours expected for a 2TB database with ~21 hours of WAL to replay?
  3. Why does CLI show Ready while port 5432 is still refusing connections?
  4. Is there a way to monitor actual WAL replay progress?
  5. Could there be a backend provisioning lock on our subscription/region?
Azure Database for PostgreSQL

3 answers

  1. Saraswathi Devadula 15,940 Reputation points Microsoft External Staff Moderator
    2026-03-19T02:29:29.5066667+00:00

    Hi paradigmshift,

    When a server is deleted, database server backups may be retained within the service for up to 5 days.

    Please follow this document to restore your server: https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-restore-dropped-server

    pgaudit extension provides detailed audit logging which is not enabled by default.

    Server logs capture (PostgreSQL standard logs) must be enabled explicitly to access logs for troubleshooting, monitoring, and audit-level details.

    Azure monitoring and PostgreSQL logging features are available independently of pgaudit but require server logs capture and diagnostic settings to be enabled.

    Thus, without enabling pgaudit and server logs capture, you cannot find the logs to check the server deletion details.

    https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/security-audit?tabs=portal

  2. Pilladi Padma Sai Manisha 7,305 Reputation points Microsoft External Staff Moderator
    2026-03-11T20:27:35.86+00:00

    Hi paradigmshift,
    Thank you for reaching out regarding the PITR restore on Azure Database for PostgreSQL Flexible Server.

    During Point-in-Time Restore, Azure first provisions the server and then performs WAL (Write-Ahead Log) replay to reach the selected restore timestamp. For large databases (~2 TB) with many hours of WAL, this recovery phase can take several hours. During this stage:

    • The CLI may show state = Ready because infrastructure provisioning has completed.
    • Port 5432 can still refuse connections while the database engine finishes recovery.
    • WAL replay progress is not currently exposed in the portal, CLI, or activity logs.

    Once the WAL replay completes, the server becomes fully available and port 5432 will start accepting connections automatically.
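
Since no progress indicator is exposed, one practical option is to poll the port until it starts accepting connections. The following is a minimal Python sketch of such a wait loop, not an official procedure; `wait_for_port` is a hypothetical helper, and the interval and maximum wait are arbitrary assumptions to adjust.

```python
import socket
import time


def wait_for_port(host, port=5432, interval=60.0, max_wait=3600.0,
                  connect_timeout=5.0):
    """Poll until a TCP connection succeeds.

    Returns the number of seconds waited, or None if max_wait would be
    exceeded before the next attempt.
    """
    start = time.monotonic()
    while True:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return time.monotonic() - start
        except OSError:
            # Refused, timed out, or unresolved: treat all as "not yet".
            pass
        if time.monotonic() - start + interval > max_wait:
            return None
        time.sleep(interval)


# Usage (placeholder host name, give up after 12 hours):
# waited = wait_for_port("<restored-server>.postgres.database.azure.com",
#                        max_wait=12 * 3600)
```

A successful TCP connection only shows that something is listening on the port; a real client login is still needed to confirm that the restored data is usable.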

  3. Q&A Assist
    2026-03-11T19:21:30.7333333+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.
    1. WAL replay progress for attempt 3
    The available information does not expose any internal WAL replay status for Azure Database for PostgreSQL flexible server. The documented behavior for PITR is:
    • A new server is created in the same region.
    • Snapshot backup is restored.
    • WAL files are replayed until the requested point in time.
    This entire process is treated as a single restore operation that completes before the server becomes usable. There is no documented API or portal surface that shows intermediate WAL replay progress or confirms that replay is actively progressing.
    2. Expected duration for 2 TB + ~21 hours of WAL
    The documentation states that restore time depends on:
    • The size of the backup to restore.
    • The volume of transaction logs to recover.
    For availability-zone or region recovery, it explicitly notes that “The time taken to restore depends on the previous backup and the volume of transaction logs to recover.” For large databases and long WAL windows, multi‑hour restores are expected, but there is no documented SLA or maximum duration. The context does not provide a numeric expectation for 2 TB with ~21 hours of WAL, so only the general statement that restore time scales with data size and WAL volume is supported.
    3. Why state: Ready while port 5432 refuses connections
    For PITR on flexible server:
    • A restore operation “always creates a new database server with the name that you provide. It doesn't overwrite the existing database server.”
    • The portal/CLI “Create Azure Database for PostgreSQL flexible server – Restore server” deployment completes the server resource creation and then restores data.
    The context does not describe a split state model where state: Ready is guaranteed to mean that PostgreSQL is already accepting connections. It only states that a new server is deployed and then the backup and WAL are applied. Therefore, state: Ready can reflect ARM resource creation success, while the underlying data restore/WAL recovery is still in progress and the engine is not yet accepting connections on port 5432. The documentation does not provide a more granular state for “restoring/WAL replay in progress” at the server level.
    4. Monitoring WAL replay progress
    The provided materials do not document any way to:
    • View WAL replay percentage.
    • See a “restore in progress” status beyond the initial deployment.
    • Query a restore job or operation ID for PITR on flexible server.
    For PITR, the guidance is limited to:
    • Initiate restore via Portal/CLI/API.
    • Wait for the new server deployment to complete.
    • Then connect to the new server.
    There is no documented method to monitor internal WAL replay progress.
    5. Possible backend provisioning lock on subscription/region
    The context does not mention any concept of a backend provisioning lock at the subscription or region level for Azure Database for PostgreSQL flexible server, nor any diagnostics for such a condition. It only notes that:
    • PITR always creates a new server.
    • Restore time depends on backup size and WAL volume.
    • For region failures, geo‑restore or read replicas can be used, with RPO expectations but no mention of provisioning locks.
    No further conclusions about subscription/region‑level locks are supported by the provided documentation.

    Given the constraints of the documentation:

    • The behavior observed (multi‑hour restore, resource appearing as created while not yet usable, no WAL progress visibility) is consistent with the general description that restore time depends on data size and WAL volume and that PITR is an opaque operation that results in a new server once complete.
    • There is no supported mechanism in the context to confirm or refute an internal lock or to directly inspect WAL replay progress.
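
To record the control-plane view described above alongside any port checks, the state reported by `az postgres flexible-server show` can be read programmatically. The following is a minimal Python sketch, assuming the Azure CLI is installed and logged in and that its JSON output carries a top-level `state` field (as the `state: Ready` output quoted in the question suggests). `server_state` and its `run` parameter are hypothetical names introduced here for illustration.

```python
import json
import subprocess


def server_state(resource_group, name, run=subprocess.run):
    """Return the control-plane state string reported by the Azure CLI.

    `run` is injectable so the function can be exercised without calling
    the real CLI; by default it is subprocess.run.
    """
    cmd = [
        "az", "postgres", "flexible-server", "show",
        "--resource-group", resource_group,
        "--name", name,
        "--output", "json",
    ]
    result = run(cmd, capture_output=True, text=True, check=True)
    # Assumption: the server JSON exposes "state" at the top level.
    return json.loads(result.stdout).get("state")


# Usage (placeholders from the question):
# print(server_state("<rg>", "<restored-server>"))
```

A value of Ready here describes the Azure resource only; as discussed in the answers, it does not by itself show that PostgreSQL is accepting connections on port 5432.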
