
Azure PostgreSQL Flexible Server major version upgrade stuck - no upgrade activity in logs after deployment timeout

Olivier Neu 21 Reputation points
2025-10-03T19:09:05.84+00:00

Problem Summary

I'm attempting to perform a major version upgrade on an Azure PostgreSQL Flexible Server from version 13.22 to version 14. The upgrade operation times out after 2+ hours with no actual upgrade activity occurring on the database server.

Environment

  • Service: Azure Database for PostgreSQL - Flexible Server
  • Current Version: PostgreSQL 13.22
  • Target Version: PostgreSQL 14
  • Server Status: Ready (before upgrade attempt)
  • Resource Group: takecare-cacn-sto-rg
  • Server Name: migration-test
  • Subscription ID: xxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Region: Canada Central

Steps to Reproduce

  1. Navigate to the PostgreSQL Flexible Server in Azure Portal
  2. Server is in "Ready" state
  3. Click "Upgrade" and select target version 14
  4. Portal redirects to "Deployment is in progress" page
  5. Wait for 2+ hours

Actual Result

  • Deployment page shows "in progress" indefinitely
  • After the timeout period, the deployment fails with this error:

      {
        "code": "DeploymentFailed",
        "message": "At least one resource deployment operation failed...",
        "details": [{
          "code": "OperationTimedOut",
          "message": "The operation did not complete within the permitted time"
        }]
      }

  • PostgreSQL server remains operational and on version 13.22
  • Server never enters stopped/maintenance state
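When the portal only reports "OperationTimedOut", the individual operations of the failed deployment can be listed with their status codes to see which resource timed out. A minimal sketch, assuming bash and a logged-in Azure CLI; the function name is hypothetical, and the deployment name must be copied from the resource group's Deployments blade:

```shell
# Hypothetical helper: list the operations of a resource-group deployment
# with their target resource, provisioning state and HTTP status code.
# Arguments: $1 = resource group, $2 = deployment name (from the portal).
list_deployment_ops() {
  az deployment operation group list \
    --resource-group "$1" \
    --name "$2" \
    --query "[].{resource:properties.targetResource.resourceName, state:properties.provisioningState, code:properties.statusCode}" \
    --output table
}
```

Usage: `list_deployment_ops takecare-cacn-sto-rg <deployment-name>`.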

PostgreSQL Logs Analysis

I enabled "PostgreSQL server logs and upgrade logs" in Diagnostic Settings. Analysis of 22+ hours of PostgreSQL logs shows:

  • ✅ Server continues normal operation throughout the "upgrade" period
  • ✅ Regular connections and checkpoints executing normally
  • ✅ LSN (Log Sequence Number) progressing as expected
  • No shutdown messages (Azure should stop the server for offline upgrade)
  • No pg_upgrade activity or messages
  • No major version upgrade logs whatsoever

Critical Finding: During server restarts, the logs show a recurring error referencing the tablespace directory pg_tblspc/16386/PG_13_202007201.

This suggests a missing or corrupted tablespace directory that may be blocking Azure's pre-upgrade validation.
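One way to check whether tablespace OID 16386 is still known to the catalog is to query pg_tablespace. A minimal sketch, assuming bash and psql are available; the function name and connection string are hypothetical:

```shell
# Hypothetical helper: look up a tablespace by OID and print its OID, name
# and on-disk location (unaligned, tuples only).
# Arguments: $1 = libpq connection string, $2 = tablespace OID (e.g. 16386).
check_tablespace() {
  # Refuse non-numeric input, since the OID is interpolated into the SQL.
  case "$2" in
    ''|*[!0-9]*) echo "tablespace OID must be numeric" >&2; return 2 ;;
  esac
  psql "$1" -X -A -t -c \
    "SELECT oid, spcname, pg_tablespace_location(oid) FROM pg_tablespace WHERE oid = $2;"
}
```

Usage: `check_tablespace "host=<server>.postgres.database.azure.com dbname=postgres user=<admin>" 16386`. An empty result would mean the catalog has no tablespace with that OID, which would point to the directory reference being stale rather than the tablespace being in use.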

What I've Tried

  1. Retried with the Azure CLI in no-wait mode:
       az postgres flexible-server upgrade --resource-group takecare-cacn-sto-rg \
         --name migration-test --version 14 --no-wait
    
    Result: Same behavior - deployment shows in progress but no actual upgrade activity
  2. Checked server state:
       az postgres flexible-server show -g takecare-cacn-sto-rg -n migration-test
    
    Result: Server remains in "Ready" state, version still 13.22
  3. Reviewed all available logs: only PostgreSQL operational logs are visible; no Azure control plane or pg_upgrade logs appear
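The state check above can be repeated on a schedule while the deployment is running, to record whether the server ever leaves "Ready". A minimal sketch, assuming bash and a logged-in Azure CLI; the function name, number of tries and interval are illustrative choices:

```shell
# Hypothetical helper: print a UTC timestamp followed by the server's state
# and version, TRIES times, sleeping INTERVAL seconds between checks.
# Arguments: $1 = resource group, $2 = server name, $3 = tries, $4 = interval (s).
poll_pg_state() {
  local i=0
  while [ "$i" -lt "$3" ]; do
    printf '%s\t' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    az postgres flexible-server show \
      --resource-group "$1" --name "$2" \
      --query "[state, version]" --output tsv
    i=$((i + 1))
    # Do not sleep after the last check.
    if [ "$i" -lt "$3" ]; then
      sleep "$4"
    fi
  done
}
```

Usage: `poll_pg_state takecare-cacn-sto-rg migration-test 30 60` checks every minute for 30 minutes.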

Questions

  1. Why is the upgrade operation not being initiated by Azure Control Plane? The deployment shows "in progress" but PostgreSQL logs prove no upgrade process ever starts.
  2. Could the missing tablespace directory pg_tblspc/16386/PG_13_202007201 be blocking the upgrade? How can I validate or repair this tablespace before attempting upgrade?
  3. Where can I find Azure Control Plane logs or validation errors? PostgreSQL logs only show normal operation - I need to see why Azure is not starting the upgrade process.
  4. Is there a way to force pre-upgrade validation checks? This would help identify what's blocking the upgrade before waiting 2 hours for timeout.
  5. Are there any known issues with PostgreSQL 13.22 to 14 upgrades on Azure Flexible Server that could cause this behavior?
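The Azure activity log records control-plane operations on a resource, so it is one place to look for the upgrade request and its final status. A minimal sketch, assuming bash and a logged-in Azure CLI; the function name is hypothetical:

```shell
# Hypothetical helper: show recent control-plane events for a flexible server.
# Arguments: $1 = resource group, $2 = server name, $3 = look-back window (e.g. 1d).
show_activity_log() {
  local id
  # Resolve the full Azure resource ID of the server.
  id=$(az postgres flexible-server show --resource-group "$1" --name "$2" \
         --query id --output tsv) || return
  az monitor activity-log list \
    --resource-id "$id" \
    --offset "$3" \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, status:status.value}" \
    --output table
}
```

Usage: `show_activity_log takecare-cacn-sto-rg migration-test 1d`.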

Expected Behavior

According to Azure documentation:

  • Upgrade is an offline operation (server should stop)
  • Should complete in under 15 minutes for most databases
  • pg_upgrade logs should appear in diagnostic logs

None of these behaviors are occurring - the upgrade operation never starts at all.

Additional Context

  • This is a migration test server (non-production)
  • Database size is moderate
  • Server has been running stably on version 13.22
  • No other Azure operations are running on this resource
  • Sufficient quota and permissions confirmed in subscription

Any guidance on how to diagnose why Azure is not initiating the upgrade process would be greatly appreciated.

Azure Database for PostgreSQL

3 answers

  1. Olivier Neu 21 Reputation points
    2025-10-17T13:11:03.45+00:00

    Hello,

    Thanks for following up. I can confirm that the situation has improved:

    • The upgrade job now returns the error “The major version upgrade failed precheck” almost immediately, which helped me pinpoint the root cause: two extensions, timescaledb and dblink, were blocking the process.
    • I removed dblink, which we no longer use, and uninstalled timescaledb temporarily. After the upgrade, I reinstalled timescaledb without issues.
    • I reran the upgrade yesterday, and it completed successfully: the server moved from PostgreSQL 13 to PostgreSQL 16.
    • Regarding PostgreSQL 17, I’m holding off for now because there is still a known bug with timescaledb on that version. Staying on 16 is fine for us at the moment.
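Before rerunning an upgrade, it can help to see which extensions are installed and which are allow-listed or preloaded on the server. A minimal sketch, assuming bash, psql and a logged-in Azure CLI; the function name and connection string are hypothetical, and because pg_extension is per database, the query has to be run against each database:

```shell
# Hypothetical helper: list installed extensions in one database, then print
# the server parameters that allow-list and preload extensions.
# Arguments: $1 = libpq connection string, $2 = resource group, $3 = server name.
show_extension_state() {
  local p
  psql "$1" -X -A -t -c "SELECT extname, extversion FROM pg_extension ORDER BY extname;"
  for p in azure.extensions shared_preload_libraries; do
    printf '%s = ' "$p"
    az postgres flexible-server parameter show \
      --resource-group "$2" --server-name "$3" --name "$p" \
      --query value --output tsv
  done
}
```

Usage: `show_extension_state "host=<server>.postgres.database.azure.com dbname=<db> user=<admin>" takecare-cacn-sto-rg migration-test`.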

    Thank you for your assistance; it helped us get a quick diagnosis and ultimately resolve the upgrade.



  2. Olivier Neu 21 Reputation points
    2025-10-06T14:03:46.1966667+00:00

    Yes, I can confirm the server state details:

    Current State: The server is in "Ready" state (which corresponds to "Succeeded" provisioning state in Azure Resource Manager).

    Problem Description:

    The server status remains stuck in "Ready" state throughout the entire upgrade attempt. Specifically:

    1. Before upgrade: Server state = "Ready" ✓
    2. During upgrade attempt: Server state = "Ready" (no change)
    3. After timeout error: Server state = "Ready" (still no change)

    Expected Behavior:

    According to Azure documentation, during a major version upgrade the server should:

    • Transition to a maintenance/updating state
    • Stop accepting connections (offline operation)
    • Complete the upgrade
    • Return to "Ready" state

    Actual Behavior:

    • Server never leaves "Ready" state
    • Server continues accepting connections normally
    • No maintenance window occurs
    • After 2+ hours, deployment fails with "OperationTimedOut" error
    • Server remains operational on PostgreSQL 13.22

    Evidence from PostgreSQL logs (22+ hours analyzed):

    • Server shows continuous normal operation
    • No shutdown messages
    • No pg_upgrade process logs
    • No state transitions whatsoever

    Question: Could you please check the Azure Control Plane logs for this server to see why the upgrade operation is not being initiated? The deployment shows "in progress" in the portal, but the actual database server never receives the upgrade command.

    Server details for your investigation:

    • Server Name: migration-test
    • Resource Group: takecare-cacn-sto-rg
    • Subscription: xxxxxxxxxxxxxxxxxxxx
    • Region: Canada Central

    Additionally, I noticed a recurring error in the PostgreSQL logs during restarts, referencing the tablespace directory pg_tblspc/16386/PG_13_202007201.

    Could this missing tablespace directory be preventing the Azure pre-upgrade validation from passing?


  3. Mahesh Kurva 10,520 Reputation points Microsoft External Staff Moderator
    2025-10-06T13:07:54.7233333+00:00

    Hi Olivier Neu ,


    After checking with the internal team, the issue has been resolved.

    I hope this has been helpful!

    Your feedback is important so please take a moment to accept answers. If you still have questions, please let us know what is needed in the comments so the question can be answered. Thank you for helping to improve Microsoft Q&A!

    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.


