PostgreSQL Flexible Server stuck in Updating state

Tino Merl 0 Reputation points
2023-08-17T14:49:15.01+00:00

We have a PostgreSQL Flexible Server that has been stuck in the Updating state for about 7 hours now. I tried restarting and stopping it via the az CLI, but neither worked. The Updating state was triggered by a runbook that scales the database down after a heavy data-processing load.

Azure Database for PostgreSQL
2 answers

  1. Rahul Randive 8,521 Reputation points Microsoft Employee
    2023-08-17T22:27:30.88+00:00

    Hi @Tino Merl

    As you mentioned, the Updating state was initiated by a runbook that scales the database down after a heavy data-processing load.

    In that case, recovery time depends on how recent the last checkpoint was and on the amount of data in the WAL files written since then. The best practice is for application developers to avoid long-running transactions and to tune checkpoint frequency so that recovery stays short.

    What causes long recovery on Azure Database for PostgreSQL
    Recent checkpoints are critical for fast server recovery. After a restart, the server, whether a new instance (failover to a healthy instance) or the same instance (in-place restart), connects to the disk that holds all the logs. All WAL records written after the last successful checkpoint must be applied to the data pages before the server accepts connections again. These records are the REDO logs, and they are applied by the recovery operation.
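
    As a rough way to see how much WAL would have to be replayed after a restart, the distance between the last checkpoint's redo location and the current WAL position can be computed. The sketch below is illustrative only: it assumes the two LSN strings were fetched beforehand, for example from `pg_control_checkpoint()` (column `redo_lsn`) and `pg_current_wal_lsn()`, and the helper names and sample values are hypothetical.

    ```python
    def lsn_to_int(lsn: str) -> int:
        """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
        high, low = lsn.split("/")
        return (int(high, 16) << 32) | int(low, 16)


    def wal_bytes_to_replay(redo_lsn: str, current_lsn: str) -> int:
        """Bytes of WAL written since the last checkpoint's redo point."""
        return lsn_to_int(current_lsn) - lsn_to_int(redo_lsn)


    # Hypothetical LSN values, for illustration only.
    pending = wal_bytes_to_replay("0/1000000", "0/3000000")
    print(f"{pending / (1024 * 1024):.0f} MiB of WAL since the last checkpoint")
    ```

    A large value just before a scale operation suggests that recovery, and therefore the Updating state, may last longer.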

    While scaling up or down, please ensure that no long-running transactions are active on the server, and stop or reduce the application workload.
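
    One way a runbook could check for long-running transactions before it scales is to query `pg_stat_activity`. This is a minimal sketch, not an official procedure: the function names and the 5-minute threshold are assumptions, and `cursor` is assumed to be a DB-API cursor that uses the `%s` parameter style (as psycopg2 does).

    ```python
    from datetime import timedelta

    # Lists transactions that have been open longer than the given threshold.
    LONG_TXN_SQL = """
    SELECT pid, now() - xact_start AS xact_age, state, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
      AND pid <> pg_backend_pid()
      AND now() - xact_start > %s
    ORDER BY xact_start
    """


    def long_running_transactions(cursor, threshold=timedelta(minutes=5)):
        """Return rows for transactions open longer than `threshold`."""
        cursor.execute(LONG_TXN_SQL, (threshold,))
        return cursor.fetchall()


    def safe_to_scale(cursor, threshold=timedelta(minutes=5)):
        """True when no transaction has been open longer than `threshold`."""
        return not long_running_transactions(cursor, threshold)
    ```

    The runbook could call `safe_to_scale(cursor)` first and postpone the scale operation while it returns `False`.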

    If your server is still stuck in the Updating state, as Oury mentioned, please open a support ticket and share the case number so we can get this issue fixed on the back end.

    Thank you!


  2. Maryna Bohdan 0 Reputation points
    2024-03-05T10:19:34.02+00:00

    Hello, Oury Ba-MSFT!

    I am currently facing the issue that was discussed on this page.

    There was no load during the automated scaling (I am using automation tasks (preview) to scale our PostgreSQL server up and down), but an error occurred while scaling up, the task was rolled back, and the server has been in the Updating state ever since.

    Here is the proof of low load at the time of scaling:
    Screenshot 2024-03-05 105923

    Here is the error text:

    Screenshot 2024-03-05 111319

    Could you please fix this on the back end so that, when there is an error or 100% DB resource usage, the task is rolled back, no more logs are produced, and the server goes back immediately to its previous settings rather than getting stuck in the Updating state?

    Thank you in advance,

    please let me know if I can help with more information,

    Maryna