On compute scaling, https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-business-continuity#planned-downtime-events states:
During compute scaling operation, active checkpoints are allowed to complete, client connections are drained, any uncommitted transactions are canceled, storage is detached, and then it's shut down. A new flexible server with the same database server name is provisioned with the scaled compute configuration. The storage is then attached to the new server and the database is started which performs recovery if necessary before accepting client connections.
...
When the flexible server is configured with high availability, the flexible server performs the scaling and the maintenance operations on the standby server first. For more information, see Concepts - High availability.
and https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-high-availability states:
For other user initiated operations such as scale-compute or scale-storage, the changes are applied at the standby first, followed by the primary. Currently, the service is not failed over to the standby and hence while the scale operation is carried out on the primary server, applications will encounter a short downtime.
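For reference, the operation that triggers this downtime is an ordinary compute scale, roughly like the following (a minimal sketch that drives the Azure CLI from Python; the resource group, server name, and target SKU are placeholders):

```python
import subprocess

# Placeholders: substitute your own resource group, server name, and target SKU.
RESOURCE_GROUP = "my-resource-group"
SERVER_NAME = "my-flexible-server"
TARGET_SKU = "Standard_D4s_v3"

# Scaling compute restarts the flexible server on new compute,
# which is where the short downtime described above comes from.
subprocess.run(
    [
        "az", "postgres", "flexible-server", "update",
        "--resource-group", RESOURCE_GROUP,
        "--name", SERVER_NAME,
        "--sku-name", TARGET_SKU,
    ],
    check=True,
)
```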
Is there a way to avoid the downtime?
For managed maintenance, avoiding the downtime seems to be possible through an HA failover:
For flexible servers configured with high availability, these maintenance activities are performed on the standby replica first and the service is failed over to the standby to which applications can reconnect.
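For scale operations, all I can do today is mask the short downtime on the client side with a reconnect loop (a minimal sketch assuming psycopg2; the connection string is a placeholder), which hides the interruption rather than avoiding it:

```python
import time
import psycopg2

# Placeholder connection string; substitute real host, user, and password.
DSN = ("host=my-flexible-server.postgres.database.azure.com "
       "dbname=postgres user=myadmin password=... sslmode=require")

def query_with_retry(sql, attempts=20, delay=3):
    """Run a query, reconnecting while the server restarts on the new compute."""
    last_error = None
    for _ in range(attempts):
        try:
            conn = psycopg2.connect(DSN, connect_timeout=5)
            try:
                with conn.cursor() as cur:
                    cur.execute(sql)
                    return cur.fetchall()
            finally:
                conn.close()
        except psycopg2.OperationalError as exc:
            # Connections are dropped/refused while the scale operation runs.
            last_error = exc
            time.sleep(delay)
    raise last_error

# rows = query_with_retry("SELECT now()")
```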
Read replicas would be a natural fit, but the docs at https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-read-replicas give no hint of an (automated) process that uses the replicas to carry out the scaling without downtime.
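The best I can imagine myself is a manual sequence like the sketch below. This is purely my own assumption, not something the docs describe, and all names are placeholders; I am also not certain the `replica promote` subcommand and independent replica scaling behave as I assume here.

```python
import subprocess

# Placeholders; this is only what I imagine a manual, replica-based
# workaround could look like -- the docs do not describe such a process.
RESOURCE_GROUP = "my-resource-group"
PRIMARY = "my-flexible-server"
REPLICA = "my-flexible-server-replica"
TARGET_SKU = "Standard_D8s_v3"

def az(*args):
    subprocess.run(["az", *args], check=True)

# 1. Create a read replica of the primary.
az("postgres", "flexible-server", "replica", "create",
   "--resource-group", RESOURCE_GROUP,
   "--source-server", PRIMARY,
   "--replica-name", REPLICA)

# 2. Scale the replica while it is not serving application traffic,
#    so its restart does not affect the application (assuming replicas
#    can be scaled independently of the primary).
az("postgres", "flexible-server", "update",
   "--resource-group", RESOURCE_GROUP,
   "--name", REPLICA,
   "--sku-name", TARGET_SKU)

# 3. Promote the replica to a standalone server (assuming the
#    `replica promote` subcommand is available in the installed CLI).
az("postgres", "flexible-server", "replica", "promote",
   "--resource-group", RESOURCE_GROUP,
   "--name", REPLICA)

# 4. Repoint the application at the promoted server (connection strings,
#    DNS, etc.); writes against the old primary during the cut-over would
#    be lost, which is exactly why I am asking for a managed process.
```

Even if something along these lines works, it is manual and risky around the cut-over, so a managed or automated path would be much preferable.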