How to achieve high performance for postgres in AKS

Greg Toews 20 Reputation points
2026-05-08T18:04:11.2566667+00:00

We cannot get the performance we need with managed Premium SSD v2 disks; however, locally attached NVMe disks get us where we need to be (> 10k TPS as measured with pgbench). The problem is that NVMe disks are ephemeral. Can Ultra Disks get us the performance we need? I think the cost is prohibitive for 6-8 TB. I'm also thinking of using a VM, or even a dedicated VM, with local NVMe storage and ensuring the machine is never recycled. Under what circumstances would a VM be recycled such that we'd lose the contents of our NVMe storage? Would disabling updates be enough?

Azure Storage

Globally unique resources that provide access to data management services and serve as the parent namespace for the services.


Answer accepted by question author

Manish Deshpande 6,830 Reputation points Microsoft External Staff Moderator
2026-05-08T18:59:26.48+00:00

Hello @Greg Toews

These issues often occur when scaling PostgreSQL past 10k TPS on AKS, and focusing on NVMe is the right direction.

1. Can NVMe get us the performance we need without Ultra Disk's cost?

Yes, this is the recommended approach for your workload. According to Microsoft’s benchmarks, PostgreSQL can reach nearly 15,000 TPS with single-digit millisecond latency on a Standard_L16s_v3 VM using local NVMe and Azure Container Storage, which exceeds your 10,000 TPS requirement.
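To compare your own pgbench runs against the 10,000 TPS target, a small script can pull the TPS figure out of the report. This is a minimal sketch, not an official tool: the function names and the sample report are made up for illustration, and it assumes the usual `tps = ...` summary line that pgbench prints at the end of a run.

```python
import re

# pgbench ends its report with one or two lines starting with "tps = <number>".
TPS_LINE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_pgbench_tps(report: str) -> float:
    """Return the TPS value from the last 'tps = ...' line of a pgbench report."""
    matches = TPS_LINE.findall(report)
    if not matches:
        raise ValueError("no 'tps = ...' line found in pgbench output")
    # Older pgbench versions print two tps lines; the last one excludes
    # connection setup time, so it is the one used here.
    return float(matches[-1])


def meets_target(report: str, target_tps: float = 10_000.0) -> bool:
    """True when the measured TPS is above the target (10k TPS by default)."""
    return parse_pgbench_tps(report) > target_tps


# Illustrative report text only; these numbers are not a real measurement.
SAMPLE_REPORT = """\
number of clients: 64
latency average = 4.175 ms
tps = 15328.412907 (without initial connection time)
"""

if __name__ == "__main__":
    print(parse_pgbench_tps(SAMPLE_REPORT), meets_target(SAMPLE_REPORT))
```

Run pgbench the same way on each storage option (same scale factor, client count, and duration) so that the numbers are comparable.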

It is also important to consider cost efficiency: an 8-vCPU Standard_L8s_v3 VM with local NVMe achieves approximately 400,000 IOPS, whereas achieving similar IOPS with Ultra Disk would require a 112-vCPU VM. For a dataset of 6–8 TB, the Lsv3 series offers a much more cost-effective solution.
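The arithmetic behind that comparison can be written out as follows. This sketch uses only the two figures quoted above (roughly 400,000 IOPS on 8 vCPUs with local NVMe, versus a 112-vCPU VM for similar IOPS on Ultra Disk); the function and constant names are mine.

```python
def iops_per_vcpu(iops: int, vcpus: int) -> float:
    """IOPS obtained per vCPU you have to pay for."""
    return iops / vcpus


# Figures quoted in the answer above (approximate).
LOCAL_NVME = iops_per_vcpu(400_000, 8)     # Standard_L8s_v3 with local NVMe
ULTRA_DISK = iops_per_vcpu(400_000, 112)   # 112-vCPU VM with Ultra Disk
VCPU_RATIO = 112 / 8                       # how many times more vCPUs Ultra Disk needs

if __name__ == "__main__":
    print(f"local NVMe: {LOCAL_NVME:.0f} IOPS per vCPU")
    print(f"Ultra Disk: {ULTRA_DISK:.0f} IOPS per vCPU")
    print(f"vCPU ratio: {VCPU_RATIO:.0f}x")
```

This only compares compute sizing for a given IOPS level; disk capacity and per-GB pricing for 6-8 TB need to be checked separately.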

While Ultra Disks can deliver high performance, they generally come at a significantly higher cost and may not match the latency provided by NVMe.

Please refer to this link: https://azure.microsoft.com/en-us/blog/running-high-performance-postgresql-on-azure-kubernetes-service/

2. How do we deal with NVMe being ephemeral?

This is a crucial design consideration. Rather than focusing on making NVMe durable, it is advisable to design your application for resilience.

The recommended solution is to deploy PostgreSQL with the CloudNativePG (CNPG) operator, which offers the following benefits:

• Native PostgreSQL streaming replication across three nodes (one primary and two replicas) distributed across multiple Availability Zones

• Automated failover with an RTO of less than 10 seconds and an RPO of zero

• Continuous WAL archiving to Azure Blob Storage, providing a reliable backup layer

PostgreSQL's WAL mechanism records every transaction before it is applied, ensuring that replicas maintain a consistent and current copy, even if a node fails. Azure Container Storage natively manages NVMe volume orchestration within Kubernetes, eliminating the need for manual RAID configuration.
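The three-node layout described above could be expressed as a CloudNativePG `Cluster` manifest along these lines. Treat it as a sketch under assumptions rather than a tested deployment: the cluster name, storage class name, volume size, and Blob Storage path are placeholders, and the field names reflect the CNPG `Cluster` API as I understand it, so please check them against the operator version you install. The script only builds the manifest and prints it as JSON, which `kubectl apply -f -` accepts.

```python
import json


def build_cluster_manifest(name: str, storage_class: str, size: str,
                           backup_path: str, instances: int = 3) -> dict:
    """Build a CloudNativePG Cluster manifest (one primary plus replicas)."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary and (instances - 1) streaming replicas.
            "instances": instances,
            # Volume backed by local NVMe; the class name is an assumption.
            "storage": {"storageClass": storage_class, "size": size},
            # Spread the instances across Availability Zones.
            "affinity": {
                "enablePodAntiAffinity": True,
                "topologyKey": "topology.kubernetes.io/zone",
            },
            # Continuous WAL archiving and base backups to Azure Blob Storage.
            "backup": {
                "barmanObjectStore": {
                    "destinationPath": backup_path,
                    "azureCredentials": {"inheritFromAzureAD": True},
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cluster_manifest(
        name="pg-nvme",                      # hypothetical cluster name
        storage_class="local-nvme",          # hypothetical storage class
        size="7Ti",                          # placeholder for the 6-8 TB dataset
        backup_path="https://<storage-account>.blob.core.windows.net/<container>/",
    )
    print(json.dumps(manifest, indent=2))
```

Note that an RPO of zero on failover depends on synchronous replication being configured, which this sketch does not set up.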

3. When exactly would a VM be recycled, and would disabling updates prevent it?

The contents of local NVMe disks are lost in the following situations:

• Node image upgrades (weekly Linux patches) → node is reimaged

• Kubernetes version upgrades → nodes are recreated

• Node failure or hardware fault → data lost

• VM deallocation (scale-in event) → data lost

• VM redeploy → data lost

AKS releases weekly node image updates and recommends using auto-upgrade channels, so recycling is expected; it keeps the cluster secure and reliable.

Turning off updates isn’t the way to go since that could leave your cluster open to OS security issues and isn’t supported long-term. Instead, it’s best to take advantage of CloudNativePG’s high availability setup: when a node gets recycled, CNPG will automatically create a new replica on a fresh node and sync it up with the primary. Your data stays safe—you’ll just have one less replica for a bit while it catches up.
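Before a planned recycle you may want to confirm that enough replicas are healthy. The sketch below is one way to reason about it, assuming you have already fetched rows from the `pg_stat_replication` view on the primary (columns `application_name` and `state`). The function names, the threshold, and the sample rows are hypothetical.

```python
from typing import Iterable, List, Mapping


def streaming_replicas(rows: Iterable[Mapping[str, str]]) -> List[str]:
    """Names of replicas that pg_stat_replication reports as 'streaming'."""
    return [r["application_name"] for r in rows if r["state"] == "streaming"]


def safe_to_recycle_replica(rows: Iterable[Mapping[str, str]],
                            min_streaming_after: int = 1) -> bool:
    """True if at least `min_streaming_after` replicas would still be streaming.

    Worst case is assumed: the node being recycled hosts a streaming replica.
    """
    return len(streaming_replicas(rows)) - 1 >= min_streaming_after


# Illustrative rows, shaped like a result of:
#   SELECT application_name, state FROM pg_stat_replication;
SAMPLE_ROWS = [
    {"application_name": "pg-2", "state": "streaming"},
    {"application_name": "pg-3", "state": "streaming"},
]

if __name__ == "__main__":
    print(streaming_replicas(SAMPLE_ROWS), safe_to_recycle_replica(SAMPLE_ROWS))
```

A replica that is still in the `catchup` state after being recreated does not count as healthy here, which matches the "one less replica for a bit" window described above.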

Reference Links:
1. AKS Engineering Blog: PostgreSQL + NVMe deep dive
https://blog.aks.azure.com/2025/07/09/postgresql-nvme

2. Best practices for ephemeral NVMe data disks on AKS
https://learn.microsoft.com/en-us/azure/aks/best-practices-storage-nvme

3. Deploy highly available PostgreSQL on AKS
https://learn.microsoft.com/en-us/azure/aks/deploy-postgresql-ha?tabs=azuredisk

4. PostgreSQL HA overview with CloudNativePG
https://learn.microsoft.com/en-us/azure/aks/postgresql-ha-overview

If you have any questions, please feel free to comment and we will be happy to assist you.

Thanks,
Manish.


1 additional answer

  1. kagiyama yutaka 3,430 Reputation points
    2026-05-22T21:10:22.2466667+00:00

    Local NVMe on Azure is temporary and is lost on any recycle, so PostgreSQL durability must rely on managed disks or on replication plus WAL. Ultra Disk scales IOPS but does not match NVMe latency, and the TPS you achieve depends on workload-specific tests.
