How to achieve high performance for postgres in AKS

Question

Greg Toews 20

We cannot get the performance we need with managed Premium SSD v2 disks; however, locally attached NVMe disks get us where we need to be (> 10k TPS as measured with pgbench). The problem is that NVMe disks are ephemeral. Can Ultra Disks get us the performance we need? I think the cost is prohibitive for 6-8 TB. I'm also thinking of using a VM, or even a dedicated VM, with local NVMe storage and ensuring the machine is never recycled. Under what circumstances would a VM be recycled such that we'd lose the contents of our NVMe storage? Would disabling updates be enough?

Greg Toews 20 Reputation points

2026-05-11T15:37:22.3733333+00:00

Thanks for the thoughtful answers. You mentioned the 8-vCPU Standard_L8s_v3 being suitable for a 6-8 TB dataset, but I believe that VM would only be provisioned with one 1.9 TB NVMe disk. I believe I'd need to provision a larger machine such as the L32s_v3 for 11.53 TB of NVMe storage. Is my understanding correct?
Manish Deshpande 6,830 Reputation points Microsoft External Staff Moderator

2026-05-22T18:54:05.7033333+00:00
Hi @Greg Toews ,

Apologies for the late response.

You're spot on, and thanks for the follow-up. It's a great clarification worth spelling out clearly.

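The Lsv3 series allocates one 1.92 TB NVMe SSD device per 8 vCPUs, so NVMe capacity scales linearly with VM size. For the sizes relevant to your use case, that works out to 1 disk (1.92 TB) on the L8s_v3, 2 disks (3.84 TB) on the L16s_v3, 4 disks (7.68 TB) on the L32s_v3, and 6 disks (11.52 TB) on the L48s_v3.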

So for a 6–8 TB dataset, the Standard_L32s_v3 (4 × 1.92 TB ≈ 7.68 TB) is indeed the right fit — not the L8s_v3, which only gives you a single 1.92 TB disk. My earlier mention of the L8s_v3 was in the context of the VM family in general; for your actual storage requirement, the L32s_v3 is the correct starting point.

If your dataset is closer to or exceeds 8 TB, you'd want to step up to the Standard_L48s_v3, which gives you ~11.52 TB across 6 NVMe disks — matching what you noted as the figure you'd need.
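
To make the sizing rule concrete, here is a minimal Python sketch that applies the one-disk-per-8-vCPUs rule described above. The size list and the helper names (`nvme_capacity_gb`, `smallest_fit`) are illustrative assumptions, and the capacities are raw decimal gigabytes before any filesystem, RAID, or free-space overhead, so please confirm the figures against the Lsv3 documentation before provisioning.

```python
from typing import Optional

# Sketch: raw local NVMe capacity for Lsv3 sizes, using the rule
# "one 1.92 TB NVMe device per 8 vCPUs" from the answer above.
# The size list and helper names are illustrative assumptions.

DISK_GB = 1920        # one 1.92 TB device in decimal GB (integers avoid float error)
VCPUS_PER_DISK = 8    # one device per 8 vCPUs

# vCPU counts for the Lsv3 sizes considered here (assumed list).
LSV3_VCPUS = {
    "Standard_L8s_v3": 8,
    "Standard_L16s_v3": 16,
    "Standard_L32s_v3": 32,
    "Standard_L48s_v3": 48,
    "Standard_L64s_v3": 64,
    "Standard_L80s_v3": 80,
}


def nvme_capacity_gb(vcpus: int) -> int:
    """Raw NVMe capacity in GB for a VM with the given vCPU count."""
    return (vcpus // VCPUS_PER_DISK) * DISK_GB


def smallest_fit(dataset_gb: int) -> Optional[str]:
    """Return the smallest listed size whose raw NVMe capacity holds dataset_gb."""
    for name, vcpus in sorted(LSV3_VCPUS.items(), key=lambda kv: kv[1]):
        if nvme_capacity_gb(vcpus) >= dataset_gb:
            return name
    return None  # nothing in the list is large enough


if __name__ == "__main__":
    for name, vcpus in LSV3_VCPUS.items():
        print(f"{name}: {nvme_capacity_gb(vcpus)} GB raw NVMe")
    print("6 TB dataset ->", smallest_fit(6000))
    print("8 TB dataset ->", smallest_fit(8000))
```

In practice you would size against the dataset plus headroom for WAL, temporary files, and growth rather than the raw dataset size alone.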

A couple of things worth keeping in mind as you plan this out:

Local NVMe disks on Lsv3 VMs are ephemeral — data is lost if the VM is stopped or deallocated. For PostgreSQL, this reinforces the need for a replication-based durability strategy (e.g., CloudNativePG with Azure Blob Storage backups), rather than relying on the disk itself for persistence.

NVMe disk encryption is enabled by default, using hardware-based encryption with a platform-managed key, for Lsv3 VMs created or allocated on or after January 1, 2023.

Reference links:

https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/storage-optimized/lsv3-series?tabs=sizebasic
https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/storage-optimized/l-family

If you have any questions, please feel free to leave a comment and I will be happy to assist you.

Thanks,
Manish.
Manish Deshpande 6,830 Reputation points Microsoft External Staff Moderator

2026-05-26T02:07:58.92+00:00

Hello @Greg Toews

I wanted to check whether my last response made sense. I'd be glad to assist further or explain anything in more detail. If the answer is helpful, please accept it and upvote it so that it can help others in the community.
Greg Toews 20 Reputation points

2026-05-26T16:15:52.5366667+00:00

@Manish Deshpande how do I accept your answer?
Manish Deshpande 6,830 Reputation points Microsoft External Staff Moderator

2026-05-26T18:50:57.5866667+00:00

Hello @Greg Toews

I have posted my answer, and you should see the option to accept it under my response.

Answer accepted by question author

Manish Deshpande 6,830 Microsoft External Staff Moderator

Hello @Greg Toews

These issues often occur when scaling PostgreSQL past 10k TPS on AKS, and focusing on NVMe is the right direction.

1. Can NVMe get us the performance we need without Ultra Disk's cost?

Yes, this is the recommended approach for your workload. According to Microsoft’s benchmarks, PostgreSQL can reach nearly 15,000 TPS with single-digit millisecond latency on a Standard_L16s_v3 VM using local NVMe and Azure Container Storage, which exceeds your 10,000 TPS requirement.

It is also important to consider cost efficiency: an 8-vCPU Standard_L8s_v3 VM with local NVMe achieves approximately 400,000 IOPS, whereas achieving similar IOPS with Ultra Disk would require a 112-vCPU VM. For a dataset of 6–8 TB, the Lsv3 series offers a much more cost-effective solution.

While Ultra Disks can deliver high performance, they generally come at a significantly higher cost and may not match the latency provided by NVMe.

Please refer to this link: https://azure.microsoft.com/en-us/blog/running-high-performance-postgresql-on-azure-kubernetes-service/
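
Since the target here is measured with pgbench, a small helper for checking a run against the 10,000 TPS goal may be useful. This is only a sketch: the function names are my own, the sample output is made-up illustrative input rather than a real measurement, and the regular expression assumes the usual `tps = <number> (...)` summary line that pgbench prints at the end of a run.

```python
import re
from typing import Optional

# Sketch: extract the TPS figure from pgbench's text output and compare it
# with a target. Function names are hypothetical. A run might be started
# with something like `pgbench -c 32 -j 8 -T 60 <dbname>` (example values).

TPS_LINE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_tps(pgbench_output: str) -> Optional[float]:
    """Return the last reported tps value, or None if no tps line is found."""
    matches = TPS_LINE.findall(pgbench_output)
    return float(matches[-1]) if matches else None


def meets_target(pgbench_output: str, target_tps: float = 10_000.0) -> bool:
    """True if the run reported at least target_tps transactions per second."""
    tps = parse_tps(pgbench_output)
    return tps is not None and tps >= target_tps


# Illustrative input only; these numbers are NOT a real benchmark result.
SAMPLE_OUTPUT = """\
scaling factor: 100
number of clients: 32
number of threads: 8
duration: 60 s
latency average = 3.135 ms
tps = 10205.750000 (without initial connection time)
"""

if __name__ == "__main__":
    print("parsed tps:", parse_tps(SAMPLE_OUTPUT))
    print("meets 10k target:", meets_target(SAMPLE_OUTPUT))
```

Running the same check on Premium SSD v2, Ultra Disk, and local NVMe with identical pgbench settings keeps the comparison fair.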

2. How do we deal with NVMe being ephemeral?

This is a crucial design consideration. Rather than focusing on making NVMe durable, it is advisable to design your application for resilience.

The recommended solution is to deploy PostgreSQL with the CloudNativePG (CNPG) operator, which offers the following benefits:

• Native PostgreSQL streaming replication across three nodes (one primary and two replicas) distributed across multiple Availability Zones

• Automated failover with an RTO of less than 10 seconds and an RPO of zero

• Continuous WAL archiving to Azure Blob Storage, providing a reliable backup layer
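
The three bullets above map roughly onto a CloudNativePG `Cluster` resource. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. Treat it as a starting point under stated assumptions: the cluster name, storage class name, volume size, and Blob Storage URL are placeholders, the field names reflect the CloudNativePG `Cluster` API as I understand it, and newer operator releases may configure backups differently, so check the documentation for your operator version. Zone spreading and synchronous replication settings are deliberately left out.

```python
import json

# Sketch: a minimal CloudNativePG Cluster manifest built as a dict.
# All values passed in below are placeholders (assumptions), and the
# field names should be verified against your CloudNativePG version.


def build_cluster_manifest(name: str, storage_class: str, size: str,
                           backup_destination: str) -> dict:
    """Return a three-instance Cluster manifest with WAL archiving to Blob Storage."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus two replicas.
            "instances": 3,
            # Volume backed by the local NVMe storage class.
            "storage": {"storageClass": storage_class, "size": size},
            # Continuous backup and WAL archiving to Azure Blob Storage.
            "backup": {
                "barmanObjectStore": {
                    "destinationPath": backup_destination,
                    "azureCredentials": {"inheritFromAzureAD": True},
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cluster_manifest(
        name="pg-demo",                       # placeholder
        storage_class="example-nvme-sc",      # placeholder storage class name
        size="1Ti",                           # placeholder volume size
        backup_destination=(
            "https://examplestorage.blob.core.windows.net/backups/pg-demo"
        ),                                    # placeholder URL
    )
    print(json.dumps(manifest, indent=2))
```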

PostgreSQL's WAL mechanism records every transaction before it is applied, ensuring that replicas maintain a consistent and current copy, even if a node fails. Azure Container Storage natively manages NVMe volume orchestration within Kubernetes, eliminating the need for manual RAID configuration.
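
To illustrate the log-first idea in the paragraph above, here is a toy in-memory model in Python. It is not how PostgreSQL implements WAL or streaming replication; the class and function names are invented for the example. It only shows why a replica that replays the same ordered log ends up with the same state as the primary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy model of write-ahead logging and replay. Names are hypothetical and
# this is a teaching sketch, not PostgreSQL's actual implementation.

Record = Tuple[str, int]  # (key, new value)


@dataclass
class Node:
    data: Dict[str, int] = field(default_factory=dict)
    applied_lsn: int = 0  # number of log records applied so far

    def apply(self, record: Record) -> None:
        key, value = record
        self.data[key] = value
        self.applied_lsn += 1


@dataclass
class Primary(Node):
    wal: List[Record] = field(default_factory=list)

    def write(self, key: str, value: int) -> None:
        record = (key, value)
        self.wal.append(record)  # 1. record the change in the log first
        self.apply(record)       # 2. only then apply it to the data


def catch_up(replica: Node, wal: List[Record]) -> None:
    """Replay every log record the replica has not applied yet, in order."""
    for record in wal[replica.applied_lsn:]:
        replica.apply(record)


if __name__ == "__main__":
    primary = Primary()
    replica = Node()
    primary.write("a", 1)
    primary.write("b", 2)
    catch_up(replica, primary.wal)
    primary.write("a", 3)
    catch_up(replica, primary.wal)
    print(replica.data == primary.data)
```

A freshly created replica (an empty `Node`) catches up the same way, which mirrors what happens when a recycled node is replaced.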

3. When exactly would a VM be recycled, and would disabling updates prevent it?

• Node image upgrades (weekly Linux patches) → node is reimaged

• Kubernetes version upgrades → nodes are recreated

• Node failure or hardware fault → data lost

• VM deallocation (scale-in event) → data lost

• VM redeploy → data lost

AKS publishes weekly node image updates and recommends using auto-upgrade channels, so recycling is expected; it is part of keeping the cluster secure and reliable.

Turning off updates isn’t the way to go since that could leave your cluster open to OS security issues and isn’t supported long-term. Instead, it’s best to take advantage of CloudNativePG’s high availability setup: when a node gets recycled, CNPG will automatically create a new replica on a fresh node and sync it up with the primary. Your data stays safe—you’ll just have one less replica for a bit while it catches up.

Reference links:

1. AKS Engineering Blog: PostgreSQL + NVMe deep dive
https://blog.aks.azure.com/2025/07/09/postgresql-nvme

2. Best practices for ephemeral NVMe data disks on AKS
https://learn.microsoft.com/en-us/azure/aks/best-practices-storage-nvme

3. Deploy highly available PostgreSQL on AKS
https://learn.microsoft.com/en-us/azure/aks/deploy-postgresql-ha?tabs=azuredisk

4. PostgreSQL HA overview with CloudNativePG
https://learn.microsoft.com/en-us/azure/aks/postgresql-ha-overview

If you have any questions please feel free to comment and we will be happy to assist you.

Thanks,
Manish.

Answer 1

kagiyama yutaka 3,430

Azure local NVMe storage is temporary and is lost on any recycle, so PostgreSQL durability must rely on managed disks or on replication plus WAL. Ultra Disk scales IOPS but does not match NVMe latency, and achievable TPS depends on workload testing.
