Azure Database for PostgreSQL Flexible Server - RPO

Maheswaran T
2024-03-12T16:20:43.8633333+00:00

Hi,

We are proposing to use Azure Database for PostgreSQL Flexible Server (HA + read replica) for one of our business-critical applications.

I have gone through the following article:

https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-business-continuity

Can you please confirm whether Azure Database for PostgreSQL Flexible Server (General Purpose tier + read replicas) supports an RPO of less than 5 minutes?

Database size: 500 GB to 750 GB


Thanks

Azure Database for PostgreSQL

2 answers

  1. Oury Ba-MSFT (Microsoft Employee)
    2024-03-12T19:35:48.4333333+00:00

    @Maheswaran T

    My understanding is that you have configured an Azure Database for PostgreSQL Flexible Server (General Purpose tier) with high availability, and you are asking whether the recovery point objective (RPO) can be less than 5 minutes.

    If you have enabled zone-redundant HA, Flexible Server automatically fails over to the standby server, typically within 60 to 120 seconds, with zero data loss.

    Flexible servers configured with zone-redundant high availability provide a recovery point objective (RPO) of zero (no data loss). The recovery time objective (RTO) is expected to be less than 120 seconds in typical cases; however, depending on the activity on the primary server at the time of the failover, the failover may take longer.

    Please read more here.


    https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-business-continuity#unplanned-downtime-failure-scenarios-and-service-recovery

    Please comment below if you have additional questions or need more clarifications.

    Regards,

    Oury


  2. Alicja Kucharczyk (Microsoft Employee)
    2024-04-25T11:28:45.2166667+00:00

    Thank you for your question about the recovery point objective (RPO) capabilities of Azure Database for PostgreSQL Flexible Server for a database of 500 GB to 750 GB.

    It's important to note that database size itself is not the primary factor in whether an RPO of less than 5 minutes can be achieved. The more critical factor is the workload, particularly the volume of write-ahead log (WAL) it generates. Read replicas in Azure Database for PostgreSQL Flexible Server use native PostgreSQL physical replication, which is asynchronous: the primary streams WAL to each replica and the replica replays it, so a replica can fall behind the primary.

    Key factors that impact this replication process include:

    1. Network latency and throughput: the latency and available bandwidth between the primary and the replica directly affect replication lag.
    2. Volume of WAL: the amount of WAL that must be continuously shipped is tied to your write workload (INSERT, UPDATE, and DELETE operations). Heavier write workloads produce more WAL to synchronize, which can increase lag and therefore the achievable RPO.
    3. Peaks in primary activity: bursts of high-frequency DML, such as bulk loads or large batch updates, can generate WAL faster than it is shipped and replayed, so lag builds up during those peaks.
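    To illustrate why write volume matters more than database size, here is a minimal Python sketch under stated assumptions. It estimates the WAL generation rate from two readings of pg_current_wal_lsn() taken on the primary a known number of seconds apart, then applies a deliberately simplified model: if WAL is produced at least as fast as the network link can carry it, lag never shrinks; otherwise a backlog left by a burst drains at the spare throughput. The helper names (lsn_to_bytes, wal_rate_mb_per_s, drain_seconds), the LSN readings, and the 50 MB/s link figure are hypothetical, and the model ignores network latency and replay speed on the replica.

```python
"""Rough estimate of WAL generation rate and backlog drain time.

Hypothetical helpers; the model is deliberately simplified and ignores
network latency and the replica's replay speed.
"""

MB = 1024 * 1024


def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position.

    The text form is two hex numbers: the high and low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate_mb_per_s(lsn_start: str, lsn_end: str, seconds: float) -> float:
    """Average WAL generation rate between two pg_current_wal_lsn() readings."""
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / MB / seconds


def drain_seconds(backlog_mb: float, wal_rate: float, link_mb_per_s: float) -> float:
    """Time for a WAL backlog to clear while new WAL keeps arriving.

    Returns infinity when WAL is generated at least as fast as the link can
    ship it, i.e. lag never shrinks under this model.
    """
    spare = link_mb_per_s - wal_rate
    if spare <= 0:
        return float("inf")
    return backlog_mb / spare


if __name__ == "__main__":
    # Hypothetical readings taken 60 seconds apart on the primary.
    rate = wal_rate_mb_per_s("16/10000000", "16/80800000", 60.0)
    print(f"WAL rate: {rate:.1f} MB/s")
    # Hypothetical 6,000 MB backlog after a bulk load, on a 50 MB/s link.
    print(f"Drain time: {drain_seconds(6000, rate, 50.0):.0f} s")
```

    With these hypothetical figures (30 MB/s of WAL on a 50 MB/s link), a burst that leaves 6,000 MB of unshipped WAL keeps the replica behind for about 300 seconds, right at the 5-minute target, whether the database holds 500 GB or 750 GB.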

    Achieving an RPO of less than 5 minutes depends heavily on these workload characteristics and cannot be guaranteed without practical testing. To measure replication lag accurately under typical and peak conditions, I recommend:

    • Monitoring through Azure: use Azure's built-in monitoring metrics to track replication lag. Detailed guidance can be found here.
    • Direct database monitoring: because Azure metrics might involve some rounding, query the pg_stat_replication view on the primary server for a more precise measurement of replication lag. More information on this PostgreSQL view is available here.
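    A minimal sketch of the direct check, assuming PostgreSQL 10 or later (where pg_stat_replication exposes replay_lsn and replay_lag). The SQL runs on the primary and returns one row per connected standby; the Python helper replicas_over_target and the 300-second target are illustrative assumptions, and the helper only evaluates rows you have already fetched with your own database driver.

```python
"""Check per-replica lag from pg_stat_replication against an RPO target.

Hypothetical helper: run LAG_QUERY on the primary with your own driver and
pass the resulting rows to replicas_over_target().
"""
from typing import Iterable, List, Optional, Tuple

# One row per standby currently streaming WAL from this server.
LAG_QUERY = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       EXTRACT(EPOCH FROM replay_lag) AS replay_lag_seconds
FROM pg_stat_replication;
"""

RPO_TARGET_SECONDS = 300  # the "< 5 minutes" requirement from the question

# (application_name, replay_lag_bytes, replay_lag_seconds)
Row = Tuple[str, Optional[float], Optional[float]]


def replicas_over_target(rows: Iterable[Row], target_s: float = RPO_TARGET_SECONDS) -> List[str]:
    """Return the application_name of every standby whose replay lag exceeds target_s.

    replay_lag becomes NULL when a standby has caught up and there is no new
    WAL activity, so a None value is treated as no lag.
    """
    late = []
    for name, _lag_bytes, lag_seconds in rows:
        # float() also accepts Decimal values returned for numeric columns.
        if lag_seconds is not None and float(lag_seconds) > target_s:
            late.append(name)
    return late
```

    replay_lag approximates how far, in time, each replica trails the primary, which maps most directly onto an RPO stated in minutes; replay_lag_bytes is useful alongside it to see how much WAL is still outstanding.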

    We advise a thorough testing phase in which you use these monitoring tools under both typical and peak load to evaluate replication lag and, if needed, tune the replication settings based on the observed data.
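    During such a test, a single snapshot can miss short spikes, so it can help to sample lag repeatedly (for example, every few seconds during a load test) and compare the worst values, not just the average, with the target. A minimal sketch; the function name summarize_lag, the nearest-rank 95th percentile, and the sample readings are illustrative assumptions.

```python
"""Summarize replication-lag samples collected during a load test.

Hypothetical helper: samples are replay-lag readings in seconds (for example,
from pg_stat_replication polled every few seconds on the primary).
"""
from typing import Dict, Sequence


def summarize_lag(samples: Sequence[float], target_s: float = 300.0) -> Dict[str, float]:
    """Return mean, nearest-rank p95, max, and fraction of samples above target_s."""
    if not samples:
        raise ValueError("no lag samples collected")
    ordered = sorted(samples)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed in integers.
    rank = (95 * n + 99) // 100
    return {
        "mean": sum(ordered) / n,
        "p95": ordered[rank - 1],
        "max": ordered[-1],
        "over_target": sum(1 for s in ordered if s > target_s) / n,
    }


if __name__ == "__main__":
    # Hypothetical readings: a quiet period, then a batch job on the primary.
    readings = [0.4, 0.6, 0.5, 1.2, 0.8, 45.0, 180.0, 320.0, 260.0, 12.0]
    print(summarize_lag(readings))
```

    In this hypothetical run the average lag is about 82 seconds, yet one reading exceeds 300 seconds, which is exactly the kind of peak that an average-only view would hide.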
