Hi Sivangnanam, Kuraliniyan (AWM, IRL),
Thank you for reaching on Microsoft Q&A! Welcome
Here are some possible causes and suggestions to address the high PostgreSQL replay lag:
- Check Replication slot status:
First, use the following command to check the status of your replication slots:
SELECT * FROM pg_replication_slots;
This will help identify any inactive replication slots that might be causing issues.
2.Monitor workloads
High replication lag can often result from an increased write workload on your primary server. If your primary server is experiencing heavy write activity:
- Consider reducing this load if possible.
- Alternatively, you may want to scale your server up to a higher SKU with more resources (vCores and Memory).
3.Scale up: If you confirm high load, scaling up your primary server may help improve performance. You can check the scaling options in the Azure portal here.
4.Try Re-creating Replica: If the lag persists after handling workloads, you can delete and recreate the read replica after the write-heavy workload is completed. This will reset it to a good state regarding lag.
Consider setting up alerts for replication lag and storage usage. For instance, set alerts if replication lag exceeds 5 minutes or if storage usage exceeds 80%. This way, you can proactively manage the issue in future scenarios.
- Ensure that the parameters
max_prepared_transactions,max_locks_per_transaction, andmax_worker_processesare set appropriately on both the primary and replica instances. - If conflicts during recovery are a concern, enabling the
hot_standby_feedbackparameter on the replica can help prevent cancelation of long-running queries due to replication conflicts.
Please go through the following document for more details:
I hope this helps, please let us know if you have further queries on this.
Thanks!
Kalyani