Replication lag in PostgreSQL Replay Lag is extremely high (236:41:55.093431

Question

Replication lag in PostgreSQL Replay Lag is extremely high (236:41:55.093431

User's image

Replica showing Active.One more repliucation slot showing delay PostgreSQL Replay Lag is extremely high (236:41:55.093431

Sivangnanam, Kuraliniyan (AWM, IRL) 20 Reputation points

2025-11-05T10:02:49.4366667+00:00

Anyone help on my question?
Kalyani Kondavaradala 4,595 Reputation points Microsoft External Staff Moderator

2025-11-07T12:51:14.0833333+00:00

Hi Sivangnanam, Kuraliniyan (AWM, IRL),

Just checking to see if you have a chance to check my previous response and helped, do let me know if you have any further questions on this.
Kalyani Kondavaradala 4,595 Reputation points Microsoft External Staff Moderator

2025-11-10T08:47:28.2566667+00:00

Hi Sivangnanam, Kuraliniyan (AWM, IRL),

we didn't hear any response from, do let us know if you have further queries on this. Happy to help you further.

Answer accepted by question author

2 additional answers

Your answer

Sivangnanam, Kuraliniyan (AWM, IRL) 20 Reputation points

2025-11-05T10:02:49.4366667+00:00

Anyone help on my question?
Kalyani Kondavaradala 4,595 Reputation points Microsoft External Staff Moderator

2025-11-07T12:51:14.0833333+00:00

Hi Sivangnanam, Kuraliniyan (AWM, IRL),

Just checking to see if you have a chance to check my previous response and helped, do let me know if you have any further questions on this.
Kalyani Kondavaradala 4,595 Reputation points Microsoft External Staff Moderator

2025-11-10T08:47:28.2566667+00:00

Hi Sivangnanam, Kuraliniyan (AWM, IRL),

we didn't hear any response from, do let us know if you have further queries on this. Happy to help you further.

Answer 1

Hi Sivangnanam, Kuraliniyan (AWM, IRL),

Thank you for reaching on Microsoft Q&A! Welcome

Here are some possible causes and suggestions to address the high PostgreSQL replay lag:

Check Replication slot status:

First, use the following command to check the status of your replication slots:

SELECT * FROM pg_replication_slots;

This will help identify any inactive replication slots that might be causing issues.

2.Monitor workloads

High replication lag can often result from an increased write workload on your primary server. If your primary server is experiencing heavy write activity:

Consider reducing this load if possible.
Alternatively, you may want to scale your server up to a higher SKU with more resources (vCores and Memory).

3.Scale up: If you confirm high load, scaling up your primary server may help improve performance. You can check the scaling options in the Azure portal here.

4.Try Re-creating Replica: If the lag persists after handling workloads, you can delete and recreate the read replica after the write-heavy workload is completed. This will reset it to a good state regarding lag.

Consider setting up alerts for replication lag and storage usage. For instance, set alerts if replication lag exceeds 5 minutes or if storage usage exceeds 80%. This way, you can proactively manage the issue in future scenarios.

Ensure that the parameters max_prepared_transactions, max_locks_per_transaction, and max_worker_processes are set appropriately on both the primary and replica instances.
If conflicts during recovery are a concern, enabling the hot_standby_feedback parameter on the replica can help prevent cancelation of long-running queries due to replication conflicts.

Please go through the following document for more details:

Monitor replication in PostgreSQL

I hope this helps, please let us know if you have further queries on this.

Thanks!

Kalyani

Answer 2

Sivangnanam, Kuraliniyan (AWM, IRL) 20

Replica slot showing Active but Replay Lag still high on primary

Kalyani Kondavaradala 4,595 Reputation points Microsoft External Staff Moderator

2025-11-11T10:00:45.0766667+00:00
Hi Sivangnanam, Kuraliniyan (AWM, IRL),

Thanks for the response, consider the following points

Check for High Workload on Primary Server: The replication lag is often caused by high write workloads on the primary server. If possible, try reducing the load temporarily to see if the lag decreases. You can also consider scaling up the primary server to handle the workload better.

Identify and Address Storage Usage: Check the storage used on your primary server. If it reaches over 95% or if less than 5 GiB is available, the primary may be switched to read-only mode. Ensure there’s adequate space to avoid this situation.

Consider Recreating the Replica: If the lag persists, especially due to a sustained write-heavy workload, consider deleting and recreating the read replica after the workload has stabilized. This will help bring it back to a good state.
Kalyani Kondavaradala 4,595 Reputation points Microsoft External Staff Moderator

2025-11-13T10:58:43.14+00:00

Hi Sivangnanam, Kuraliniyan (AWM, IRL),

Just checking in to see if you had a chance to check my response, let us know if you have further queries.

Answer 3

Hello Sivangnanam, Kuraliniyan (AWM, IRL),

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand you are having extreme PostgreSQL replay lag despite the replication slot showing active.

To confirm the lag source, start by inspecting pg_stat_replication on the primary to compare sent_lsn vs replay_lsn, and use pg_replication_slots to check if WAL is being retained excessively – https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-REPLICATION. On the standby, run SELECT now() - pg_last_xact_replay_timestamp() to measure actual clock delay. If replay lag is high but write/flush lag is low, the issue is likely on the standby’s apply path.

Long-running read queries on the standby can block WAL replay. Use pg_stat_activity to identify and terminate such queries, and check pg_stat_database_conflicts for lock or snapshot conflicts. Enabling hot_standby_feedback = on can reduce these conflicts, though it may increase bloat on the primary – https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-HOT-STANDBY-FEEDBACK. Also, tune max_standby_streaming_delay to limit how long replay waits for queries to finish. If I/O is the bottleneck, monitor wait events and disk throughput using tools like iostat or pg_stat_io, and consider upgrading standby storage or increasing shared_buffers and effective_io_concurrency.

In cases where WAL generation outpaces standby capacity often due to bulk operations or index rebuilds, throttle write-heavy jobs and enable wal_compression = on to reduce WAL size. Adjust checkpoint settings like checkpoint_timeout and max_wal_size to smooth out WAL bursts. If the standby is hopelessly behind, resync using pg_basebackup or a backup tool like pgBackRest, and drop the old slot only if it's abandoned – https://www.postgresql.org/docs/current/app-pgbasebackup.html.

For ongoing observability, create views to monitor replication lag and replay delay, and integrate them into your monitoring stack.

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.

Share via

Replication lag in PostgreSQL Replay Lag is extremely high (236:41:55.093431

2 additional answers

Your answer