Size of the Transaction Log Increasing and cannot be truncated or Shrinked due to Snapshot Replication.
There can be many reasons for the size of Transaction Log growing large for a sql server database such as waiting on checkpoint, backups in Full recovery model, active long running transaction.
The simplest and the smartest way to determine the cause of the Transaction Log in sql server 2005 is with the use of columns log_reuse_wait, log_reuse_wait_desc in sys.databases view in Sql Server 2005 and 2008.
So we fire the following query to find the cause of the Transaction Log growth and observe the following
select log_reuse_wait_desc,name from sys.databases
log_reuse_wait_desc name
------------------------------------------------------------ -------------
REPLICATION test
And to our surprise the log_reuse_wait_desc column reflects REPLICATION for a particular database which has snapshot replication configured on it.
So we conclude the Log growth is caused due to Snapshot Replication which is kind of ironical since Snapshot Replication does not use Transaction logs to replicate data to the remote site.
So to confirm this we fire the DBCC OPENTRAN which gives the undistributed LSN for the Transaction log and we get an undistributed LSN value as shown below
DBCC OPENTRAN(13)
Transaction information for database 'test’.
Replicated Transaction Information:
Oldest distributed LSN : (0:0:0)
Oldest non-distributed LSN : (43831:51:1)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
So this confirms that above undistributed transaction is causing the Log Growth. But in Snapshot Replication the changes are replicated using Snapshots and not using Transaction Logs.
We have a Known issue with Snapshot Replication in Sql Server 2005 which causes DDL changes on the tables which have SCHEMA CHANGES marked for replication where in the DDL statements are marked for Replication in Transaction log of the database but they are not unmarked when the changes are actually propagated.
In order to confirm whether we are hitting the issue mentioned above we need to check whether that undistributed LSN ((43831:51:1) in above case)
provided in DBCC OPENTRAN output is pointing to DDL operations of type ALTER TABLE.
The issue is yet to be fixed and hence the workaround to avoid the issue is to either remove the REPLICATION OF SCHEMA CHANGES for the tables involved in the replication
(Reference: https://msdn.microsoft.com/en-us/library/ms147331.aspx)
Or
Use the following commands after running Snapshot Agent and applying the Snapshot on the subscriber.
sp_repldone null, null, 0,0,1
Before running the above command we need to ensure that SCHEMA changes are actually being replicated in order to avoid any inconsistency
Note:
This post should not be treated as the Microsoft’s Recommendation or Resolution to the problem, it is only a workaround which worked in our environment and so we would like to share it.
Parikshit Savjani
SE, Microsoft Sql Server
Reviewed By
Amit Banerjee
Technical Lead, Microsoft Sql Server