snapshot replication - corrupted BCP files

Rives, Vincent 1 Reputation point
2022-08-18T19:45:14.02+00:00

Each night my SQLServer replication failed on a big table with the following error. It is failing on a different table each night but always on a relatively big table ( > 200,000 rows)

The process could not bulk copy into table '"dbo"."Table01"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20037)
Get help: http://help/MSSQL_REPL20037
Field size too large
Batch send failed
Violation of PRIMARY KEY constraint 'PK_Table01'. Cannot insert duplicate key in object 'dbo.Table01'. The duplicate key value is (0). (Source: MSSQLServer, Error number: 2627)
Get help: http://help/2627
Failed to send batch after max errors (Source: MSSQLServer, Error number: 3621)
Get help: http://help/3621

Publication is configured on a SQL Server 2017 CU29 instance on a FCI Cluster which is hosted on one node (the errors were also occurring with CU27)
"Push" Subscription is configured on a SQL Server 2017 CU29 instance on the same node of the cluster
Distributor is configured on the same instance as the subscription

Each morning I trigger the snapshot and all get fixed automatically.
I made the following tests:

  • move the schedule of my snapshot from 0:30AM to 2:25AM -> same problem
  • disable antivirus scan -> same problem
  • disable filesystem backup on the drive used by the datafiles -> same problem
  • one night the replication failed for one database which had 2 subscribers. the 2 subscribers were in failure using the same snapshot -> problem is not on the subscriber side
  • insert the BCP files manually into the destination/subscriber database -> did not work

sometimes the error message varies :

The process could not bulk copy into table '"dbo"."Table02"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20037)
Get help: http://help/MSSQL_REPL20037
Batch send failed
Violation of PRIMARY KEY constraint 'PK_Table02'. Cannot insert duplicate key in object 'dbo.Table02'. The duplicate key value is (0). (Source: MSSQLServer, Error number: 2627)
Get help: http://help/2627
To obtain an error file with details on the errors encountered when initializing the subscribing table, execute the bcp command that appears below. Consult the BOL for more information on the bcp utility and its supported options. (Source: MSSQLServer, Error number: 20253)
Get help: http://help/20253
bcp "DB02"."dbo"."Table02" in "\SERV02\SQL_Repldata\unc\SQL02_DB02_PUB_DB02_SN\20220727003003\Table0227f2adc_608#36.bcp" -e "errorfile" -t"\n<x$3>\n" -r"\n<,@g>\n" -m10000 -SLIRIO\ISTORE -T -w (Source: MSSQLServer, Error number: 20253)
Get help: http://help/20253

I am pretty convinced that something occurs during the creation of the BCP files.
Can you help me?
Do you know a way to get more information during the creation of the BCP files or monitor the snapshot folder?

Thanks,
Vincent

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,653 questions
{count} votes

5 answers

Sort by: Most helpful
  1. Seeya Xi-MSFT 16,436 Reputation points
    2022-08-19T05:46:09.77+00:00

    Hi @Rives, Vincent ,

    Welcome to Microsoft Q&A!

    I am pretty convinced that something occurs during the creation of the BCP files.

    I agree with you. Please refer to the below link:
    https://digitalenginesoftware.com/blog/archives/70-Corrupted-MSmerge_contents90_forall.bcp-in-Dynamic-Snapshot

    Best regards,
    Seeya


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  2. Rives, Vincent 1 Reputation point
    2022-08-22T13:21:10.38+00:00

    Thank you for your answer.

    The web link is very interesting because it is very close to my situation except for the fact that if I re-run the snapshot agent later, the replication works eventually.

    Over the WE I have dropped the subscription and the dynamic folder (created in 2021) in the \serveur\snapshot_folder\unc. This Monday morning, the replication worked fine.

    I am going to wait for a few days. If it does not break anymore I will click on "Accept Answer".


  3. Rives, Vincent 1 Reputation point
    2022-08-28T21:01:55.307+00:00

    Hi

    After dropping the subscription and the snapshot folders \SERV02\SQL_Repldata\unc\SQL02_DB02_PUB_DB02_SN, the error still occurs.

    My problem only occurs on big tables which generates lots of BCP Files. I wonder if the problem could come from the multithreading that BCP is using for the parallelization of the creation of BCP Files.
    Perhaps I should look at configuring properly my MAXDOP which is still set to the default value of zero on my subscriber.

    I was advised to look at configuring the READ_COMMITTED_SNAPSHOT option to TRUE but this change is a major for the transactional workload used by the application.

    Thanks,
    Vincent


  4. Rives, Vincent 1 Reputation point
    2022-09-06T12:35:56.523+00:00

    I have been doing the following tests:
    In production, I created a test database which is replicated to the same place as others. I copied 3 big tables to this test database.
    For this test database, a snapshot occurs every hour. There is no application connecting to this test database (except for the backup tool and SCOM).

    --> Over the WE, the replication broke twice on that test database. I configured the option "Allow snapshot isolation" to true on that database. It is still failing.
    --> I decreased the MAXDOP from 12 to 8 at the server instance level, it is still failing....

    What it is really striking is the replication failing on the test database where there is no activity except for the backup tool and SCOM..... wonder if there is an issue with the drive where the bcp fils are written.....

    0 comments No comments

  5. Rives, Vincent 1 Reputation point
    2022-10-24T13:38:13.76+00:00

    The problem was linked to the SMB share used by the replication to create/use the BCP Files.

    The solution was to move the AD account used by the replication from its AD group directly to shared folder permissions and filesystem permissions in the security settings of the BCP files directory.

    0 comments No comments