Need urgent help for batch delete

We have been seeing issues on our Prod server lately.

We have a job that inserts real-time data into a staging table. Another job deletes anything older than 27 hours, in batches. We have billions of records involved in the inserts and deletes.

The insert is failing and causing blocking and deadlock issues for other jobs. The transaction log is also growing while the batch deletes run. I have posted a sample of our SQL below. How can we fix this issue?
--Batch 2
IF @rowCount = 1000000 -- if the last batch deleted < 1000000 rows there is no work to do...
BEGIN
DELETE TOP (1000000) FROM data.DtPerformance
WHERE (DATEDIFF(hour, actualTimeStamp, @dtNow) > @ageHours)
SET @rowCount=@@ROWCOUNT
INSERT dbo.SSIS_Log VALUES
('Delete RealTime Data' ,'Delete SPROC Batch 2',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.')
END
--Batch 3
IF @rowCount = 1000000 -- if the last batch deleted < 1000000 rows there is no work to do...
BEGIN
DELETE TOP (1000000) FROM data.DtPerformance
WHERE (DATEDIFF(hour, actualTimeStamp, @dtNow) > @ageHours)
SET @rowCount=@@ROWCOUNT
INSERT dbo.SSIS_Log VALUES
('Delete RealTime Data' ,'Delete SPROC Batch 3',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.')
END
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
2020-11-18T22:18:10.777+00:00

There are a couple of problems here.
First, if you want to avoid conflicts with the INSERT operation, you need to delete in batches small enough that you don't take table locks. SQL Server escalates to a table lock when a statement holds more than about 5000 locks on the table. If there are indexes on the table, you will need to account for them: each deleted row also takes locks in every non-clustered index. If there are four non-clustered indexes on the table, that is five locks per row including the clustered index, so you will need to stay at a batch size of 1000 rows (5 × 1000 = 5000).
Next, when you do batching, you must always define your batches over an index. The way you have written the query, with DATEDIFF applied to the column, any index on actualTimeStamp cannot be used anyway; you will get a table scan instead. As it stands, the above is a complete disaster.
If there is an index on actualTimeStamp, you can do:
DELETE TOP(1000) FROM data.DtPerformance WHERE actualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)
If actualTimeStamp is not indexed, don't rush to add one, but post the CREATE TABLE and CREATE INDEX statements for the table, so that we can look at alternatives.
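For illustration, here is a minimal sketch of the complete purge loop built from this advice, assuming the index on actualTimeStamp exists; the variable names mirror the original procedure:

DECLARE @ageHours int = 27,
        @dtNow datetime = GETDATE(),
        @rowCount int = 1;

WHILE @rowCount > 0
BEGIN
    -- TOP (1000) keeps the lock count per statement below the ~5000
    -- escalation threshold even with several non-clustered indexes.
    DELETE TOP (1000) FROM data.DtPerformance
    WHERE actualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow);

    SET @rowCount = @@ROWCOUNT;   -- stop once a batch comes back empty
END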
archies01 test 1 Reputation point
2020-11-18T22:23:13.187+00:00

CREATE TABLE [data].[DPerformance](
[DataID] [bigint] IDENTITY(1,1) NOT NULL,
[DeviceID] [int] NOT NULL,
[dataValue] [float] NOT NULL,
[HeaderID] [int] NOT NULL,
[missingData] [tinyint] NULL,
[PlannedOutage] [tinyint] NOT NULL,
[ActualTimeStamp] [smalldatetime] NOT NULL,
CONSTRAINT [PK_DataPerf] PRIMARY KEY CLUSTERED
(
[DataID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 70) ON [Data]
) ON [Data]
GO
Guoxiong 8,206 Reputation points
2020-11-18T22:36:38.297+00:00

DECLARE @minDataID bigint;
DECLARE @maxDataID bigint;
DECLARE @processMaxDataID bigint;
DECLARE @rowCount int;
DECLARE @dtNow datetime = GETDATE();
DECLARE @ageHours int = 27;

SELECT @minDataID = MIN([DataID]), @maxDataID = MAX([DataID])
FROM [data].[DPerformance]
WHERE (DATEDIFF(hour, ActualTimeStamp, @dtNow) > @ageHours);

SET @processMaxDataID = @minDataID + 1000000;

WHILE (@processMaxDataID <= @maxDataID + 1000000) -- one extra pass picks up the final partial batch
BEGIN
    DELETE [data].[DPerformance]
    WHERE [DataID] <= @processMaxDataID
      AND [DataID] <= @maxDataID;                 -- never delete past the precomputed cutoff
    SET @rowCount = @@ROWCOUNT;
    INSERT [dbo].[SSIS_Log] VALUES
        ('Delete RealTime Data', 'Delete SPROC Batch 2', GETDATE(),
         CONVERT(VARCHAR(10), @rowCount) + ' rows deleted.');
    SET @processMaxDataID = @processMaxDataID + 1000000;
END
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
2020-11-18T22:44:32.093+00:00

Good! I will assume that ActualTimeStamp is perfectly correlated with DataID. That is, if there are two rows ID1 and ID2, and ID2 > ID1, the ActualTimeStamp for ID2 must be >= the ActualTimeStamp for ID1. Then you can do:
DECLARE @cur CURSOR,
        @minID bigint,
        @maxID bigint,
        @ageHours int = 27,
        @dtNow datetime2(3) = sysdatetime();

SET @cur = CURSOR STATIC FOR
    SELECT MIN(DataID), MAX(DataID)
    FROM (SELECT DataID,
                 row_number() OVER (ORDER BY DataID) / 1000 AS batchno
          FROM data.DPerformance --WITH (NOLOCK)
          WHERE ActualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)) AS X
    GROUP BY batchno;

OPEN @cur;

WHILE 1 = 1
BEGIN
    FETCH @cur INTO @minID, @maxID;
    IF @@fetch_status <> 0
        BREAK;

    DELETE data.DPerformance
    WHERE DataID BETWEEN @minID AND @maxID
    OPTION (RECOMPILE);
END
Note that I have added a NOLOCK hint in comments. If the database has READ_COMMITTED_SNAPSHOT enabled, you don't need it. But if it does not, I think NOLOCK is a good idea here. I generally recommend against using NOLOCK in application code, since it can lead not only to dirty reads, but also to failing to read committed rows. But since all that would happen in this case is that some rows remain a little longer, I don't see this as a problem. On the other hand, since the query requires a full table scan, running it without the hint could block inserts for minutes.
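For reference, one way to check whether the database has READ_COMMITTED_SNAPSHOT enabled is a standard catalog lookup (this query is not part of the original answer):

-- Returns 1 in is_read_committed_snapshot_on if RCSI is enabled,
-- in which case the NOLOCK hint above is unnecessary.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = DB_NAME();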
A refinement would be to save the ID pairs into a permanent table. If you restart the purge, you would work from that table, and only fill it up again when you have exhausted it.
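For illustration, a minimal sketch of that refinement; the work table dbo.PurgeBatches and its column names are assumptions, not part of the original answer:

-- Permanent work table holding the precomputed batch boundaries.
CREATE TABLE dbo.PurgeBatches (
    batchno bigint NOT NULL PRIMARY KEY,
    minID   bigint NOT NULL,
    maxID   bigint NOT NULL
);

DECLARE @ageHours int = 27,
        @dtNow datetime2(3) = sysdatetime(),
        @batchno bigint, @minID bigint, @maxID bigint;

-- Refill the work table only when the previous set of batches has been
-- exhausted, so a restarted purge resumes where it left off.
IF NOT EXISTS (SELECT * FROM dbo.PurgeBatches)
    INSERT dbo.PurgeBatches (batchno, minID, maxID)
    SELECT batchno, MIN(DataID), MAX(DataID)
    FROM (SELECT DataID,
                 row_number() OVER (ORDER BY DataID) / 1000 AS batchno
          FROM data.DPerformance
          WHERE ActualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)) AS X
    GROUP BY batchno;

WHILE EXISTS (SELECT * FROM dbo.PurgeBatches)
BEGIN
    SELECT TOP (1) @batchno = batchno, @minID = minID, @maxID = maxID
    FROM dbo.PurgeBatches
    ORDER BY batchno;

    DELETE data.DPerformance
    WHERE DataID BETWEEN @minID AND @maxID
    OPTION (RECOMPILE);

    DELETE dbo.PurgeBatches WHERE batchno = @batchno;  -- batch done
END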