Need urgent help with batch deletes

archies01 test 1 Reputation point
2020-11-18T20:15:53.187+00:00

We have been seeing issues on our Prod server lately.

We have a job that inserts real-time data into a staging table. We have another job that deletes, in batches, anything older than 27 hours. We have billions of records across the inserts and deletes.

The insert is failing and causing blocking and deadlock issues for other jobs. I have posted our sample SQL below. How can we fix this issue?

--Batch 2
IF @rowCount = 1000000 -- if the last batch deleted < 1000000 rows there is no work to do...
BEGIN
DELETE TOP (1000000) FROM data.DtPerformance
WHERE (DATEDIFF(hour, actualTimeStamp, @dtNow) > @ageHours)

SET @rowCount=@@ROWCOUNT

INSERT dbo.SSIS_Log VALUES
('Delete RealTime Data' ,'Delete SPROC Batch 2',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.')
END

--Batch 3
IF @rowCount = 1000000 -- if the last batch deleted < 1000000 rows there is no work to do...
BEGIN
DELETE TOP (1000000) FROM data.DtPerformance
WHERE (DATEDIFF(hour, actualTimeStamp, @dtNow) > @ageHours)

SET @rowCount=@@ROWCOUNT

INSERT dbo.SSIS_Log VALUES
('Delete RealTime Data' ,'Delete SPROC Batch 3',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.')
END


7 answers

  1. archies01 test 1 Reputation point
    2020-11-18T20:23:20.887+00:00

The transaction log is also growing when we run the batch deletes.


  2. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2020-11-18T22:18:10.777+00:00

    There are a couple of problems here.

    First, if you want to avoid conflicts with the INSERT operation, you need to delete in batches small enough that you don't take table locks. SQL Server attempts to escalate to a table lock when a single statement holds more than about 5000 locks on the table. If there are indexes on the table, you need to account for them: with four non-clustered indexes plus the clustered index, a 1000-row batch already generates around 5000 locks, so you would need to stay at a batch size of 1000 rows.
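
    (As an aside, and not something the batching approach below requires: lock escalation can also be switched off per table. A minimal sketch, using the table name from the question; DISABLE makes the engine keep holding row/page locks, which costs lock memory, so test the impact before trying this in Prod:)

    -- Stop escalation to table locks for DML on this table.
    ALTER TABLE data.DtPerformance SET (LOCK_ESCALATION = DISABLE);
    -- Revert to the default behaviour later:
    ALTER TABLE data.DtPerformance SET (LOCK_ESCALATION = TABLE);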

    Next, when you do batching, you must always define your batches over an index. The way you have written the query, with actualTimeStamp wrapped inside DATEDIFF, the predicate is not sargable: any index on actualTimeStamp cannot be used, and instead you get a full table scan for every batch. The above is a complete disaster.

    If there is an index on actualTimeStamp, you can do:

    DELETE TOP(1000) FROM data.DtPerformance WHERE actualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)
    

    If actualTimeStamp is not indexed, don't rush to add an index. Instead, post the CREATE TABLE and CREATE INDEX statements for the table, so that we can look at alternatives.
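
    To make that concrete, here is a minimal sketch of the loop that single DELETE would sit in, assuming an index on actualTimeStamp exists and using the @ageHours and @dtNow values implied by the question:

    DECLARE @ageHours int = 27,              -- retention window from the question
            @dtNow    datetime = GETDATE(),
            @rowCount int = 1;

    WHILE @rowCount > 0
    BEGIN
        -- 1000 rows per batch stays under the ~5000-lock escalation threshold
        DELETE TOP (1000) FROM data.DtPerformance
        WHERE actualTimeStamp < DATEADD(HOUR, -@ageHours, @dtNow);

        SET @rowCount = @@ROWCOUNT;
    END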


  3. archies01 test 1 Reputation point
    2020-11-18T22:23:13.187+00:00

    CREATE TABLE [data].[DPerformance](
        [DataID] [bigint] IDENTITY(1,1) NOT NULL,
        [DeviceID] [int] NOT NULL,
        [dataValue] [float] NOT NULL,
        [HeaderID] [int] NOT NULL,
        [missingData] [tinyint] NULL,
        [PlannedOutage] [tinyint] NOT NULL,
        [ActualTimeStamp] [smalldatetime] NOT NULL,
        CONSTRAINT [PK_DataPerf] PRIMARY KEY CLUSTERED
        (
            [DataID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 70) ON [Data]
    ) ON [Data]

    GO


  4. Guoxiong 8,206 Reputation points
    2020-11-18T22:36:38.297+00:00
    DECLARE @minDataID bigint;
    DECLARE @maxDataID bigint;
    DECLARE @processMaxDataID bigint;
    DECLARE @rowCount int;
    DECLARE @ageHours int = 27;
    DECLARE @dtNow datetime = GETDATE();

    -- Sargable form of the age filter, so an index on ActualTimeStamp could be used
    SELECT @minDataID = MIN([DataID]), @maxDataID = MAX([DataID])
    FROM [data].[DPerformance]
    WHERE [ActualTimeStamp] < DATEADD(hour, -@ageHours, @dtNow);

    -- Assumes DataID increases with ActualTimeStamp (see the next answer)
    SET @processMaxDataID = @minDataID + 1000000;
    WHILE (@processMaxDataID - 1000000 <= @maxDataID)   -- runs the final partial batch too
    BEGIN
        -- Cap at @maxDataID so rows newer than the cutoff are never touched
        DELETE [data].[DPerformance]
        WHERE [DataID] <= @processMaxDataID AND [DataID] <= @maxDataID;

        SET @rowCount = @@ROWCOUNT;
        INSERT [dbo].[SSIS_Log] VALUES
        ('Delete RealTime Data' ,'Delete SPROC Batch 2',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.');

        SET @processMaxDataID = @processMaxDataID + 1000000;
    END
    

  5. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2020-11-18T22:44:32.093+00:00

    Good! I will assume that ActualTimeStamp is perfectly correlated with DataID. That is, if there are two rows ID1 and ID2, and ID2 > ID1, then ActualTimeStamp for ID2 must be >= ActualTimeStamp for ID1. Then you can do:

    DECLARE @cur   CURSOR,
            @minID bigint,
            @maxID bigint,
            @ageHours int = 27,
            @dtNow  datetime2(3) = sysdatetime()

    -- Carve the qualifying rows into 1000-row batches, keyed on the clustered index
    SET @cur = CURSOR STATIC FOR
        SELECT MIN(DataID), MAX(DataID)
        FROM   (SELECT DataID, row_number() OVER (ORDER BY DataID) / 1000 AS batchno
                FROM   data.DPerformance --WITH (NOLOCK)
                WHERE  ActualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)) AS X
        GROUP  BY batchno

    OPEN @cur

    WHILE 1 = 1
    BEGIN
       FETCH @cur INTO @minID, @maxID
       IF @@fetch_status <> 0
          BREAK

       -- Each DELETE hits a narrow, contiguous range of the clustered key
       DELETE data.DPerformance WHERE DataID BETWEEN @minID AND @maxID
       OPTION (RECOMPILE)
    END
    

    Note that I have added a NOLOCK hint in comments. If the database has READ_COMMITTED_SNAPSHOT enabled, you don't need it. But if it does not, I think NOLOCK is a good idea here. I generally recommend against using NOLOCK in application code, since it can lead not only to dirty reads but also to failing to read committed rows. In this case, though, all that could happen is that some rows remain a little longer, so I don't see it as a problem. Without NOLOCK, on the other hand, that query requires a full table scan under shared locks and could block inserts for minutes.
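
    A quick way to check which situation you are in, using the standard sys.databases catalog view:

    -- 1 = READ_COMMITTED_SNAPSHOT is on, and the NOLOCK hint is unnecessary
    SELECT name, is_read_committed_snapshot_on
    FROM   sys.databases
    WHERE  name = DB_NAME();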

    A refinement would be to save the ID pairs into a permanent table. If you restart the purge, you would work from that table, and only fill it up again when you have exhausted it.
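
    A minimal sketch of that refinement (the dbo.PurgeBatches table and its column names are illustrative, not an existing object):

    -- Hypothetical work table holding the precomputed batch boundaries.
    CREATE TABLE dbo.PurgeBatches
        (BatchNo int IDENTITY(1,1) PRIMARY KEY,
         MinID   bigint NOT NULL,
         MaxID   bigint NOT NULL);

    -- Refill only when the previous run has been exhausted.
    IF NOT EXISTS (SELECT * FROM dbo.PurgeBatches)
        INSERT dbo.PurgeBatches (MinID, MaxID)
        SELECT MIN(DataID), MAX(DataID)
        FROM   (SELECT DataID, row_number() OVER (ORDER BY DataID) / 1000 AS batchno
                FROM   data.DPerformance
                WHERE  ActualTimeStamp < dateadd(HOUR, -27, sysdatetime())) AS X
        GROUP  BY batchno;

    -- Consume one batch at a time; a restart simply picks up where it left off.
    DECLARE @minID bigint, @maxID bigint, @batchNo int;
    WHILE 1 = 1
    BEGIN
        SELECT TOP (1) @batchNo = BatchNo, @minID = MinID, @maxID = MaxID
        FROM   dbo.PurgeBatches
        ORDER  BY BatchNo;

        IF @@ROWCOUNT = 0
            BREAK;

        DELETE data.DPerformance WHERE DataID BETWEEN @minID AND @maxID;
        DELETE dbo.PurgeBatches WHERE BatchNo = @batchNo;
    END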

