Need urgent help for batch delete

We have been seeing issues on our Prod server lately.

We have a job that inserts real-time data into a staging table. Another job deletes anything older than 27 hours, in batches. We have billions of records involved in the inserts and deletes.

The insert is failing and causing blocking and deadlock issues for other jobs. The transaction log is also growing while the batch deletes run. I have posted a sample of our SQL below. How can we fix this issue?
--Batch 2
IF @rowCount = 1000000 -- if the last batch deleted < 1000000 rows there is no work to do...
BEGIN
DELETE TOP (1000000) FROM data.DtPerformance
WHERE (DATEDIFF(hour, actualTimeStamp, @dtNow) > @ageHours)
SET @rowCount=@@ROWCOUNT
INSERT dbo.SSIS_Log VALUES
('Delete RealTime Data' ,'Delete SPROC Batch 2',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.')
END
--Batch 3
IF @rowCount = 1000000 -- if the last batch deleted < 1000000 rows there is no work to do...
BEGIN
DELETE TOP (1000000) FROM data.DtPerformance
WHERE (DATEDIFF(hour, actualTimeStamp, @dtNow) > @ageHours)
SET @rowCount=@@ROWCOUNT
INSERT dbo.SSIS_Log VALUES
('Delete RealTime Data' ,'Delete SPROC Batch 3',GETDATE(),CONVERT(VARCHAR(10),@rowCount) + ' rows deleted.')
END
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
2020-11-18T22:18:10.777+00:00

There are a couple of problems here.
First, if you want to avoid conflicts with the INSERT operation, you need to delete in batches small enough that you don't take table locks. SQL Server escalates to a table lock when a statement holds more than about 5000 locks on the table. If there are indexes on the table, you will need to account for them: each deleted row also takes locks in every non-clustered index. If there are four non-clustered indexes on the table, that is five locks per row including the clustered index, so you will need to stay at a batch size of 1000 rows (5 × 1000 = 5000).
Next, when you do batching, you must always define your batches over an index. The way you have written the query, with DATEDIFF applied to the column, any index on actualTimeStamp cannot be used anyway; you will get a table scan instead. As it stands, the above is a complete disaster.
If there is an index on actualTimeStamp, you can do:
DELETE TOP(1000) FROM data.DtPerformance WHERE actualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)
If actualTimeStamp is not indexed, don't rush to add one, but post the CREATE TABLE and CREATE INDEX statements for the table, so that we can look at alternatives.
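For illustration, here is a minimal sketch of the complete purge loop built from this advice, assuming the index on actualTimeStamp exists; the variable names mirror the original procedure:

DECLARE @ageHours int = 27,
        @dtNow datetime = GETDATE(),
        @rowCount int = 1;

WHILE @rowCount > 0
BEGIN
    -- TOP (1000) keeps the lock count per statement below the ~5000
    -- escalation threshold even with several non-clustered indexes.
    DELETE TOP (1000) FROM data.DtPerformance
    WHERE actualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow);

    SET @rowCount = @@ROWCOUNT;   -- stop once a batch comes back empty
END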
archies01 test 1 Reputation point
2020-11-18T22:23:13.187+00:00

CREATE TABLE [data].[DPerformance](
[DataID] [bigint] IDENTITY(1,1) NOT NULL,
[DeviceID] [int] NOT NULL,
[dataValue] [float] NOT NULL,
[HeaderID] [int] NOT NULL,
[missingData] [tinyint] NULL,
[PlannedOutage] [tinyint] NOT NULL,
[ActualTimeStamp] [smalldatetime] NOT NULL,
CONSTRAINT [PK_DataPerf] PRIMARY KEY CLUSTERED
(
[DataID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 70) ON [Data]
) ON [Data]
GO
Guoxiong 8,206 Reputation points
2020-11-18T22:36:38.297+00:00

DECLARE @minDataID bigint;
DECLARE @maxDataID bigint;
DECLARE @processMaxDataID bigint;
DECLARE @rowCount int;
DECLARE @dtNow datetime = GETDATE();
DECLARE @ageHours int = 27;

SELECT @minDataID = MIN([DataID]), @maxDataID = MAX([DataID])
FROM [data].[DPerformance]
WHERE (DATEDIFF(hour, ActualTimeStamp, @dtNow) > @ageHours);

SET @processMaxDataID = @minDataID + 1000000;

WHILE (@processMaxDataID <= @maxDataID + 1000000) -- one extra pass picks up the final partial batch
BEGIN
    DELETE [data].[DPerformance]
    WHERE [DataID] <= @processMaxDataID
      AND [DataID] <= @maxDataID;                 -- never delete past the precomputed cutoff
    SET @rowCount = @@ROWCOUNT;
    INSERT [dbo].[SSIS_Log] VALUES
        ('Delete RealTime Data', 'Delete SPROC Batch 2', GETDATE(),
         CONVERT(VARCHAR(10), @rowCount) + ' rows deleted.');
    SET @processMaxDataID = @processMaxDataID + 1000000;
END
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
2020-11-18T22:44:32.093+00:00

Good! I will assume that ActualTimeStamp is perfectly correlated with DataID. That is, if there are two rows ID1 and ID2, and ID2 > ID1, the ActualTimeStamp for ID2 must be >= the ActualTimeStamp for ID1. Then you can do:
DECLARE @cur CURSOR,
        @minID bigint,
        @maxID bigint,
        @ageHours int = 27,
        @dtNow datetime2(3) = sysdatetime();

SET @cur = CURSOR STATIC FOR
    SELECT MIN(DataID), MAX(DataID)
    FROM (SELECT DataID,
                 row_number() OVER (ORDER BY DataID) / 1000 AS batchno
          FROM data.DPerformance --WITH (NOLOCK)
          WHERE ActualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)) AS X
    GROUP BY batchno;

OPEN @cur;

WHILE 1 = 1
BEGIN
    FETCH @cur INTO @minID, @maxID;
    IF @@fetch_status <> 0
        BREAK;

    DELETE data.DPerformance
    WHERE DataID BETWEEN @minID AND @maxID
    OPTION (RECOMPILE);
END
Note that I have added a NOLOCK hint in comments. If the database has READ_COMMITTED_SNAPSHOT enabled, you don't need it. But if it does not, I think NOLOCK is a good idea here. I generally recommend against using NOLOCK in application code, since it can lead not only to dirty reads, but also to failing to read committed rows. But since all that would happen in this case is that some rows remain a little longer, I don't see this as a problem. On the other hand, since the query requires a full table scan, running it without the hint could block inserts for minutes.
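For reference, one way to check whether the database has READ_COMMITTED_SNAPSHOT enabled is a standard catalog lookup (this query is not part of the original answer):

-- Returns 1 in is_read_committed_snapshot_on if RCSI is enabled,
-- in which case the NOLOCK hint above is unnecessary.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = DB_NAME();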
A refinement would be to save the ID pairs into a permanent table. If you restart the purge, you would work from that table, and only fill it up again when you have exhausted it.
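For illustration, a minimal sketch of that refinement; the work table dbo.PurgeBatches and its column names are assumptions, not part of the original answer:

-- Permanent work table holding the precomputed batch boundaries.
CREATE TABLE dbo.PurgeBatches (
    batchno bigint NOT NULL PRIMARY KEY,
    minID   bigint NOT NULL,
    maxID   bigint NOT NULL
);

DECLARE @ageHours int = 27,
        @dtNow datetime2(3) = sysdatetime(),
        @batchno bigint, @minID bigint, @maxID bigint;

-- Refill the work table only when the previous set of batches has been
-- exhausted, so a restarted purge resumes where it left off.
IF NOT EXISTS (SELECT * FROM dbo.PurgeBatches)
    INSERT dbo.PurgeBatches (batchno, minID, maxID)
    SELECT batchno, MIN(DataID), MAX(DataID)
    FROM (SELECT DataID,
                 row_number() OVER (ORDER BY DataID) / 1000 AS batchno
          FROM data.DPerformance
          WHERE ActualTimeStamp < dateadd(HOUR, -@ageHours, @dtNow)) AS X
    GROUP BY batchno;

WHILE EXISTS (SELECT * FROM dbo.PurgeBatches)
BEGIN
    SELECT TOP (1) @batchno = batchno, @minID = minID, @maxID = maxID
    FROM dbo.PurgeBatches
    ORDER BY batchno;

    DELETE data.DPerformance
    WHERE DataID BETWEEN @minID AND @maxID
    OPTION (RECOMPILE);

    DELETE dbo.PurgeBatches WHERE batchno = @batchno;  -- batch done
END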