Manage retention of historical data in system-versioned temporal tables
Applies to: SQL Server 2016 (13.x) and later Azure SQL Database Azure SQL Managed Instance
With system-versioned temporal tables, the history table might increase database size more than regular tables, particularly under the following conditions:
- You retain historical data for a long period of time
- You have an update or delete heavy data modification pattern
A large and ever-growing history table can become an issue both due to pure storage costs, and imposing a performance tax on temporal querying. Hence, developing a data retention policy for managing data in the history table is an important aspect of planning and managing the lifecycle of every temporal table.
Data retention management for history table
Managing temporal table data retention begins with determining the required retention period for each temporal table. Your retention policy, in most cases, should be considered to be part of the business logic of the application using the temporal tables. For example, applications in data audit and time travel scenarios have firm requirements in terms of for how long historical data must be available for online querying.
Once you determine your data retention period, your next step is to develop a plan for managing historical data. You must decide how and where you store your historical data, and how to delete historical data that is older than your retention requirements. The following approaches for managing historical data in the temporal history table are available:
With each of these approaches, the logic for migrating or cleaning history data is based on the column that corresponds to end of period in the current table. The end of period value for each row determines the moment when the row version becomes closed, that is, when it lands in the history table. For example, the condition ValidTo < DATEADD (DAYS, -30, SYSUTCDATETIME ())
specifies that historical data older than one month needs to be removed or moved out from the history table.
Note
The examples in this article use this Create a system-versioned temporal table.
Use table partitioning approach
Partitioned tables and indexes can make large tables more manageable and scalable. Using the table partitioning approach, you can use history table partitions to implement custom data cleanup or offline archival based on a time condition. Table partitioning also gives you performance benefits when querying temporal tables on a subset of data history by using partition elimination.
With table partitioning, you can implement a sliding window to move out oldest portion of the historical data from the history table and keep the size of the retained part constant in terms of age - maintaining data in the history table equal to required retention period. The operation of switching data out from the history table is supported while SYSTEM_VERSIONING
is ON
, which means that you can clean a portion of the history data without introducing a maintenance window or blocking your regular workloads.
Note
In order to perform partition switching, your clustered index on history table must be aligned with the partitioning schema (it has to contain ValidTo
). The default history table created by the system contains a clustered index that includes the ValidTo
and ValidFrom
columns, which is optimal for partitioning, inserting new history data, and typical temporal querying. For more information, see Temporal tables.
A sliding window has two sets of tasks that you need to perform:
- A partitioning configuration task
- Recurring partition maintenance tasks
For the illustration, let's assume that you want to keep historical data for six months and that you want to keep every month of data in a separate partition. Also, let's assume that you activated system-versioning in September of 2023.
A partitioning configuration task creates the initial partitioning configuration for the history table. For this example, you would create the same number partitions as the size of sliding window, in months, plus an extra empty partition preprepared (explained later in this article). This configuration ensures that the system is able to store new data correctly when you start the recurring partition maintenance task for the first time, and guarantees that you never split partitions with data to avoid expensive data movements. You should perform this task using Transact-SQL using the example script later in this article.
The following picture shows initial partitioning configuration to keep six months of data.
Note
See performance considerations with table partitioning later in this article for the performance implications of using RANGE LEFT
versus RANGE RIGHT
when configuring partitioning.
The first and last partitions are open on lower and upper boundaries, respectively, to ensure that every new row has destination partition regardless of the value in partitioning column. As time goes by, new rows in history table land in higher partitions. When the sixth partition gets filled up, you reach the targeted retention period. This is the moment to start the recurring partition maintenance task for the first time (it needs to be scheduled to run periodically, once per month in this example).
The following picture illustrates the recurring partition maintenance tasks (see detailed steps later in this section).
The detailed steps for the recurring partition maintenance tasks are:
SWITCH OUT: Create a staging table and then switch a partition between the history table and the staging table using the ALTER TABLE (Transact-SQL) statement with the
SWITCH PARTITION
argument (see Example C. Switching partitions between tables).ALTER TABLE [<history table>] SWITCH PARTITION 1 TO [<staging table>];
After the partition switch, you can optionally archive the data from staging table and then either drop or truncate the staging table, to be ready for the next time you need to perform this recurring partition maintenance task.
MERGE RANGE
: Merge the empty partition1
with partition2
using the ALTER PARTITION FUNCTION (Transact-SQL) withMERGE RANGE
(See example B). By removing the lowest boundary using this function, you effectively merge the empty partition1
with the former partition2
to form new partition1
. The other partitions also effectively change their ordinals.SPLIT RANGE
: Create a new empty partition 7 using the ALTER PARTITION FUNCTION (Transact-SQL) withSPLIT RANGE
(See example A). By adding a new upper boundary using this function, you effectively create a separate partition for the upcoming month.
Use Transact-SQL to create partitions on history table
Use the following Transact-SQL script to create the partition function, the partition schema, and recreate the clustered index to be partition-aligned with the partition schema, partitions. For this example, you create a six-month sliding window with monthly partitions, beginning September 2023.
BEGIN TRANSACTION
/*Create partition function*/
CREATE PARTITION FUNCTION [fn_Partition_DepartmentHistory_By_ValidTo] (DATETIME2(7))
AS RANGE LEFT FOR VALUES (
N'2023-09-30T23:59:59.999',
N'2023-10-31T23:59:59.999',
N'2023-11-30T23:59:59.999',
N'2023-12-31T23:59:59.999',
N'2024-01-31T23:59:59.999',
N'2024-02-29T23:59:59.999'
);
/*Create partition scheme*/
CREATE PARTITION SCHEME [sch_Partition_DepartmentHistory_By_ValidTo]
AS PARTITION [fn_Partition_DepartmentHistory_By_ValidTo] TO (
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY]
);
/*Re-create index to be partition-aligned with the partitioning schema*/
CREATE CLUSTERED INDEX [ix_DepartmentHistory] ON [dbo].[DepartmentHistory] (
ValidTo ASC,
ValidFrom ASC
)
WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = ON,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
DATA_COMPRESSION = PAGE
) ON [sch_Partition_DepartmentHistory_By_ValidTo](ValidTo);
COMMIT TRANSACTION;
Use Transact-SQL to maintain partitions in sliding window scenario
Use the following Transact-SQL script to maintain partitions in the sliding window scenario. For this example, you switch out the partition for September of 2023 using MERGE RANGE
, and then add a new partition for March of 2024 using SPLIT RANGE
.
BEGIN TRANSACTION
/*(1) Create staging table */
CREATE TABLE [dbo].[staging_DepartmentHistory_September_2023] (
DeptID INT NOT NULL,
DeptName VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
ManagerID INT NULL,
ParentDeptID INT NULL,
ValidFrom DATETIME2(7) NOT NULL,
ValidTo DATETIME2(7) NOT NULL
) ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE);
/*(2) Create index on the same filegroups as the partition that will be switched out*/
CREATE CLUSTERED INDEX [ix_staging_DepartmentHistory_September_2023]
ON [dbo].[staging_DepartmentHistory_September_2023] (
ValidTo ASC,
ValidFrom ASC
)
WITH (
PAD_INDEX = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY];
/*(3) Create constraints matching the partition that will be switched out*/
ALTER TABLE [dbo].[staging_DepartmentHistory_September_2023]
WITH CHECK ADD CONSTRAINT [chk_staging_DepartmentHistory_September_2023_partition_1]
CHECK (ValidTo <= N'2023-09-30T23:59:59.999')
ALTER TABLE [dbo].[staging_DepartmentHistory_September_2023]
CHECK CONSTRAINT [chk_staging_DepartmentHistory_September_2023_partition_1]
/*(4) Switch partition to staging table*/
ALTER TABLE [dbo].[DepartmentHistory] SWITCH PARTITION 1
TO [dbo].[staging_DepartmentHistory_September_2023]
WITH (WAIT_AT_LOW_PRIORITY(MAX_DURATION = 0 MINUTES, ABORT_AFTER_WAIT = NONE))
/*(5) [Commented out] Optionally archive the data and drop staging table
INSERT INTO [ArchiveDB].[dbo].[DepartmentHistory]
SELECT * FROM [dbo].[staging_DepartmentHistory_September_2023];
DROP TABLE [dbo].[staging_DepartmentHIstory_September_2023];
*/
/*(6) merge range to move lower boundary one month ahead*/
ALTER PARTITION FUNCTION [fn_Partition_DepartmentHistory_By_ValidTo]()
MERGE RANGE(N'2023-09-30T23:59:59.999');
/*(7) Create new empty partition for "April and after" by creating new boundary point and specifying NEXT USED file group*/
ALTER PARTITION SCHEME [sch_Partition_DepartmentHistory_By_ValidTo] NEXT USED [PRIMARY]
ALTER PARTITION FUNCTION [fn_Partition_DepartmentHistory_By_ValidTo]()
SPLIT RANGE(N'2024-03-31T23:59:59.999');
COMMIT TRANSACTION
You can slightly modify the previous script and use it in regular monthly maintenance process:
- In step (1) create new staging table for the month you want to remove (October would be next one in this example).
- In step (3) create and check constraint that matches the month of data you want to remove:
ValidTo <= N'2023-10-31T23:59:59.999'
for October partition. - In step (4)
SWITCH
partition1
to the newly created staging table. - In step (6) alter the partition function by merging lower boundary:
MERGE RANGE(N'2023-10-31T23:59:59.999'
after you move out data for October. - In step (7) split the partition function creating new upper boundary:
SPLIT RANGE (N'2024-04-30T23:59:59.999'
after you move out data for October.
However, the optimal solution would be to regularly run a generic Transact-SQL script that is a capable of performing the appropriate action every month without script modification. It's possible to generalize the previous script to act upon provided parameters (lower boundary that needs to be merged and new boundary that will be created by with partition split). In order to avoid staging table creation every month, you can create one beforehand and reuse by changing check constraint to match partition that will be switched out. Take a look at the following pages to get ideas on how sliding window can be fully automated using a Transact-SQL script.
Performance considerations with table partitioning
It's important to perform the MERGE
and SPLIT RANGE
operations to avoid any data movement as data movement can incur significant performance overhead. For more information, see Modify a partition function. You do so by using RANGE LEFT
rather than RANGE RIGHT
when you create the partition function.
Let's first visually explain meaning of the RANGE LEFT
and RANGE RIGHT
options:
When you define a partition function as RANGE LEFT
, the specified values are the upper boundaries of the partitions. When you use RANGE RIGHT
, the specified values are the lower boundaries of the partitions. When you use the MERGE RANGE
operation to remove a boundary from the partition function definition, the underlying implementation also removes the partition that contains the boundary. If that partition isn't empty, data is moved to the partition that is result of MERGE RANGE
operation.
In a sliding window scenario, you always remove the lowest partition boundary.
RANGE LEFT
case: The lowest partition boundary belongs to partition1
, which is empty (after partition switch out), soMERGE RANGE
doesn't incur any data movement.RANGE RIGHT
case: The lowest partition boundary belongs to partition2
, which isn't empty, because partition1
was emptied by switching out. In this case,MERGE RANGE
incurs data movement (data from partition2
is moved to partition1
). To avoid this,RANGE RIGHT
in the sliding window scenario needs to have partition1
, which is always empty. This means that if you useRANGE RIGHT
, you should create and maintain one extra partition compared toRANGE LEFT
case.
Conclusion: Using RANGE LEFT
in sliding partition is easier for partition management, and avoids data movement. However, defining partition boundaries with RANGE RIGHT
is slightly simpler as you don't have to deal with datetime time check issues.
Use custom cleanup script approach
In cases when table partitioning isn't viable, another approach is to delete the data from history table using a custom cleanup script. Deleting data from history table is possible only when SYSTEM_VERSIONING = OFF
. In order to avoid data inconsistency, perform cleanup either during the maintenance window (when workloads that modify data aren't active) or within a transaction (effectively blocking other workloads). This operation requires CONTROL
permission on current and history tables.
To minimally block regular applications and user queries, delete data in smaller chunks with a delay when performing the cleanup script inside a transaction. While there's no optimal size for each data chunk to be deleted for all scenarios, deleting more than 10,000 rows in a single transaction might impose a significant effect.
The cleanup logic is the same for every temporal table, so it can be automated through a generic stored procedure that you schedule to run periodically, for every temporal table for which you want to limit data history.
The following diagram illustrates how your cleanup logic should be organized for a single table to reduce impact on the running workloads.
Here are some high-level guidelines for implementing the process. Schedule cleanup logic to run every day and iterate over all temporal tables that need data cleanup. Use SQL Server Agent or different tool to schedule this process:
- Delete historical data in every temporal table, starting from the oldest to the most recent rows in several iterations in small chunks, and avoid deleting all rows in a single transaction, as shown in the previous diagram.
- Implement every iteration as an invocation of generic stored procedure that removes a portion of data from the history table (see the following code example for this procedure).
- Calculate how many rows you need to delete for an individual temporal table every time you invoke the process. Based on that and the number of iterations you want to have, determine dynamic split points for every procedure invocation.
- Plan to have a period of delay between iterations for a single table, to reduce the effect on applications that access the temporal table.
A stored procedure that deletes the data for a single temporal table might look like in the following code snippet (review this code carefully and adjust it before apply in your environment):
DROP PROCEDURE IF EXISTS usp_CleanupHistoryData;
GO
CREATE PROCEDURE usp_CleanupHistoryData @temporalTableSchema SYSNAME,
@temporalTableName SYSNAME,
@cleanupOlderThanDate DATETIME2
AS
DECLARE @disableVersioningScript NVARCHAR(MAX) = '';
DECLARE @deleteHistoryDataScript NVARCHAR(MAX) = '';
DECLARE @enableVersioningScript NVARCHAR(MAX) = '';
DECLARE @historyTableName SYSNAME
DECLARE @historyTableSchema SYSNAME
DECLARE @periodColumnName SYSNAME
/*Generate script to discover history table name and end of period column for given temporal table name*/
EXECUTE sp_executesql
N'SELECT @hst_tbl_nm = t2.name,
@hst_sch_nm = s2.name,
@period_col_nm = c.name
FROM sys.tables t1
INNER JOIN sys.tables t2 ON t1.history_table_id = t2.object_id
INNER JOIN sys.schemas s1 ON t1.schema_id = s1.schema_id
INNER JOIN sys.schemas s2 ON t2.schema_id = s2.schema_id
INNER JOIN sys.periods p ON p.object_id = t1.object_id
INNER JOIN sys.columns c ON p.end_column_id = c.column_id AND c.object_id = t1.object_id
WHERE t1.name = @tblName AND s1.name = @schName',
N'@tblName sysname,
@schName sysname,
@hst_tbl_nm sysname OUTPUT,
@hst_sch_nm sysname OUTPUT,
@period_col_nm sysname OUTPUT',
@tblName = @temporalTableName,
@schName = @temporalTableSchema,
@hst_tbl_nm = @historyTableName OUTPUT,
@hst_sch_nm = @historyTableSchema OUTPUT,
@period_col_nm = @periodColumnName OUTPUT
IF @historyTableName IS NULL OR @historyTableSchema IS NULL OR @periodColumnName IS NULL
THROW 50010, 'History table cannot be found. Either specified table is not system-versioned temporal or you have provided incorrect argument values.', 1;
/*Generate 3 statements that run inside a transaction:
(1) SET SYSTEM_VERSIONING = OFF,
(2) DELETE FROM history_table,
(3) SET SYSTEM_VERSIONING = ON
On SQL Server 2016, it is critical that (1) and (2) run in separate EXEC statements, or SQL Server generates the following error:
Msg 13560, Level 16, State 1, Line XXX
Cannot delete rows from a temporal history table '<database_name>.<history_table_schema_name>.<history_table_name>'.
*/
SET @disableVersioningScript = @disableVersioningScript
+ 'ALTER TABLE [' + @temporalTableSchema + '].[' + @temporalTableName
+ '] SET (SYSTEM_VERSIONING = OFF)'
SET @deleteHistoryDataScript = @deleteHistoryDataScript + ' DELETE FROM ['
+ @historyTableSchema + '].[' + @historyTableName + '] WHERE ['
+ @periodColumnName + '] < ' + '''' + CONVERT(VARCHAR(128), @cleanupOlderThanDate, 126) + ''''
SET @enableVersioningScript = @enableVersioningScript + ' ALTER TABLE ['
+ @temporalTableSchema + '].[' + @temporalTableName
+ '] SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [' + @historyTableSchema
+ '].[' + @historyTableName + '], DATA_CONSISTENCY_CHECK = OFF )); '
BEGIN TRANSACTION
EXEC (@disableVersioningScript);
EXEC (@deleteHistoryDataScript);
EXEC (@enableVersioningScript);
COMMIT;
Use temporal history retention policy approach
Applies to: SQL Server 2017 (14.x) and later versions, and Azure SQL Database.
Temporal history retention can be configured at the individual table level, which allows users to create flexible aging policies. Applying temporal retention is simple: it requires only one parameter to be set during table creation or schema change.
After you define the retention policy, the Database Engine starts checking regularly if there are historical rows that are eligible for automatic data cleanup. Identification of matching rows and their removal from the history table occur transparently, in the background task that is scheduled and run by the system. Age condition for the history table rows is checked based on the column representing end of the SYSTEM_TIME
period. If retention period, for example, is set to six months, table rows eligible for cleanup satisfy the following condition:
ValidTo < DATEADD (MONTH, -6, SYSUTCDATETIME())
In the preceding example, the ValidTo
column corresponds to the end of the SYSTEM_TIME
period.
How to configure retention policy
Before you configure retention policy for a temporal table, check first whether temporal historical retention is enabled at the database level:
SELECT is_temporal_history_retention_enabled, name
FROM sys.databases;
The database flag is_temporal_history_retention_enabled
is set to ON
by default, but users can change it with the ALTER DATABASE
statement. This value is automatically set to OFF
after a point-in-time restore (PITR) operation. To enable temporal history retention cleanup for your database, run the following statement:
ALTER DATABASE [<myDB>]
SET TEMPORAL_HISTORY_RETENTION ON;
Retention policy is configured during table creation by specifying value for the HISTORY_RETENTION_PERIOD
parameter:
CREATE TABLE dbo.WebsiteUserInfo
(
UserID INT NOT NULL PRIMARY KEY CLUSTERED,
UserName NVARCHAR(100) NOT NULL,
PagesVisited int NOT NULL,
ValidFrom DATETIME2(0) GENERATED ALWAYS AS ROW START,
ValidTo DATETIME2(0) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON
(
HISTORY_TABLE = dbo.WebsiteUserInfoHistory,
HISTORY_RETENTION_PERIOD = 6 MONTHS
)
);
You can specify retention period by using different time units: DAYS
, WEEKS
, MONTHS
, and YEARS
. If HISTORY_RETENTION_PERIOD
is omitted, INFINITE
retention is assumed. You can also use INFINITE
keyword explicitly.
In some scenarios, you might want to configure retention after table creation, or to change previously configured value. In that case, use the ALTER TABLE
statement:
ALTER TABLE dbo.WebsiteUserInfo
SET (SYSTEM_VERSIONING = ON (HISTORY_RETENTION_PERIOD = 9 MONTHS));
To review current state of the retention policy, use the following query that joins temporal retention enablement flag at the database level with retention periods for individual tables:
SELECT DB.is_temporal_history_retention_enabled,
SCHEMA_NAME(T1.schema_id) AS TemporalTableSchema,
T1.name AS TemporalTableName,
SCHEMA_NAME(T2.schema_id) AS HistoryTableSchema,
T2.name AS HistoryTableName,
T1.history_retention_period,
T1.history_retention_period_unit_desc
FROM sys.tables T1
OUTER APPLY (
SELECT is_temporal_history_retention_enabled
FROM sys.databases
WHERE name = DB_NAME()
) AS DB
LEFT JOIN sys.tables T2
ON T1.history_table_id = T2.object_id
WHERE T1.temporal_type = 2;
How SQL Database deletes aged rows
The cleanup process depends on the index layout of the history table. Only history tables with a clustered index (B+ tree or columnstore) can have finite retention policy configured. A background task is created to perform aged data cleanup for all temporal tables with finite retention period. Cleanup logic for the rowstore (B+ tree) clustered index deletes aged rows in smaller chunks (up to 10K) minimizing pressure on database log and I/O subsystem. Although cleanup logic utilizes required B+ tree index, order of deletions for the rows older than retention period can't be firmly guaranteed. Hence, don't take any dependency on the cleanup order in your applications.
The cleanup task for the clustered columnstore removes entire row groups at once (typically contain 1 million of rows each), which is very efficient, especially when historical data is generated at a high pace.
Excellent data compression and efficient retention cleanup makes clustered columnstore index a perfect choice for scenarios when your workload rapidly generates high amount of historical data. That pattern is typical for intensive transactional processing workloads that use temporal tables for change tracking and auditing, trend analysis, or IoT data ingestion.
For more information, see Manage historical data in Temporal Tables with retention policy.
Related content
- Temporal tables
- Getting Started with system-versioned temporal tables
- Temporal table system consistency checks
- Partitioning with temporal tables
- Temporal table considerations and limitations
- Temporal Table Security
- System-Versioned Temporal Tables with Memory-Optimized Tables
- Temporal Table Metadata Views and Functions
คำติชม
https://aka.ms/ContentUserFeedback
เร็วๆ นี้: ตลอดปี 2024 เราจะขจัดปัญหา GitHub เพื่อเป็นกลไกคำติชมสำหรับเนื้อหา และแทนที่ด้วยระบบคำติชมใหม่ สำหรับข้อมูลเพิ่มเติม ให้ดู:ส่งและดูข้อคิดเห็นสำหรับ