Extents reported in the USN_RECORD_V4 struct are much larger than the actual changes

Kanak Agrawal 6 Reputation points
2023-11-29T19:53:26.6466667+00:00

We use the USN change journal to track changes to SQL data files so that we can perform incremental backups.

For big files (around 5 TB), we have seen USN_RECORD_V4 records whose extents report that a large portion of the file was modified, while the actual change to the file is very small.

I want to understand how and when USN_RECORD_EXTENT entries are added to a USN_RECORD_V4.

In what scenarios can an extent be added to a USN_RECORD_V4 even though the data in that range was not overwritten?

On one day, I observed 943 records for a 5 TB file. Among these 943 records, three consecutive records had extents corresponding to roughly 330 GB, 220 GB, and 4300 GB of changed data. The other 940 records combined reported 70 GB of changed data.
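Taken at face value, these figures add up to about 4920 GB (330 + 220 + 4300 + 70), which is nearly the whole file. Summing per-record extents can overcount if extents in different records overlap, so one thing worth checking is the size of the union of all reported extents. Below is a minimal sketch of that calculation. The `Extent` struct is a hypothetical stand-in that only mirrors the `Offset` and `Length` fields of USN_RECORD_EXTENT, and the function names are illustrative, not part of any Windows API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for USN_RECORD_EXTENT: byte offset and byte length
// of a reported modified range within the file.
struct Extent {
    std::int64_t offset;
    std::int64_t length;
};

// Sort extents by offset and merge any that overlap or touch.
std::vector<Extent> merge_extents(std::vector<Extent> extents) {
    std::sort(extents.begin(), extents.end(),
              [](const Extent& a, const Extent& b) { return a.offset < b.offset; });
    std::vector<Extent> merged;
    for (const Extent& e : extents) {
        assert(e.offset >= 0 && e.length >= 0);
        if (e.length == 0) continue;  // empty ranges contribute nothing
        if (!merged.empty() &&
            e.offset <= merged.back().offset + merged.back().length) {
            // Overlapping or adjacent: extend the previous merged range.
            std::int64_t end = std::max(merged.back().offset + merged.back().length,
                                        e.offset + e.length);
            merged.back().length = end - merged.back().offset;
        } else {
            merged.push_back(e);
        }
    }
    return merged;
}

// Total number of distinct bytes covered by the reported extents.
std::int64_t unique_bytes(const std::vector<Extent>& extents) {
    std::int64_t total = 0;
    for (const Extent& e : merge_extents(extents)) total += e.length;
    return total;
}
```

Feeding the extents collected from all 943 records into `unique_bytes` would give the number of distinct bytes the journal claims were modified, which can then be compared with the 600 GB measured by the diff.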

But the actual change to that file is only about 600 GB (we diffed the file backups of the two versions at 16 KB granularity). So the three big records correspond to some actual changes in the file, but they also report very large extents that were not modified.

Because of this, our backup software does far more work than it should.
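To put a number on the over-reporting for a sample region of the file, the reported extents can be compared block by block against the diff. The sketch below assumes the diff is available as a set of changed 16 KB block indices. `ReportedExtent`, `kBlockSize`, and `count_overreported_blocks` are hypothetical names for illustration. It materializes every block index, so it is only practical for a limited range rather than the full 5 TB file.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a reported range (byte offset and byte length).
struct ReportedExtent {
    std::int64_t offset;
    std::int64_t length;
};

// Diff granularity used in the comparison: 16 KB.
constexpr std::int64_t kBlockSize = 16 * 1024;

// Count 16 KB blocks that the extents cover but the diff did not find changed.
std::int64_t count_overreported_blocks(const std::vector<ReportedExtent>& extents,
                                       const std::set<std::int64_t>& changed_blocks) {
    std::set<std::int64_t> reported;
    for (const ReportedExtent& e : extents) {
        assert(e.offset >= 0 && e.length >= 0);
        if (e.length == 0) continue;
        // First and last block index touched by this extent (inclusive).
        std::int64_t first = e.offset / kBlockSize;
        std::int64_t last = (e.offset + e.length - 1) / kBlockSize;
        for (std::int64_t b = first; b <= last; ++b) reported.insert(b);
    }
    std::int64_t overreported = 0;
    for (std::int64_t b : reported) {
        if (changed_blocks.count(b) == 0) ++overreported;
    }
    return overreported;
}
```

Multiplying the result by `kBlockSize` gives the number of bytes in the sampled region that were reported as modified but were identical in both backups.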

Windows API - Win32
C++

2 answers

  1. Husnain Ali 0 Reputation points
    2023-12-01T23:13:58.82+00:00

    The USN_RECORD structure, used on NTFS file systems to track changes, might sometimes report larger extents of file modification than what actually occurred. Here are some possible reasons:

    Sparse Files: If your file has empty or zeroed regions, the system may mistakenly report changes there.

    Fragmentation: If your file is scattered in different parts of the disk, changes from those parts might get combined into one report.

    Non-contiguous Changes: Changes happening in separate places within the file may be grouped together, causing larger reported extents.

    Metadata Changes: Changes to file details (like attributes or security info) can contribute to larger reported extents.

    Overhead from Applications: Some programs may interact with the file system in ways that make the reported extents bigger.

    To investigate, check for file fragmentation, sparse file regions, and how the file is being modified.
