Avoid fragmentation while writing on a sparse file by allocating space upfront

Kanak Agrawal 6 Reputation points
2023-08-07T11:23:06.6866667+00:00

I have an application that writes a single 1 TB file on an NTFS volume. The writes are not sequential: multiple threads write to different offsets of the file. It is guaranteed that the application will eventually write every region of the file. If a thread writes near the end of the file, the program stalls for a while, because Windows backfills zeros into all of the "unwritten" area of the file before that offset.

As a workaround, I marked the file as sparse before doing any writes, using the FSCTL_SET_SPARSE IOCTL: https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_set_sparse
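
For reference, here is a minimal sketch of how that call can be issued from Python through ctypes. The helper name `mark_sparse` is hypothetical and not taken from the application, and the constant values are copied by hand from winioctl.h, so treat them as assumptions to check against the SDK headers.

```python
import ctypes

# Constants reproduced by hand from winioctl.h (assumed values, verify against the SDK).
FILE_DEVICE_FILE_SYSTEM = 0x00000009
METHOD_BUFFERED = 0
FILE_SPECIAL_ACCESS = 0  # same numeric value as FILE_ANY_ACCESS


def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Python equivalent of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


FSCTL_SET_SPARSE = ctl_code(FILE_DEVICE_FILE_SYSTEM, 49, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)


def mark_sparse(path: str) -> None:
    """Set the sparse attribute on an existing file (hypothetical helper, Windows only)."""
    import msvcrt  # Windows-only module, imported lazily

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.DeviceIoControl.argtypes = [
        ctypes.c_void_p, ctypes.c_uint32,          # hDevice, dwIoControlCode
        ctypes.c_void_p, ctypes.c_uint32,          # lpInBuffer, nInBufferSize
        ctypes.c_void_p, ctypes.c_uint32,          # lpOutBuffer, nOutBufferSize
        ctypes.POINTER(ctypes.c_uint32), ctypes.c_void_p,  # lpBytesReturned, lpOverlapped
    ]
    kernel32.DeviceIoControl.restype = ctypes.c_int

    with open(path, "r+b") as f:
        handle = msvcrt.get_osfhandle(f.fileno())
        returned = ctypes.c_uint32(0)
        # No input buffer is passed; as far as I understand the documentation,
        # that is treated as a request to set the sparse attribute.
        ok = kernel32.DeviceIoControl(
            handle, FSCTL_SET_SPARSE, None, 0, None, 0, ctypes.byref(returned), None
        )
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
```

Nothing Windows-specific is touched until `mark_sparse` is called, so the constant can be inspected on any platform.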

After this, Windows no longer backfills zeros and the program runs faster. However, combining a sparse file with random writes causes a lot of fragmentation. Running Contig on this file reports 1085463 fragments. On some runs the number of fragments exceeds 1.5 million, and the file sync call fails with this error: "The requested operation could not be completed due to a file system limitation"


Contig v1.83 - Contig
Copyright (C) 2001-2023 Mark Russinovich
Sysinternals

D:\data\db1.mdf is in 1085463 fragments

Summary:
     Number of files processed:      1
     Number unsuccessfully procesed: 0
     Average fragmentation       : 1.08546e+06 frags/file
PS C:\Users\Administrator\Downloads\Contig>


The application issues writes of 512 KB each. Assuming every write is out of order and creates a new fragment, the limit could be reached after 512 KB * 1500000 = 732 GB of writes.
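
That estimate can be checked with a few lines of Python. This is only a sketch: it assumes KB, GB and TB are meant as binary units, and it takes 1.5 million as an observed threshold rather than a documented NTFS limit.

```python
WRITE_SIZE = 512 * 1024        # 512 KB per write, as stated above
FRAGMENT_LIMIT = 1_500_000     # approximate fragment count at which the sync failed
FILE_SIZE = 2**40              # 1 TB, assumed to mean 2**40 bytes


def bytes_written_at_limit(write_size: int, fragments: int) -> int:
    """Data written if every write creates exactly one new fragment."""
    return write_size * fragments


total = bytes_written_at_limit(WRITE_SIZE, FRAGMENT_LIMIT)
worst_case_fragments = FILE_SIZE // WRITE_SIZE

print(f"{total / 2**30:.1f} GiB written at the limit")    # 732.4 GiB
print(f"{worst_case_fragments} fragments in the worst case")  # 2097152
```

Under these assumptions a fully written file could need about 2.1 million fragments in the worst case, which is above the point where the failure was seen.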

Is there a way to tell Windows to preallocate space for a sparse file so that there is less fragmentation?
Or, if not with a sparse file, is it possible to do random writes to a file without backfilling zeros?


1 answer

  1. Limitless Technology 44,746 Reputation points
    2023-08-08T15:28:21.68+00:00

    Hello there,

    Allocating space upfront is indeed a good strategy to avoid fragmentation. Sparse files are files that have large empty sections, and they do not consume physical storage for those sections until data is written to them. Because clusters are allocated only when each region is first written, out-of-order writes can leave the file heavily fragmented.

    To allocate space upfront and minimize fragmentation while writing to a sparse file, follow these steps:

    Pre-calculate Size: Determine the total size that the sparse file will eventually be. This involves estimating the maximum amount of data you expect to write to the file over its lifetime.

    Initial Allocation: When you create the file, set its size to the full amount you pre-calculated, using the file creation or allocation functions of your platform or language. For a regular (non-sparse) file, this asks the file system to reserve the required space upfront. A sparse file, by design, does not reserve clusters for regions that have not been written yet.

    Fill with Zeros: After creating the file, you can write zeros or any other suitable filler data across the entire file. This forces the space to be allocated on disk, since many file systems do not allocate physical space until data is actually written. Note that this is the same zero-fill cost described in the question, paid once upfront.

    Write Data: Now, as you write actual data to the file, the writes land in space that was already allocated in the previous steps, so they should not create new fragments.

    Update File Size: Keep track of the actual amount of data you've written to the file. This can be useful for your application's internal management and to accurately represent the file size to users.

    Truncate if Necessary: If you find that you have significantly overestimated the required size and the file has a lot of unused space at the end, you might consider truncating the file to the actual size of the written data. This can be done using system-specific file truncation functions.
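
On Windows, the allocation step above could look roughly like the following sketch, written in Python with ctypes. The helper name `preallocate` and its `skip_zero_fill` flag are hypothetical. The sketch assumes a regular, non-sparse NTFS file, and the optional `SetFileValidData` call additionally assumes that the process already holds and has enabled the SeManageVolumePrivilege privilege; enabling it is not shown here.

```python
import ctypes
import sys

FILE_BEGIN = 0  # SetFilePointerEx move method: offset counted from the start of the file


def preallocate(path: str, size: int, skip_zero_fill: bool = False) -> None:
    """Create (or truncate) `path` as a regular file of `size` bytes (hypothetical helper)."""
    if sys.platform != "win32":
        raise OSError("this sketch only works on Windows")
    import msvcrt  # Windows-only module, imported lazily

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.SetFilePointerEx.argtypes = [ctypes.c_void_p, ctypes.c_int64,
                                     ctypes.POINTER(ctypes.c_int64), ctypes.c_uint32]
    k32.SetFilePointerEx.restype = ctypes.c_int
    k32.SetEndOfFile.argtypes = [ctypes.c_void_p]
    k32.SetEndOfFile.restype = ctypes.c_int
    k32.SetFileValidData.argtypes = [ctypes.c_void_p, ctypes.c_int64]
    k32.SetFileValidData.restype = ctypes.c_int

    # buffering=0 so Python keeps no buffered state of its own for this file.
    with open(path, "wb", buffering=0) as f:
        handle = msvcrt.get_osfhandle(f.fileno())
        # Move the file pointer to the target size and make that the end of file.
        if not k32.SetFilePointerEx(handle, size, None, FILE_BEGIN):
            raise ctypes.WinError(ctypes.get_last_error())
        if not k32.SetEndOfFile(handle):
            raise ctypes.WinError(ctypes.get_last_error())
        if skip_zero_fill:
            # Marks the whole range as valid so later writes are not preceded by a
            # zero backfill. Regions not yet written may expose old disk contents.
            if not k32.SetFileValidData(handle, size):
                raise ctypes.WinError(ctypes.get_last_error())
```

A call such as `preallocate(r"D:\data\db1.mdf", 2**40, skip_zero_fill=True)` would request the full 1 TB in one step, which gives the file system the chance to pick large contiguous extents where free space allows. Whether this removes the stall in a given setup would need to be measured.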

    I used AI provided by ChatGPT to formulate part of this response. I have verified that the information is accurate before sharing it with you.

    Hope this resolves your query!

    --If the reply is helpful, please Upvote and Accept it as an answer--
