Considerations for temporary storage of files in Azure Files or Blob

Russell Durham 25 Reputation points
2024-09-19T19:53:12.5266667+00:00

Scenario:

I have an input file that needs to be copied to a temporary workspace directory for a data processing pipeline. Multiple services will read that file and create intermediate files within the workspace, some of which will be used by downstream services. After the job is complete, some of the files in the workspace are copied to a more permanent location and the rest of the workspace is deleted. The services run in Docker containers on a k8s cluster in Azure, with the workspace directories mounted from volumes backed by either Azure Files or Azure Blob.

These input files can be several gigabytes in size, and the intermediate files will match that in some cases. You can assume 2-4 intermediate files. Additionally, this will likely happen in bursts, with the beginning of the month seeing the majority of the input file uploads from hundreds of clients.

Question:

My question is about where to put the temporary workspace: Azure Files or Blob storage. Azure Files seems like a better choice because the cost per 10K transactions is lower than for Blob storage (assuming the transaction optimized tier for Azure Files and the hot tier for Blob). The storage cost is higher, but with these being temporary workspaces that wouldn't be an issue, would it? One thing I'm a bit fuzzy on is how "capacity" is calculated in this type of scenario.

As with most of us, I don't know what I don't know so if you feel you need additional information on the scenario don't hesitate to ask and I'll do my best to respond.

Tags: Azure Files, Azure Blob Storage

Accepted answer
  1. Vinodh247 21,881 Reputation points
    2024-09-20T05:48:03.8066667+00:00

    Hi Russell Durham,

    Thanks for reaching out to Microsoft Q&A.

    In your scenario, where temporary workspaces are used for file storage during data processing, both Azure Files and Blob Storage offer advantages. Here's a breakdown of considerations that you can look at:

    Azure Files:

    Cost Efficiency for Transactions: As you noted, Azure Files in the transaction optimized tier is often cheaper per 10,000 transactions than Blob in the Hot tier. For workloads with high I/O demand (such as reading and writing large input files and generating intermediate files), this makes Azure Files an attractive option.

    File System-Like Interface: Since you are mounting directories in Kubernetes, Azure Files may be more seamless, because it exposes a real file share over SMB or NFS. This matters if you need POSIX-style file operations or more complex directory structures. Note that NFS shares are offered only on the premium tier; the transaction optimized tier is SMB only.

    Capacity Calculation: For Azure Files in the standard tiers (including transaction optimized), the storage cost is based on the data actually stored in the share. Premium shares are instead billed on provisioned size, so they would have to be provisioned for the peak. Because you are working with temporary files and cleaning up after processing, the storage charge is driven by how much data the workspaces hold and for how long, and it can be kept low if workspaces are deleted promptly.
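
    To make the "how much and for how long" idea concrete, here is a minimal sketch of time-weighted capacity. It assumes that stored data is metered as an average over the billing month (check the pricing page for how your tier actually meters it); the 730-hour month and the function names are illustrative assumptions, not anything Azure defines.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for proration


def gib_months(size_gib: float, hours_stored: float) -> float:
    """Time-weighted capacity: size_gib held for hours_stored of the month."""
    return size_gib * hours_stored / HOURS_PER_MONTH


def workspace_gib_months(input_gib: float, intermediate_count: int,
                         hours_stored: float) -> float:
    """Capacity used by one workspace over its lifetime.

    Worst case from the question: every intermediate file is as large
    as the input file.
    """
    total_gib = input_gib * (1 + intermediate_count)
    return gib_months(total_gib, hours_stored)


if __name__ == "__main__":
    # Hypothetical job: 5 GiB input, 4 intermediates, workspace kept 6 hours.
    print(workspace_gib_months(5, 4, 6))
```

    Under that assumption, a workspace that exists for a few hours contributes only a small fraction of a GiB-month, even when the files themselves are large.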

    Azure Blob Storage:

    Optimized for Object Storage: While Blob storage is not as well suited to file system operations, it excels at storing unstructured data in bulk. For intermediate files or large input files that don't require file-level manipulation (just object access), Blob may be preferable.

    Lower Storage Costs: Blob storage offers lower per GB storage costs in the Hot tier. If you have large files and access patterns that favor sequential access (without needing granular file operations), Blob could result in lower overall storage costs.

    Capacity Calculation: For Blob, capacity is straightforward: it is based on the amount of data stored, regardless of how frequently files are accessed. Frequent access does, however, add transaction costs.
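
    Because transaction charges scale with the number of requests rather than with bytes stored, it can help to estimate the request count for one job. The sketch below assumes that each read or write request of up to `request_bytes` counts as one transaction; the real request size depends on the driver and mount options, so treat both the function and the sample numbers as hypothetical.

```python
def estimate_transactions(file_bytes: int, request_bytes: int,
                          passes: int = 1) -> int:
    """Requests needed to move file_bytes in request_bytes chunks.

    passes is the number of full reads or writes of the file, for example
    one write plus two downstream reads gives passes=3.
    """
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    if file_bytes < 0 or passes < 0:
        raise ValueError("file_bytes and passes must be non-negative")
    requests_per_pass = -(-file_bytes // request_bytes)  # ceiling division
    return requests_per_pass * passes


if __name__ == "__main__":
    # Hypothetical: a 4 GiB file moved in 4 MiB requests, touched 3 times.
    print(estimate_transactions(4 * 1024**3, 4 * 1024**2, passes=3))
```

    Smaller request sizes or more services re-reading the same file raise the count proportionally, which is where the per-10,000-transaction price starts to matter.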

    Considerations for Your Scenario:

    Access Patterns: If your services need file-level operations, metadata handling, or compatibility with file shares, Azure Files will likely be more straightforward. It can also handle bursty processing of large files, provided the share's IOPS and throughput limits cover your peak concurrency.

    Performance Needs: Azure Files can provide the IOPS you need, but make sure you choose the performance tier (transaction optimized or premium) that meets your workload's throughput and concurrency requirements.

    Cost Efficiency: The storage cost is unlikely to be a concern if these workspaces are cleaned up after processing, so transaction cost and I/O performance could be the higher priorities.

    Capacity: With either Azure Files (standard tiers) or Blob, capacity is billed on the amount of data stored in the temporary workspace. The maximum size reached during processing bursts matters most if you use a provisioned (premium) Azure Files share, which has to be sized for that peak.
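
    The trade-off between storage price and transaction price can be sketched as a small cost model. Every number below is a placeholder chosen only to show the shape of the comparison, not an actual Azure price; `PriceProfile`, `monthly_cost`, and `cheaper` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PriceProfile:
    name: str
    per_gib_month: float         # storage price per GiB-month (placeholder)
    per_10k_transactions: float  # price per 10,000 transactions (placeholder)


def monthly_cost(profile: PriceProfile, gib_months: float,
                 transactions: int) -> float:
    """Storage charge plus transaction charge for one billing month."""
    storage = profile.per_gib_month * gib_months
    requests = profile.per_10k_transactions * transactions / 10_000
    return storage + requests


def cheaper(a: PriceProfile, b: PriceProfile, gib_months: float,
            transactions: int) -> PriceProfile:
    """Return whichever profile costs less for the given usage."""
    return min((a, b), key=lambda p: monthly_cost(p, gib_months, transactions))


if __name__ == "__main__":
    # Placeholder prices: replace with figures from the pricing pages.
    option_a = PriceProfile("higher storage, cheaper transactions", 0.06, 0.015)
    option_b = PriceProfile("lower storage, pricier transactions", 0.02, 0.05)
    usage = dict(gib_months=50.0, transactions=5_000_000)
    for option in (option_a, option_b):
        print(option.name, monthly_cost(option, **usage))
    print("cheaper:", cheaper(option_a, option_b, **usage).name)
```

    With short-lived workspaces the GiB-month figure stays small while the transaction count grows with every job, so the transaction price tends to dominate the result.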

    Given the size of your files (GBs) and the bursty nature of your workload, Azure Files in the transaction optimized tier seems like the more suitable choice, because of its file system interface and its lower per-transaction pricing compared to Blob storage in the hot tier.
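
    Whichever backend you choose, prompt cleanup is what keeps the storage charge low. Here is a minimal sketch of the workspace lifecycle described in the question, assuming the share is already mounted as an ordinary directory inside the container; `run_job`, its parameters, and the `process` callback are hypothetical names, not part of any Azure SDK.

```python
import shutil
import uuid
from pathlib import Path
from typing import Callable, List


def run_job(input_file: Path, workspace_root: Path, permanent_dir: Path,
            keep: List[str], process: Callable[[Path], None]) -> List[Path]:
    """Copy the input into a fresh workspace, process it, keep selected
    outputs, and always delete the workspace afterwards."""
    workspace = workspace_root / f"job-{uuid.uuid4().hex}"
    workspace.mkdir(parents=True)
    try:
        shutil.copyfile(input_file, workspace / input_file.name)
        process(workspace)  # services read the input and write intermediates
        permanent_dir.mkdir(parents=True, exist_ok=True)
        kept = []
        for name in keep:
            source = workspace / name
            if source.exists():
                kept.append(Path(shutil.copyfile(source, permanent_dir / name)))
        return kept
    finally:
        # Remove the workspace even if processing failed, so it stops
        # accruing storage charges.
        shutil.rmtree(workspace, ignore_errors=True)
```

    The `finally` block is the important design choice here: a failed job that leaves gigabytes of intermediates behind is the main way a temporary workspace turns into a standing storage cost.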

