I'm trying to optimize data and access in Azure Blob.

Question

ananya 20

I have a large number of small files that need to be accessed frequently, and I’m noticing increased costs and slower read performance. Would batching files into larger blobs improve efficiency, or is there a better approach? Also, what’s the best way to set up tiered storage for files that are used less frequently over time?

Accepted answer

Answer 1

Hi there

Storing a large number of small files in Azure Blob Storage can lead to higher costs and slower performance, because every read is a separate billed request with its own latency and per-blob overhead. Batching files that are read together into larger blobs reduces the number of requests, which can improve read performance and reduce costs.

For tiered storage, use lifecycle management policies to automatically move older, less-used files to the Cool or Archive tier, reducing costs without manual intervention. Keep in mind that Archive is offline storage: a blob has to be rehydrated before it can be read, which can take hours. If access patterns change frequently, consider blob index tags for efficient retrieval. These optimizations should help balance cost and performance!

If this helps, kindly accept the answer. Thanks much.

Answer 2

Without understanding your app and what you define as small, it is going to be hard to say for sure. As with all things, profiling your code would be best.

In general terms, if your app needs to load a lot of small files that rarely change, then the best option is to load them once and cache them. This is what CDNs are designed for, so ideally move any files that your app needs but that rarely change behind a CDN and out of direct storage reads. If that isn't an option, then cache the files on your app server instead. Costs can definitely add up if you're reading hundreds of small files, as you pay for each read. Read performance is largely determined by network round trips and transfer speeds, so if you're making hundreds of API calls to get data from storage, the latency adds up as well.
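To make the "load once and cache" idea concrete, here is a minimal sketch of an in-memory cache on the app server. The `CachedBlobReader` class, the `fake_download` function, and the file names are all made up for illustration; in a real app the download callable would be whatever storage client call you already use. It only shows how repeated reads of the same file collapse into a single storage request.

```python
from typing import Callable, Dict


class CachedBlobReader:
    """Wraps a download function and keeps results in memory (hypothetical helper)."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        self._download = download           # assumption: performs the real storage read
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0               # counts reads that actually hit storage

    def read(self, name: str) -> bytes:
        # Only go to storage the first time a file is requested.
        if name not in self._cache:
            self.remote_reads += 1
            self._cache[name] = self._download(name)
        return self._cache[name]

    def invalidate(self, name: str) -> None:
        # Call this when a file changes so the next read fetches it again.
        self._cache.pop(name, None)


# Stand-in for real storage so the sketch runs on its own.
FAKE_STORE = {"a.json": b"{}", "b.json": b"[]"}


def fake_download(name: str) -> bytes:
    return FAKE_STORE[name]


reader = CachedBlobReader(fake_download)
for _ in range(100):
    reader.read("a.json")

print(reader.remote_reads)  # 100 reads by the app, 1 read against storage
```

A real cache would also need a size limit and some expiry policy, which is exactly why frequently changing files are a poor fit for this approach.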

If all the files are related and you always need all of them, and a CDN isn't an option, you could batch the files together into a zip file and store that. This works best when you would rarely update one file without updating all of them. Storing them in a zip file reduces the number of calls needed to get the files, but at the cost of having to download a larger file even if you only need a couple of them. If the files aren't needed as a group, then you could be wasting bandwidth. Again, caching or a CDN can help solve this issue.
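As a rough sketch of the zip approach, the snippet below bundles a few small files into one archive in memory and reads a single member back out. The function names `bundle_files` and `read_member` and the sample file names are assumptions for illustration only; the upload and download of the resulting bytes to Blob Storage is left out, since that depends on your client code.

```python
import io
import zipfile
from typing import Dict


def bundle_files(files: Dict[str, bytes]) -> bytes:
    """Pack many small files into a single zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def read_member(bundle: bytes, name: str) -> bytes:
    """Extract one file from a downloaded bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return archive.read(name)


# Hypothetical small files that are always used together.
small_files = {
    "config/a.json": b'{"feature": true}',
    "config/b.json": b'{"limit": 10}',
    "templates/mail.txt": b"Hello {name}",
}

bundle = bundle_files(small_files)           # one blob to upload instead of three
print(read_member(bundle, "config/b.json"))  # one download serves every member
```

Note the trade-off described above still applies: changing one member means rebuilding and re-uploading the whole bundle.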

If the files change frequently, then using a CDN, caching, or even grouping them together isn't going to work well. Making a change to a single small file and then having to retrieve, zip, and upload all the files is going to waste a lot of bandwidth. Caching or a CDN isn't going to be ideal either, as the cached data would be invalidated every time a file changes.

As for determining tiers, you should ideally keep infrequently needed blobs in a cooler tier. This means it costs more and takes longer to read them, but it saves on storage costs. You can configure the tier on a per-blob basis, but more likely you should either let Azure manage it directly or set up some rules and automate the process yourself. Microsoft gives some general guidelines in its documentation. For example, you might decide that a blob that hasn't been read in 2 weeks should be moved to a cooler tier. You can configure a policy for that. If the policies don't work for you, then you can create your own archiving system by calling the APIs directly. However, I suspect the existing policies should be good enough. Note that moving things between tiers impacts cost and speed of retrieval. As such, the cooler tiers are best reserved for things like archived blobs. For blobs that the application needs frequent access to, archiving isn't really an option.
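For the "not read in 2 weeks" example, a lifecycle management policy is a JSON document attached to the storage account. The sketch below builds one in Python and prints it. The rule name, the `mycontainer/reports/` prefix, and the helper `build_lifecycle_policy` are made up for illustration, and as far as I know a rule based on last access time only works once last access time tracking is enabled on the account, so treat this as a starting point to check against the documentation rather than a finished policy.

```python
import json
from typing import Any, Dict, List


def build_lifecycle_policy(idle_days: int, prefixes: List[str]) -> Dict[str, Any]:
    """Build a policy that moves block blobs to Cool after idle_days without a read."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "cool-after-idle",   # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": prefixes,  # container/prefix paths to include
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                # assumes last access time tracking is turned on
                                "daysAfterLastAccessTimeGreaterThan": idle_days
                            }
                        }
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy(14, ["mycontainer/reports/"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the lifecycle management code view in the portal or applied with whatever tooling you already use.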
