Hi All,
I'm struggling with cost optimisation for a huge storage account and I'm looking for advice or a workaround to avoid crazy costs for my operations.
Assume we can't change anything at the application level, in the database, or in the folder structure.
Current status of storage:
- one storage account with around 50 TB of data (~500 million blobs)
- blob soft delete enabled (365 days)
- container soft delete enabled (30 days)
- resource lock enabled to protect against unexpected deletion
- kind: storageV2
- performance: standard
- replication: RA-GRS
- access tier: hot
We also have a second, mirrored storage account that serves as a backup (updated incrementally with azcopy). We pay a fair amount of money for the two storage accounts and for all the API calls during the incremental copy to the backup. Now I'm looking for a safe cost-saving solution.
My thoughts and doubts:
- Since we have RA-GRS replication and the deletion lock enabled, maybe it's not necessary to keep the second storage account as a backup. That could save us half of our costs. But let's imagine a situation where someone eventually deletes the lock and then all the data. Even though we could recover it with this approach: https://learn.microsoft.com/en-us/azure/storage/common/storage-account-recover it doesn't guarantee a 100% complete recovery and it's time-consuming. I know it's a very unlikely situation, but maybe Azure offers a mechanism to avoid it that I'm not aware of?
- Another solution that comes to mind is to keep this structure as it is but play with the data protection options and lifecycle management. It looks promising, but I have trouble realistically estimating the initial and ongoing maintenance costs of this solution. Let's assume the following change scenario:
backupstorage:
- downgrade replication from RA-GRS to LRS (is this a free operation, or are we charged an API call fee for each blob?)
- lower the access tier from hot to cold (or archive? maybe it's cheaper to rehydrate from archive to hot in an emergency than to keep the cold tier all the time? does changing the tier at the storage account level affect existing blobs? if yes, it'll probably cost us a lot; see the account-update sketch after this list for what I mean)
mainstorage:
- implement lifecycle management with a few rules, like moving a blob to cool/cold based on its last access time, etc. (but again, how does this work? do we pay for each 'last access date' check? how do we estimate the initial cost of this implementation? how long does it take for this amount of data? see the lifecycle-policy and estimation sketches after this list)
- enable versioning. Sometimes soft delete is not enough, but at the same time we have a bunch of blobs that change frequently (like every few seconds). Can we somehow control how many versions we keep, or how long we store them? Does each file change that generates a new version cost us something? (the lifecycle-policy sketch below also includes a version-retention rule)
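
To make the backupstorage idea concrete, here is roughly what I have in mind, as a minimal sketch with the azure-mgmt-storage Python SDK (the subscription ID, resource group and account name are placeholders, and I picked Cool only as an example tier):

```python
# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountUpdateParameters

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "my-resource-group"    # placeholder
BACKUP_ACCOUNT = "backupstorage"        # placeholder

client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Downgrade replication from RA-GRS to LRS and lower the default
# (account-level) access tier in a single account update. The default
# tier applies only to blobs whose tier is inferred; blobs with an
# explicitly set tier keep it. Archive can't be set at the account
# level, only per blob.
client.storage_accounts.update(
    RESOURCE_GROUP,
    BACKUP_ACCOUNT,
    StorageAccountUpdateParameters(
        sku=Sku(name="Standard_LRS"),
        access_tier="Cool",
    ),
)
```

As far as I can tell this is a single ARM update rather than a per-blob operation, but whether anything is billed behind the scenes for the geo-to-local conversion is exactly what I'd like to confirm.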
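
For mainstorage, this is the kind of lifecycle policy I'm considering, again just a sketch: the thresholds, resource group and account name are made up, the last-access rule needs last access time tracking enabled on the account, and the dict mirrors the policy JSON you'd see in the portal's lifecycle management code view:

```python
# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

policy = {
    "rules": [
        {
            "name": "tier-stale-blobs",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {
                        # example thresholds only
                        "tierToCool": {"daysAfterLastAccessTimeGreaterThan": 30},
                        "tierToArchive": {"daysAfterLastAccessTimeGreaterThan": 180},
                    }
                },
            },
        },
        {
            "name": "trim-old-versions",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    # delete non-current versions older than 14 days (example value)
                    "version": {"delete": {"daysAfterCreationGreaterThan": 14}}
                },
            },
        },
    ]
}

# The policy name must be "default"; there is one lifecycle policy per account.
client.management_policies.create_or_update(
    "my-resource-group",   # placeholder
    "mainstorage",         # placeholder
    "default",
    {"policy": policy},
)
```

The second rule is my attempt at answering my own versioning question: keep versioning on, but let the policy delete non-current versions after a retention window so the frequently-changing blobs don't pile up versions forever. What I still don't know is what the policy run itself costs at 500 million blobs.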
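
And to get at least a rough feel for the initial cost of the tiering rule, I was thinking of sampling a prefix like this (account URL, container and prefix are placeholders; at 500 million blobs a full listing is itself a lot of list transactions, so a blob inventory report is probably the saner source of these numbers):

```python
# pip install azure-identity azure-storage-blob
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

container = ContainerClient(
    account_url="https://mainstorage.blob.core.windows.net",  # placeholder
    container_name="data",                                    # placeholder
    credential=DefaultAzureCredential(),
)

cutoff = datetime.now(timezone.utc) - timedelta(days=30)  # same threshold as the policy
stale_count = stale_bytes = total = 0

for blob in container.list_blobs(name_starts_with="some/prefix/"):  # placeholder prefix
    total += 1
    # last_accessed_on is only populated when last access time tracking is enabled;
    # fall back to last_modified otherwise
    last_touch = blob.last_accessed_on or blob.last_modified
    if last_touch < cutoff:
        stale_count += 1
        stale_bytes += blob.size

print(f"{stale_count}/{total} blobs ({stale_bytes / 1e12:.2f} TB) "
      f"would be tiered on the first policy run")
```

If I understand the pricing page correctly, tier changes are billed as write operations, so multiplying that count by the per-10,000 write operation price should give a ballpark for the first run.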
I'd really appreciate talking with someone who has faced a similar problem or who has more experience with backup management and cost optimization.