
Linux VMs and Blob Storage

Tignor, Tom 21 Reputation points
2022-01-03T21:43:58.603+00:00

We are considering using Azure to host a file delivery system. In our design, we would store hundreds of TB of immutable data files as blobs in containers named to allow prefix searches, something like {partitionID}-{network}-{YYYY}-{MM}-{DD}-{HH}. A group of delivery processing hosts running on Azure (Ubuntu) VMs would periodically search over subsets of recent containers and deliver the blobs found in those containers. An upstream service (outside Azure) would periodically create new data file blobs, with auto-expiration policies TBD. How would you recommend building this system in Azure? Should we purchase a blob storage account and a preset quantity of Linux VMs, or is there a more suitable Azure package? Are there differences between how Azure VMs access Azure blob storage and how external hosts access it? Which resources, if any, would we need to maintain ourselves to ensure Azure blob storage is highly available for Azure VMs?
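To make the naming scheme above concrete, here is a minimal sketch of how a delivery host might compute the names of recent containers to scan. The function names and the sample values `p07` and `neta` are hypothetical and only illustrate the pattern; the names are lowercased because Azure container names must be lowercase.

```python
from datetime import datetime, timedelta, timezone


def container_name(partition_id, network, hour):
    """Build a name following {partitionID}-{network}-{YYYY}-{MM}-{DD}-{HH}."""
    # Container names must be lowercase, so normalize the result.
    return f"{partition_id}-{network}-{hour:%Y-%m-%d-%H}".lower()


def recent_container_names(partition_id, network, now, hours_back):
    """Return names for the current hour and the preceding hours, newest first."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return [
        container_name(partition_id, network, top_of_hour - timedelta(hours=i))
        for i in range(hours_back)
    ]


if __name__ == "__main__":
    now = datetime(2022, 1, 4, 1, 30, tzinfo=timezone.utc)
    for name in recent_container_names("p07", "neta", now, 3):
        print(name)
```

Each generated name (or a shortened form of it, such as the day portion) could then be used as the prefix when listing containers.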

Azure Blob Storage

An Azure service that stores unstructured data in the cloud as blobs.


Answer accepted by question author

Anonymous
2022-01-04T19:26:46.21+00:00

@Tignor, Tom
It is not necessary to purchase a package to host your solution. Both blob storage and Azure VMs can be billed based on usage without reservations, so I would recommend working out your compute and storage needs before purchasing any reservations. Check out the blob storage pricing and VM pricing pages for more information. You might also find the Customer Enablement page helpful for getting started.

Rather than doing prefix searches, you might consider using blob index tags. Blob index tags provide data management and discovery capabilities through key-value tag attributes that are indexed automatically. You can use them to categorize and find objects within a single container or across all containers in your storage account. Information on setting up immutable blobs can be found here.
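As a rough illustration of the index tag approach, the sketch below builds a tag filter expression for one partition, one network, and a range of hours. The tag keys `partition`, `network`, and `hour` are assumptions made for this example, not names Azure defines. Tag values are compared as strings, so the range only works because a zero-padded `YYYY-MM-DD-HH` value sorts in chronological order.

```python
def tag_equals(key, value):
    """One equality condition: tag keys in double quotes, values in single quotes."""
    return f"\"{key}\" = '{value}'"


def tag_between(key, low, high):
    """Inclusive range condition; values are compared as strings."""
    return f"\"{key}\" >= '{low}' AND \"{key}\" <= '{high}'"


def delivery_filter(partition_id, network, first_hour, last_hour):
    """Combine the conditions into a single filter expression."""
    # The tag keys used here are hypothetical; use whatever keys the
    # upstream service actually sets when it uploads blobs.
    return " AND ".join([
        tag_equals("partition", partition_id),
        tag_equals("network", network),
        tag_between("hour", first_hour, last_hour),
    ])


if __name__ == "__main__":
    print(delivery_filter("p07", "neta", "2022-01-04-00", "2022-01-04-05"))
```

The resulting string is the kind of expression that the find-blobs-by-tags operation accepts, for example through `BlobServiceClient.find_blobs_by_tags` in the `azure-storage-blob` Python package.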

Your Azure VMs can access blob storage either through the storage account's primary endpoint or through a private endpoint that you create. More information on networking and firewalls for Azure storage can be found here.

Azure Storage always stores multiple copies of your data so that it is protected from planned and unplanned events, including transient hardware failures, network or power outages, and massive natural disasters. Redundancy ensures that your storage account meets its availability and durability targets even in the face of failures. If you need to be able to read your data during a regional outage, consider enabling read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS). You can read more about redundancy and failover access here.
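With RA-GRS or RA-GZRS, the read-only copy is served from a secondary endpoint whose host name is the account name with `-secondary` appended. The following is a minimal sketch of a read that falls back to that endpoint; the `fetch` callable, the account name, and the blob path are placeholders for this example, and a real client would also need to handle authentication.

```python
def blob_endpoints(account_name):
    """Return the (primary, secondary) blob endpoints for a storage account."""
    primary = f"https://{account_name}.blob.core.windows.net"
    secondary = f"https://{account_name}-secondary.blob.core.windows.net"
    return primary, secondary


def read_with_fallback(fetch, account_name, blob_path):
    """Try the primary endpoint first, then the read-only secondary.

    `fetch` is any callable that takes a URL and returns the content,
    raising OSError (for example ConnectionError) on network failure.
    """
    primary, secondary = blob_endpoints(account_name)
    try:
        return fetch(f"{primary}/{blob_path}")
    except OSError:
        # The secondary is read-only and may lag slightly behind the primary.
        return fetch(f"{secondary}/{blob_path}")


if __name__ == "__main__":
    def fake_fetch(url):
        # Simulate a primary-region outage for demonstration purposes.
        if "-secondary" not in url:
            raise ConnectionError("primary unavailable")
        return f"read from {url}"

    print(read_with_fallback(fake_fetch, "myaccount", "container/blob.dat"))
```

Because the data is immutable in this design, a slightly stale secondary should be less of a concern than it would be for frequently updated blobs.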

Hope this helps in getting you started. Let us know if you have any questions or issues.

-------------------------------

Please don't forget to "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.
