Data Archive Process in Azure Blob.

Nayan, Shakti 0 Reputation points
2023-04-06T09:47:30.43+00:00

We need to build a data archive process from on-premises systems to the Azure cloud. The source systems are files (Excel, CSV, logs, etc.), RDBMS (MySQL, Oracle DB), Hadoop, HBase, and application files (QlikView, OBIEE). The main objective is to create an architecture that helps the team understand the data flow and the overall archiving process. The archived data needs a retrieval process and retention rules, and must later be available to analytics applications for analysis. Given that the data volume is in petabytes, how should we go about this, and what does the roadmap look like?

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.

1 answer

  1. KarishmaTiwari-MSFT 20,772 Reputation points Microsoft Employee Moderator
    2023-04-10T08:01:45.8933333+00:00

    @Nayan, Shakti Thanks for posting your query on Microsoft Q&A.

    You can use an Azure StorSimple appliance running on-premises, which can tier data to Azure Blob storage (both hot and cool tiers) and can therefore be used to archive data from on-premises to Azure. Alternatively, you might consider Azure File Sync, a service that lets you cache several Azure file shares on an on-premises Windows Server or a cloud virtual machine (VM).
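Once data has landed in Blob storage, tiering and the retention rules mentioned in the question can be expressed as a lifecycle management policy on the storage account. The following is a minimal sketch that only builds the policy document; the helper name `build_lifecycle_policy`, the rule name, the prefix, and the day thresholds are illustrative assumptions, not values from this thread.

```python
import json


def build_lifecycle_policy(prefix, cool_after_days, archive_after_days, delete_after_days):
    """Build a Blob lifecycle management policy as a plain dict.

    All thresholds are hypothetical examples; pick values that match
    your own retention requirements.
    """
    if not (0 <= cool_after_days < archive_after_days < delete_after_days):
        raise ValueError("expected cool < archive < delete thresholds")

    return {
        "rules": [
            {
                "enabled": True,
                "name": "archiveRetention",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Only block blobs under the given container/prefix are affected.
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    # Example: cool after 30 days, archive after 90, delete after about 7 years.
    policy = build_lifecycle_policy("archive/logs", 30, 90, 2555)
    print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and applied to a storage account, for example with `az storage account management-policy create`. Blobs moved to the archive tier must be rehydrated before they can be read, so the retrieval process should account for that delay.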

    To create an architecture that helps the team understand the data flow and the overall archiving process, you can use the Azure Architecture Center. It provides best practices, design patterns, reference architectures, and solutions for common workloads on Azure. You can use it as the basis for a reference architecture for your data archive process that covers the source systems, the data flow, and the overall archiving process.

    To handle petabytes of data, you can use Azure Blob Storage to store the archived data. By default, a storage account can hold up to 5 PiB of data; a higher limit can be requested, and data can also be spread across several accounts. You can also use Azure Data Factory to process and transform the data by using compute services such as Azure HDInsight (Hadoop).
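For roadmap planning at petabyte scale, a rough back-of-the-envelope calculation can help. The sketch below estimates how many storage accounts a given volume needs and how long a network transfer would take. The 5 PiB per-account figure is treated as an assumed default limit, and the link speed, utilization factor, and 12 PiB total are hypothetical inputs, not numbers from this thread.

```python
PIB = 2**50  # bytes in one pebibyte


def accounts_needed(total_bytes, per_account_limit_bytes=5 * PIB):
    """Number of storage accounts needed, assuming a fixed per-account limit."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-total_bytes // per_account_limit_bytes)


def transfer_days(total_bytes, link_gbps, utilization=0.7):
    """Days needed to push total_bytes over a link of link_gbps gigabits/s.

    utilization is the assumed fraction of the link that is actually usable.
    """
    bits = total_bytes * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86400


if __name__ == "__main__":
    total = 12 * PIB  # hypothetical archive size
    print("storage accounts:", accounts_needed(total))
    print("days over 10 Gbps:", round(transfer_days(total, 10), 1))
```

If the estimated transfer time is too long for the available bandwidth, that is usually the point at which an offline transfer option is worth evaluating alongside network-based copying.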

    Reference architecture here: https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/backup-archive-on-premises#architecture
    [Figure: architecture diagram that demonstrates how to archive on-premises data.]
