Solution ideas
This article describes a solution idea. Your cloud architect can use this guidance to help visualize the major components for a typical implementation of this architecture. Use this article as a starting point to design a well-architected solution that aligns with your workload's specific requirements.
When you migrate an on-premises mainframe or midrange application to Azure, transferring the data is a primary consideration. Several modernization scenarios require replicating files to Azure quickly or maintaining synchronization between on-premises files and Azure files.
This article describes several processes for transferring files to Azure, converting and transforming file data, and storing the data on-premises and in Azure.
Architecture
The following diagram shows some of the options for replicating and syncing on-premises files to Azure:
Dataflow
Transfer files to Azure:
The simplest way to transfer files, whether to on-premises storage or to Azure, is File Transfer Protocol (FTP). You can host an FTP server on an Azure virtual machine (VM). A simple FTP job written in job control language (JCL) sends files to Azure in binary format, which is essential for preserving mainframe and midrange computational and binary data types. You can store transmitted files in on-premises disks, Azure VM file storage, or Azure Blob Storage.
You can also upload on-premises files to Blob Storage by using tools like AzCopy.
The Azure Data Factory FTP/SFTP connector can also be used to transfer data from the mainframe system to Blob Storage. This method requires an intermediate VM on which a self-hosted integration runtime (SHIR) is installed.
You can also find third-party tools in Azure Marketplace to transfer files from mainframes to Azure.
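The binary-mode requirement described above can be sketched from the client side. This is a minimal illustration that uses Python's standard `ftplib` rather than a mainframe JCL job. The helper name `upload_binary`, the host name, the credentials, and the file names are all hypothetical.

```python
from pathlib import Path


def upload_binary(ftp, local_path, remote_name, blocksize=8192):
    """Send a file in binary (image) mode so that no byte is translated in transit.

    `ftp` is expected to be a logged-in ftplib.FTP (or FTP_TLS) session.
    """
    with Path(local_path).open("rb") as fh:
        # ftplib's storbinary issues TYPE I (binary) before the STOR command,
        # so packed-decimal and binary fields arrive byte for byte.
        return ftp.storbinary(f"STOR {remote_name}", fh, blocksize=blocksize)


# Example call (hypothetical host, credentials, and file names):
#
#   from ftplib import FTP
#   with FTP("ftp.example.internal") as ftp:
#       ftp.login(user="transfer_user", passwd="change-me")
#       upload_binary(ftp, "CUSTOMER.DAT", "customer.dat")
```

A text-mode transfer would instead rewrite line endings and might translate characters, which corrupts any record that contains non-text fields.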
Orchestrate, convert, and transform data:
Azure can't read IBM Extended Binary Coded Decimal Interchange Code (EBCDIC) code page files in Azure VM disks or Blob Storage. To make these files compatible with Azure, Host Integration Server (HIS) converts them from EBCDIC to American Standard Code for Information Interchange (ASCII) format.
Copybooks define the data structure of COBOL, PL/I, and assembly language files. HIS converts these files to ASCII based on the copybook layouts.
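To illustrate why the conversion has to follow the copybook layout, the following sketch decodes one fixed-length record in plain Python. It is not how HIS works internally. The record layout, the field names, and the choice of code page 037 are assumptions made for the example. Text fields can be translated through a code page, but a packed-decimal (COMP-3) field has to be interpreted from its raw bytes.

```python
from decimal import Decimal

# Hypothetical layout for a 13-byte record, standing in for a copybook such as:
#   05 CUST-ID    PIC X(5).
#   05 CUST-NAME  PIC X(5).
#   05 BALANCE    PIC S9(3)V99 COMP-3.
# Each entry: (field name, offset, length in bytes, kind, implied decimal places)
LAYOUT = [
    ("cust_id", 0, 5, "text", 0),
    ("cust_name", 5, 5, "text", 0),
    ("balance", 10, 3, "packed", 2),
]


def unpack_comp3(raw, scale=0):
    """Decode packed decimal: two digits per byte, with the sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    # 0xD (and the less common 0xB) mark a negative value.
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_record(record, layout=LAYOUT, codepage="cp037"):
    """Convert one EBCDIC record to Python values, field by field."""
    out = {}
    for name, start, length, kind, scale in layout:
        chunk = record[start:start + length]
        if kind == "text":
            # Character data: translate from the EBCDIC code page to Unicode text.
            out[name] = chunk.decode(codepage).rstrip()
        else:
            # Packed data: never run through a code page; decode the nibbles.
            out[name] = unpack_comp3(chunk, scale)
    return out
```

Applying a code-page translation to the whole record would scramble the `balance` bytes, which is why a layout-driven conversion is needed.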
Before transferring data to Azure data stores, you might need to transform the data or use it for analytics. Data Factory can manage these extract-transform-load (ETL) and extract-load-transform (ELT) activities and store the data directly in Azure Data Lake Storage.
For big data integrations, Azure Databricks and Azure Synapse Analytics can perform all transformation activities quickly and efficiently by using the Apache Spark engine for in-memory computation.
Store data:
You can store transferred data in one of several available persistent Azure storage modes, depending on your requirements.
If there's no need for analytics, Azure Data Factory can store data directly in a wide range of storage options, such as Data Lake Storage and Blob Storage.
Azure hosts various databases, which address different needs:
- Relational databases include the SQL Server family, and open-source databases like PostgreSQL and MySQL.
- Non-relational databases include Azure Cosmos DB, a fast, multi-model, globally distributed NoSQL database.
Review analytics and business intelligence:
Microsoft Fabric is an all-in-one analytics solution that your organization can use to study data movement, experiment with data science, and review real-time analytics and business intelligence. It offers a comprehensive suite of features, including a data lake, data engineering, and data integration.
Components
Various file transfer, integration, and storage scenarios use different components. See the Azure pricing calculator to estimate costs for Azure resources.
Networking
An on-premises data gateway is bridge software that connects on-premises data to cloud services. You can install the gateway on a dedicated on-premises VM.
Data integration and transformation
Data Provider for Host Files is a component of HIS that converts EBCDIC code page files to ASCII. The provider can read and write records offline in a local binary file, or use Systems Network Architecture (SNA) or Transmission Control Protocol/Internet Protocol (TCP/IP) to read and write records in remote IBM z/OS mainframe datasets or i5/OS physical files. HIS connectors are available for BizTalk and Azure Logic Apps.
Azure Data Factory is a hybrid data integration service you can use to create, schedule, and orchestrate ETL and ELT workflows.
Azure Databricks is an Apache Spark-based analytics platform optimized for Azure. You can use Databricks to correlate incoming data, and enrich it with other data stored in Databricks.
Azure Synapse Analytics is a fast and flexible cloud data warehouse with a massively parallel processing (MPP) architecture that you can use to scale compute and storage elastically and independently.
Databases
Azure SQL Database is a scalable relational cloud database service. Azure SQL Database is evergreen and always up to date, with AI-powered and automated features that optimize performance and durability. Serverless compute and hyperscale storage options automatically scale resources on demand. With Azure Hybrid Benefit, you can use your existing on-premises SQL Server licenses in the cloud at no extra cost.
Azure SQL Managed Instance combines the broadest SQL Server database engine compatibility with all the benefits of a fully managed and evergreen platform as a service (PaaS). With SQL Managed Instance, you can modernize your existing apps at scale with familiar tools, skills, and resources.
SQL Server on Azure Virtual Machines lifts and shifts your SQL Server workloads to the cloud to combine the flexibility and hybrid connectivity of Azure with SQL Server performance, security, and analytics. You can access the latest SQL Server updates and releases with 100% code compatibility.
Azure Database for PostgreSQL is a fully managed relational database service based on the community edition of the open-source PostgreSQL database engine.
Azure Database for MySQL is a fully managed relational database service based on the community edition of the open-source MySQL database engine.
Azure Cosmos DB is a fully managed, multi-model NoSQL database service for building and modernizing scalable, high-performance applications. Azure Cosmos DB scales throughput and storage elastically and independently across geographic regions and guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world.
Other data stores
Blob Storage stores large amounts of unstructured data, such as text or binary data, that you can access from anywhere via HTTP or HTTPS. You can use Blob Storage to expose data publicly or to store application data privately.
Data Lake Storage is a storage repository that holds a large amount of data in native, raw format. Data Lake Storage provides scaling for big data analytics workloads with terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources, and might be structured, semi-structured, or unstructured.
Potential use cases
On-premises file replication and synchronization use cases include:
Downstream or upstream dependencies, for example, when applications that run on a mainframe and applications that run on Azure need to exchange data via files.
Parallel testing of rehosted or re-engineered applications on Azure with on-premises applications.
Tightly coupled on-premises applications on systems that can't immediately be remediated or modernized.
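For the synchronization and parallel-testing cases above, one way to check that a replicated file set still matches its source is to compare content hashes. The following sketch assumes that both file sets are reachable as local directory paths, for example through a mounted share. The function names are hypothetical. The comparison is meaningful only for files replicated in binary form, before any EBCDIC-to-ASCII conversion.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its content digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(source, replica):
    """Report files that are missing from, changed in, or extra in the replica."""
    return {
        "missing": sorted(set(source) - set(replica)),
        "changed": sorted(
            name for name in source.keys() & replica.keys()
            if source[name] != replica[name]
        ),
        "extra": sorted(set(replica) - set(source)),
    }
```

Files listed under `missing` or `changed` are candidates for retransfer. Files listed under `extra` exist only on the replica side.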
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal authors:
- Ashish Khandelwal | Principal Engineering Architecture Manager
- Nithish Aruldoss | Engineering Architect
Next steps
- For more information, contact the Microsoft SQL Data Engineering team.
- Azure database migration guides