This article describes an end-to-end modernization plan for mainframe and midrange data sources.
Apache®, Spark, and the flame logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Download a Visio file of this architecture.
The following dataflow outlines a process for modernizing a mainframe data tier. It corresponds to the preceding diagram.
The on-premises data gateway provides enhanced-security data transfer between mainframe on-premises data and Azure services.
The object conversion process extracts object definitions from source objects. The definitions are then converted into corresponding objects in the target data store.
- SQL Server Migration Assistant for Db2 migrates schemas and data from IBM Db2 databases to Azure databases.
- Managed Data Provider for Host Files converts objects by:
- Parsing COBOL and RPG record layouts, or copybooks.
- Mapping the copybooks to C# objects that .NET applications use.
- Third-party tools perform automated object conversion on non-relational databases, file systems, and other data stores.
Data is ingested and transformed.
a. FTP transfers mainframe and midrange file system datasets with single layouts and unpacked fields in binary format to Azure.
b. Mainframe data is converted. Mainframe and midrange systems store data on DASD or tape in EBCDIC format in these types of files:
- Indexed VSAM files
- Non-indexed GDG files
- Flat files
COBOL, PL/I, and assembly language copybooks define the data structure of these files. Data Provider for Host Files converts the data from EBCDIC to ASCII format based on the copybook layout.
An Azure Data Factory custom connector uses the Host File client component of Host Integration Server to convert mainframe datasets.
Host Integration Server integrates existing IBM host systems, programs, messages, and data with Azure applications.
A solution that's based on open-source software and is used in Azure Synapse Analytics converts data.
c. Relational database data is migrated.
IBM mainframe and midrange systems store data in relational databases like these:
These services migrate the database data:
- Data Factory uses a Db2 connector to extract and integrate data from the databases.
- SQL Server Integration Services handles various data ETL tasks.
d. Non-relational database data is migrated.
IBM mainframe and midrange systems store data in non-relational databases like these:
- IDMS, a network model database management system (DBMS)
- IMS, a hierarchical model DBMS
Third-party products integrate data from these databases.
Azure services like Data Factory and AzCopy load data into Azure databases and Azure data storage. You can also use third-party solutions and custom loading solutions to load data.
Data is moved to storage.
Azure provides many managed data storage solutions:
Azure services use the modernized data tier for computing, analytics, storage, and networking.
Client applications also use the modernized data tier.
- SQL Database is part of the Azure SQL family. It's built for the cloud and provides all the benefits of a fully managed and evergreen platform as a service. SQL Database also provides AI-powered automated features that optimize performance and durability. Serverless compute and Hyperscale storage options automatically scale resources on demand.
- Azure Database for PostgreSQL is a fully managed relational database service that's based on the community edition of the open-source PostgreSQL database engine.
- Azure Cosmos DB is a globally distributed multimodel NoSQL database.
- Azure Database for MySQL is a fully managed relational database service that's based on the community edition of the open-source MySQL database engine.
- Azure Database for MariaDB is a cloud-based relational database service. It's based on the MariaDB community edition database engine.
- SQL Managed Instance is an intelligent, scalable cloud database service that offers all the benefits of a fully managed and evergreen platform as a service. SQL Managed Instance has near-100% compatibility with the latest SQL Server Enterprise edition database engine. It also provides a native virtual network implementation that addresses common security concerns.
- Azure Data Lake Storage is a storage repository that holds large amounts of data in its native, raw format. Data lake stores are optimized for scaling to terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources. It can be structured, semi-structured, or unstructured.
- An on-premises data gateway provides enhanced-security data transfer between mainframe on-premises data and Azure services. Typically, you install the gateway on a dedicated on-premises VM.
- Azure Virtual Machines provides on-demand, scalable computing resources. An Azure virtual machine (VM) provides the flexibility of virtualization but eliminates the maintenance demands of physical hardware. Azure VMs offer a choice of operating systems, including Windows and Linux.
- Azure Data Factory is a hybrid data integration service. In this solution, an Azure Data Factory custom connector uses the Host File client component of Host Integration Server to convert mainframe datasets. With minimal setup, you can use a custom connector to convert your mainframe dataset just as you'd use any other Azure Data Factory connector.
- AzCopy is a command-line utility that moves blobs or files into and out of storage accounts.
- SQL Server Integration Services is a platform for creating enterprise-level data integration and transformation solutions. You can use it to solve complex business problems by:
- Copying or downloading files.
- Loading data warehouses.
- Cleansing and mining data.
- Managing SQL Server objects and data.
- Host Integration Server technologies and tools enable you to integrate existing IBM host systems, programs, messages, and data with Azure applications. The Host File client component provides flexibility for data that's converted from EBCDIC to ASCII. For example, you can generate JSON/XML from the data that's converted.
- Azure Synapse brings together data integration, enterprise data warehousing, and big data analytics. The Azure Synapse conversion solution used in this architecture is based on Apache Spark and is a good candidate for large mainframe-dataset workload conversion. It supports a wide range of mainframe data structures and targets and requires minimal coding effort.
- SQL Server Migration Assistant for Db2 automates migration from Db2 to Microsoft database services. When it runs on a VM, this tool converts Db2 database objects into SQL Server database objects and creates those objects in SQL Server.
- Data Provider for Host Files is a component of Host Integration Server that uses offline, SNA, or TCP/IP connections.
- With offline connections, Data Provider reads and writes records in a local binary file.
- With SNA and TCP/IP connections, Data Provider reads and writes records stored in remote z/OS (IBM Z Series Mainframe) datasets or remote i5/OS (IBM AS/400 and iSeries systems) physical files. Only i5/OS systems use TCP/IP.
- Azure services provide environments, tools, and processes for developing and scaling new applications in the public cloud.
Modern data storage solutions like the Azure data platform provide better scalability and performance than mainframe and midrange systems. By modernizing your systems, you can take advantage of these benefits. However, updating technology, infrastructure, and practices is complex. The process involves an exhaustive investigation of business and engineering activities. Data management is one consideration when you modernize your systems. You also need to look at data visualization and integration.
Successful modernizations use a data-first strategy. When you use this approach, you focus on the data rather than the new system. Data management is no longer just an item on the modernization checklist. Instead, the data is the centerpiece. Coordinated, quality-oriented data solutions replace fragmented, poorly governed ones.
This solution uses Azure data platform components in a data-first approach. Specifically, the solution involves:
- Object conversion. Converting object definitions from the source data store to corresponding objects in the target data store.
- Data ingestion. Connecting to the source data store and extracting data.
- Data transformation. Transforming extracted data into appropriate target data store structures.
- Data storage. Loading data from the source data store to the target data store, both initially and continually.
Potential use cases
Organizations that use mainframe and midrange systems can benefit from this solution, especially when they want to achieve these goals:
- Modernize mission-critical workloads.
- Acquire business intelligence to improve operations and gain a competitive advantage.
- Remove the high costs and rigidity that are associated with mainframe and midrange data stores.
- When you use the Data Provider for Host Files client to convert data, turn on connection pooling to reduce connection startup time.
- When you use Data Factory to extract data, tune the performance of the copy activity.
These considerations implement the pillars of the Azure Well-Architected Framework, a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.
Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.
- The on-premises data gateway provides a level of data protection during transfers from on-premises to Azure systems.
- When you use Data Provider for Host Files to convert data, follow the recommendations in Data Providers for Host Files security and protection.
Cost optimization is about reducing unnecessary expenses and improving operational efficiencies. For more information, see Overview of the cost optimization pillar.
- SQL Server Migration Assistant is a free, supported tool that simplifies database migration from Db2 to SQL Server, SQL Database, and SQL Managed Instance. SQL Server Migration Assistant automates all aspects of migration, including migration assessment analysis, schema and SQL statement conversion, and data migration.
- The Azure Synapse Spark-based solution is built from open-source libraries. It eliminates the financial burden of licensing conversion tools.
- Use the Azure pricing calculator to estimate the cost of implementing this solution.
Operational excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Overview of the operational excellence pillar.
When you use an on-premises application gateway, be aware of the limits on read and write operations.
This article is maintained by Microsoft. It was originally written by the following contributors.
- Ashish Khandelwal | Principal Engineering Architect Manager
- Mick Alberts | Technical Writer
- Nithish Aruldoss | Engineering Architect
To see non-public LinkedIn profiles, sign in to LinkedIn.
Review the Azure Database Migration Guides. Contact Azure Data Engineering - Mainframe & Midrange Modernization for more information.
See these articles:
- IBM workloads on Azure
- Mainframe rehosting on Azure virtual machines
- Mainframe workloads supported on Azure
- Move mainframe compute to Azure