Rocket® Data Replicate and Sync (RDRS), formerly tcVISION, is a data replication solution developed by Rocket Software. RDRS provides an IBM mainframe integration solution for mainframe data replication, data synchronization, data migration, and change data capture (CDC) for several Azure data platform services.
Rocket® Data Replicate and Sync is a trademark of Rocket Software. No endorsement is implied by the use of this mark.
Architecture
Download a Visio file of this architecture.
Dataflow
The following dataflow corresponds to the previous diagram:
The RDRS data replication solution supports CDC from many mainframe-based databases, including IBM Db2, IBM Information Management System (IMS) DB, Software AG Adabas, CA Datacom, and CA Integrated Database Management System (CA IDMS). RDRS provides log-based CDC agents to capture the change data at the record level. This log-based CDC has a minimal impact on production source databases.
RDRS supports CDC from Virtual Storage Access Method files.
A task starts on the mainframe. Started tasks, or STCs, are created on the mainframe as part of RDRS software installation. Two crucial STCs are:
The capture agent, which captures changed data from the source.
The apply agent, which uses database management system (DBMS)-specific APIs to efficiently write changed data to the target.
Note
For Db2 z/OS, RDRS also provides an agentless CDC solution via a Db2 user-defined type (UDT) that doesn't need STCs.
The open platform manager (OPM) serves as a replication server. This server contains utilities for automatic data mapping to generate metadata for sources and targets. It also contains the rule set to extract data from the source. The server transforms and processes the data for the target systems and writes the data into the targets. You can install this component on Linux, Unix, and Windows (LUW) operating systems.
The RDRS apply agent uses DBMS-specific APIs. These APIs efficiently implement real-time data changes in combination with CDC technology. The changes are applied from the source to the target Azure data services, which are the database and files.
RDRS supports direct streaming of the changed data into Azure Event Hubs or Kafka. Then Azure Logic Apps, an Azure function, or a custom solution on a virtual machine (VM) processes these events.
The Azure data platform targets that RDRS supports include Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, Azure Cosmos DB, and Azure Data Lake Storage.
Data that lands in the Azure data platform is consumed by Azure services or other platforms that are permitted to see it, such as Power BI, Azure Synapse Analytics, and custom applications.
RDRS supports reverse synchronization. It can capture changes from an Azure database platform, such as SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL, or Data Lake Storage, and write those changes back to the mainframe data tier.
The mainframe database backup and unload files are copied to an Azure VM by using RDRS for bulk-load processing.
The RDRS bulk load performs an initial target database load by using mainframe source data. The source data can be read either directly from the mainframe data store or from a mainframe backup or unload file. The bulk load process automatically translates mainframe data types, such as extended binary coded decimal interchange code (EBCDIC) packed fields. For optimal performance, use backup or unload data instead of reading the mainframe database directly. Moving unload or backup data to the RDRS Azure VM and using native database loaders minimizes network input/output and reduces load times.
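The EBCDIC packed-field translation mentioned above can be illustrated with a minimal sketch. This is not RDRS code, and the bulk load performs this conversion automatically; the sketch only shows what decoding a COMP-3 packed-decimal field involves:

```python
# Minimal sketch: decoding a COMP-3 (packed decimal) field, the kind of
# mainframe data type the RDRS bulk load translates automatically.
def unpack_comp3(data: bytes, scale: int = 0):
    """Decode a packed-decimal byte string into a Python number.

    Each byte holds two binary-coded decimal digits; the low nibble of
    the last byte is the sign (0xC or 0xF positive, 0xD negative).
    """
    digits = []
    for byte in data[:-1]:
        digits.append((byte >> 4) & 0x0F)
        digits.append(byte & 0x0F)
    last = data[-1]
    digits.append((last >> 4) & 0x0F)   # last byte: one digit plus sign
    sign_nibble = last & 0x0F

    value = 0
    for d in digits:
        value = value * 10 + d
    if sign_nibble == 0x0D:
        value = -value
    return value / (10 ** scale) if scale else value

# Bytes 0x12 0x34 0x5C encode +12345; with an implied scale of 2,
# that's the decimal value 123.45.
print(unpack_comp3(b"\x12\x34\x5C", scale=2))  # 123.45
```

Real mainframe records also require EBCDIC-to-ASCII character conversion and copybook-driven field layouts, which RDRS derives during automatic data mapping.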
Change data replication from Db2 z/OS to a Microsoft Fabric native SQL database by using RDRS
The following architecture provides an overview of how data is replicated from Db2 z/OS to a Fabric native SQL database in near real time.
Download a Visio file of this architecture.
Initial data load
Db2 installed on an IBM Mainframe in the customer's datacenter serves as the source of data for replication to the Azure cloud.
To create a full copy, the RDRS capture agent fetches Db2 data by performing SELECT queries on the source Db2 database. If the data size is large, an image copy backup of the data can be sent from the mainframe to the Capture LUW VM in binary format.
The OPM serves as a replication server. This server contains utilities for automatic data mapping to generate metadata for sources and targets. It contains the rule set for extracting the data from the source. The server transforms and processes the data for the target systems and writes the data into the targets. You can install this component in LUW operating systems.
The RDRS capture and apply agent receives data from Db2, either as the output of SELECT queries or an image copy. After the RDRS apply agent performs the configured transformations, it writes the data to the target Fabric native SQL database.
The RDRS apply agent uses the Microsoft ODBC Driver with Microsoft Entra ID authentication for Azure SQL to efficiently write data to the target Fabric native SQL database.
Data is ingested into the Fabric native SQL database.
After data lands in the Fabric native SQL database, Azure services or other authorized entities consume it, such as Fabric Analytics, Power BI, or custom applications.
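As a rough illustration of the apply agent's connection settings, the following sketch builds an ODBC connection string for Microsoft Entra ID authentication. The server and database names are placeholders, and the exact settings RDRS uses may differ; the keyword names come from the Microsoft ODBC Driver 18 for SQL Server:

```python
# Hypothetical sketch of the ODBC settings an apply step might use to
# authenticate to an Azure SQL / Fabric SQL endpoint with Microsoft Entra ID.
# Server and database values are placeholders, not real endpoints.
def build_connection_string(server: str, database: str) -> str:
    settings = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": server,
        "Database": database,
        # Entra ID service principal authentication; other modes such as
        # ActiveDirectoryInteractive or ActiveDirectoryMsi also exist.
        "Authentication": "ActiveDirectoryServicePrincipal",
        "Encrypt": "yes",  # TLS for data in transit
    }
    return ";".join(f"{key}={value}" for key, value in settings.items())

conn_str = build_connection_string("contoso.example.fabric.microsoft.com", "ReplicaDb")
print(conn_str)
```

A driver such as pyodbc would accept this string directly; the same keywords apply to any ODBC client using the Microsoft driver.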
CDC
A. Db2 installed on an IBM Mainframe in the customer datacenter serves as the source of data for replication to the Azure cloud. RDRS provides the capability to retrieve log-based change data from Db2.
B. RDRS defines the Db2 UDT process to read Db2 logs. The UDT runs in the IBM Workload Manager environment and is managed by the Db2 DBMS. The UDT reads log data and stores this data in memory for transmission.
C. The OPM serves as a replication server, equipped with utilities for automatic data mapping to generate metadata for sources and targets. It includes rule sets for extracting data from the source, transforms and processes the data for target systems, and writes it to the targets. You can install this component on LUW operating systems. The RDRS capture and apply agent receives data from the UDT process. After the apply agent performs the configured transformations, it writes the data to the target Fabric SQL database.
D. The RDRS dashboard interface enables the administration, operation, control, and monitoring of data exchange processes. The RDRS command-line utilities help automate data exchange processes and manage the unattended operations of the data synchronization process.
E. The RDRS apply agent uses the Microsoft ODBC Driver with Microsoft Entra ID authentication for Azure SQL to perform data manipulation language queries on the target Fabric native SQL database.
F. After data lands in the Fabric native SQL database, Azure services or other authorized entities consume it, including Fabric Analytics, Power BI, or custom applications.
G. RDRS also provides capabilities to write captured data as JSON to Event Hubs or Kafka.
H. Event Hubs serves as a streaming platform that buffers CDC data messages for downstream consumers.
I. Logic Apps, Azure Functions, or an infrastructure as a service-based custom logic solution in an Azure VM can consume messages from Event Hubs to perform custom processing.
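To make steps G through I concrete, the following sketch shows how a downstream consumer might map a CDC JSON message to a DML statement. The message shape (the op, table, key, and after fields) is a hypothetical layout for illustration, not the actual RDRS JSON format:

```python
import json

# Hypothetical sketch of processing an RDRS CDC message pulled from
# Event Hubs. The JSON layout here is an assumption for illustration;
# consult the RDRS documentation for the real message format.
def cdc_to_sql(message: str) -> str:
    """Map a CDC event to the parameterized DML statement to apply."""
    event = json.loads(message)
    table = event["table"]
    if event["op"] == "I":   # insert
        cols = ", ".join(event["after"])
        params = ", ".join("?" for _ in event["after"])
        return f"INSERT INTO {table} ({cols}) VALUES ({params})"
    if event["op"] == "U":   # update
        sets = ", ".join(f"{col} = ?" for col in event["after"])
        return f"UPDATE {table} SET {sets} WHERE {event['key']} = ?"
    if event["op"] == "D":   # delete
        return f"DELETE FROM {table} WHERE {event['key']} = ?"
    raise ValueError(f"unknown op {event['op']!r}")

msg = json.dumps({"op": "U", "table": "ACCOUNTS",
                  "key": "ACCT_ID", "after": {"BALANCE": 250.0}})
print(cdc_to_sql(msg))  # UPDATE ACCOUNTS SET BALANCE = ? WHERE ACCT_ID = ?
```

In a real deployment this logic would run inside an Azure function or VM-hosted consumer that reads batches from Event Hubs and applies the statements transactionally.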
Components
This solution uses the following components.
Networking and identity components
This architecture refers to the following networking services that you can use individually or in combination for enhanced security.
Azure ExpressRoute is a service that allows you to extend your on-premises networks into the Microsoft Cloud over a private connection that a connectivity provider handles. You can use ExpressRoute to establish highly secure and reliable connections to cloud services such as Microsoft Azure and Microsoft 365.
An Azure VPN gateway is a specific type of virtual network gateway that sends encrypted traffic between an Azure virtual network and an on-premises location over the public internet.
Microsoft Entra ID is an identity and access management service that you can synchronize with an on-premises directory.
Application components
Logic Apps creates and runs automated recurring tasks and processes on a schedule. You can call services inside and outside of Azure, like HTTP or HTTPS endpoints, post messages to Azure services like Azure Storage and Azure Service Bus, or upload files to a file share.
Azure Functions is a cloud service that enables you to run small pieces of code, known as functions, without the need to manage or configure the underlying application infrastructure. You can use Azure Functions to automate tasks, process data, integrate systems, and build scalable applications. The cloud infrastructure provides the up-to-date servers that you need to keep your application running at scale.
Azure VMs are on-demand, scalable computing resources. An Azure VM provides the flexibility of virtualization and eliminates the maintenance demands of physical hardware. Azure VMs operate on Windows and Linux systems.
Storage and database components
This architecture discusses the data migration to scalable, more secure cloud storage and managed databases for flexible, intelligent data management in Azure.
Azure Storage provides scalable cloud storage solutions like Azure Blob Storage, Azure Table Storage, Azure Queue Storage, and Azure Files. Azure Files is especially useful for re-engineered mainframe solutions and provides an effective add-on with managed SQL storage.
Azure SQL Database is a fully managed platform as a service (PaaS) database for SQL Server workloads on Azure. You can migrate relational data to it and use it efficiently with other Azure database services, including Azure SQL Managed Instance, SQL Server on Azure VMs, Azure Database for PostgreSQL, and Azure Database for MySQL.
Azure Cosmos DB is a NoSQL offering that you can use to migrate nontabular data off the mainframe.
The SQL database in Fabric is the primary platform that supports online transaction processing workloads and provides simplicity that makes setup and management easy. It has a system that automatically replicates data into OneLake in near real time, which makes it ideal for analytics tasks. It's integrated with development frameworks and analytics tools. This integration helps ensure compatibility and flexibility for various applications. The SQL database in Fabric lets you run queries the same way as SQL Database and includes a web-based editor that's accessible through the Fabric portal.
Monitoring components
Azure Monitor delivers a comprehensive solution for collecting, analyzing, and acting on telemetry from cloud and on-premises environments.
Application Insights analyzes and presents application telemetry.
Azure Monitor Logs is a feature of Azure Monitor that collects and organizes log and performance data from monitored resources. You can consolidate data from multiple sources into a single workspace, including platform logs from Azure services, log and performance data from VM agents, and usage and performance data from applications. A sophisticated query language lets you analyze millions of records across these sources quickly.
Log Analytics is a tool in the Azure portal. You can use log queries to get insights from the data collected in Azure Monitor Logs. Log Analytics uses a powerful query language so that you can join data from multiple tables, aggregate large data sets, and perform complex operations with minimal code.
Scenario details
Mainframes are servers that process a large number of transactions. Mainframe applications produce and consume large amounts of data every day. Public clouds provide elasticity, cost optimization, ease of use, and easy integration. Many x86 and mainframe applications are moving to the cloud, so organizations must have a well-designed mainframe-to-cloud data integration and migration strategy.
This scenario integrates an IBM Z mainframe data tier with the Azure cloud data platform by using RDRS that Rocket Software provides.
Potential use cases
This solution is ideal for large-scale data migrations to the Azure data platform. Consider this scenario for the following use cases:
Full migration of a mainframe data tier: In this use case, a customer wants to move all their Db2, IMS, IDMS, files, and other data from a mainframe to the Azure data platform.
Coexistence of mainframe and Azure-based applications: In this use case, a customer requires support for a bidirectional synchronization between a mainframe and the Azure data platform.
Archival: In this use case, a customer wants to store data for audit and compliance purposes but doesn't want to access this data frequently. Storage provides a low-cost solution to store archive data.
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.
Reliability
Reliability helps ensure that your application can meet the commitments that you make to your customers. For more information, see Design review checklist for Reliability.
Set up the RDRS OPM on Azure VMs that are deployed in separate availability zones to provide high availability. If a failure occurs, a secondary RDRS OPM is activated and communicates its IP address to the RDRS mainframe manager. The mainframe then communicates with the new RDRS OPM that continues to process at its next logical restart point by using a combination of logical unit of work and restart files.
Design Azure database services to support zone redundancy so that they can fail over to a secondary node if there's an outage or a planned maintenance window.
Use Azure Monitor Logs and Application Insights to monitor the health of an Azure resource. You can set alerts for proactive management.
Security
Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.
Control authentication and access for RDRS by using Microsoft Entra ID.
Encrypt data transfers between RDRS products, like transfers from mainframe to Azure, by using Transport Layer Security (TLS).
Use ExpressRoute or a site-to-site VPN for a more private and efficient connection to Azure from an on-premises environment.
Authenticate Azure resources by using Microsoft Entra ID and manage permissions by using role-based access control.
Use the database services in Azure to support various security options like Transparent Data Encryption for data at rest, TLS for data in transit, and data encryption while processing to help ensure that your data is always encrypted. For more information, see Azure security documentation and Security baselines for Azure.
Cost Optimization
Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.
Use the Azure pricing calculator to estimate the cost of implementing this solution.
Performance Efficiency
Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.
Scalability
Set up RDRS scaling for CDC processing by running multiple parallel replication streams. First analyze which files participate in common logical transactions, because those files must be processed together in sequence so that the RDRS CDC process can ensure the integrity of each logical transaction. Sets of tables that don't participate in common transactions can be divided into parallel tasks by creating multiple processing scripts.
RDRS can run multiple bulk-load processes concurrently on a single Azure VM or across multiple Azure VMs, which provides horizontal scalability. Perform fast bulk-load operations for large tables by splitting the process into multiple tasks, either by using arbitrary intervals or row filtering. Row filtering can use a key, partition key, date, or other filter.
The SQL Database serverless compute tier provides an automatic scaling option based on the workload. Other Azure databases can be scaled up and scaled down by using automation to meet the workload demands. For more information, see Autoscaling best practices in Azure.
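The bulk-load splitting described above, where a large table is divided into key ranges that parallel tasks can load independently, can be sketched as follows. The table and column names are illustrative, not part of RDRS:

```python
# Minimal sketch of splitting a bulk load into parallel tasks by key
# ranges. ACCOUNTS and ACCT_ID are hypothetical names for illustration.
def split_key_range(low: int, high: int, tasks: int):
    """Divide the inclusive range [low, high] into roughly equal,
    non-overlapping intervals, one per parallel load task."""
    step, remainder = divmod(high - low + 1, tasks)
    ranges, start = [], low
    for i in range(tasks):
        # Spread any remainder across the first few intervals.
        end = start + step - 1 + (1 if i < remainder else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges

# Four parallel load tasks over keys 1..1000, each reading its own slice.
for lo, hi in split_key_range(1, 1000, 4):
    print(f"SELECT * FROM ACCOUNTS WHERE ACCT_ID BETWEEN {lo} AND {hi}")
```

Each generated filter would drive one concurrent extract-and-load task; the same idea applies to date-based or partition-key-based splits.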
Contributors
Microsoft maintains this article. The following contributors wrote this article.
Principal authors:
- Sandip Khandelwal | Senior Engineering Architect
Other contributors:
- Liz Casey | Senior Content Developer
Next steps
- Azure database migration guides
- Migration guide: SQL Server to Azure SQL Database
- Training: Architect a data platform in Azure
- Training: Design a SQL Server migration strategy