Mainframe and midrange data replication to Azure using Qlik

Azure Event Hubs
Azure Data Lake
Azure Databricks

This solution uses an on-premises instance of Qlik to replicate on-premises data sources to Azure in real time.

Note

Pronounce "Qlik" like "click".

Apache® and Apache Kafka® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Architecture

Architecture for data migration to Azure by using Qlik.

Workflow

  1. Host agent: The Host agent on the on-premises system captures change log information from Db2, IMS, and VSAM data stores, and passes it to the Qlik Replication server.
  2. Replication server: The Qlik Replication server software passes the change log information to Kafka and Azure Event Hubs. Qlik in this example is on-premises, but it could instead be deployed on a virtual machine in Azure.
  3. Stream ingestion: Kafka and Event Hubs provide message brokers to receive and store change log information.
  4. Kafka Connect: The Kafka Connect API is used to get data from Kafka for updating Azure data stores such as Azure Data Lake Storage, Azure Databricks, and Azure Synapse Analytics.
  5. Data Lake Storage: Data Lake Storage is a staging area for the change log data.
  6. Databricks: Databricks processes the change log data and updates the corresponding files on Azure.
  7. Azure data services: Azure provides a variety of efficient data storage services. Prominent among these are:
    • Relational database services:

      • SQL Server on Azure Virtual Machines
      • Azure SQL Database
      • Azure SQL Managed Instance
      • Azure Database for PostgreSQL
      • Azure Database for MySQL
      • Azure Cosmos DB

      There are many factors to consider when choosing a data storage service: type of workload, cross-database queries, two-phase commit requirements, ability to access the file system, amount of data, required throughput, latency, and so on.

    • Azure non-relational database services: Azure Cosmos DB, a NoSQL database, provides quick response, automatic scalability, and guaranteed speed at any scale.

    • Azure Synapse Analytics: Synapse Analytics is an analytics service that brings together data integration, enterprise data warehousing, and big data analytics. With it, you can query data by using either serverless or dedicated resources at scale.

    • Microsoft Fabric: Microsoft Fabric is an all-in-one analytics solution for enterprises. It covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration.
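Steps 5 and 6 can be pictured with a minimal Python sketch. It rests on assumptions: the `ChangeRecord` shape, its field names, and the `apply_changes` helper are hypothetical and do not represent a Qlik or Databricks format or API. The sketch only illustrates replaying staged change records in change-log order against a target that is keyed by primary key. In Databricks, comparable logic would typically run as a Spark job rather than as an in-memory loop.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical shape of one staged change log entry."""
    op: str                                # "insert", "update", or "delete"
    key: str                               # primary key of the source row
    sequence: int                          # position in the source change log
    row: Optional[Dict[str, Any]] = None   # new row image; None for deletes


def apply_changes(
    target: Dict[str, Dict[str, Any]],
    changes: Iterable[ChangeRecord],
) -> Dict[str, Dict[str, Any]]:
    """Apply change records to `target` in change-log order and return it."""
    for change in sorted(changes, key=lambda c: c.sequence):
        if change.op in ("insert", "update"):
            # Inserts and updates both overwrite the row image for the key.
            target[change.key] = dict(change.row or {})
        elif change.op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return target
```

Sorting by `sequence` before applying means that records staged out of order still converge to the same final state, provided the sequence numbers reflect the source change log order.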

Components

This architecture consists of several Azure cloud services and is divided into four categories of resources: networking and identity, application, storage, and monitoring. The services for each and their roles are described in the following sections.

Networking and identity

  • Azure ExpressRoute extends your on-premises networks into cloud services offered by Microsoft over a private connection from a connectivity provider. With ExpressRoute, you can establish connections to cloud services such as Microsoft Azure and Microsoft 365.
  • Azure VPN Gateway is a specific type of virtual network gateway that sends encrypted traffic between an Azure virtual network and an on-premises location over the public internet.
  • Microsoft Entra ID is an identity and access management service that can synchronize with an on-premises Active Directory.

Application

  • Azure Event Hubs is a big data streaming platform and event ingestion service that can store Db2, IMS, and VSAM change data messages. It can receive and process millions of messages per second. Data sent to an event hub can be transformed and stored by using a real-time analytics provider or a custom adapter.
  • Apache Kafka is an open-source distributed event streaming platform that's used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. It can be easily integrated with Qlik data integration to store Db2 change data.
  • Azure Data Lake Storage provides a data lake for storing the processed on-premises change log data.
  • Azure Databricks is a cloud-based data engineering tool that's based on Apache Spark. It can process and transform massive quantities of data. You can explore the data by using machine learning models. Jobs can be written in R, Python, Java, Scala, and Spark SQL.
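Event Hubs exposes a Kafka-compatible endpoint, so a Kafka client can usually target an event hub by changing only its connection settings. The following Python sketch builds such settings as a plain dictionary. It is an illustration under assumptions: the helper name `event_hubs_kafka_config` and the namespace `contoso-cdc` are hypothetical, the property names follow the librdkafka style used by common Kafka clients, and this article doesn't state which protocol the replication server uses to reach Event Hubs.

```python
from typing import Dict


def event_hubs_kafka_config(namespace: str, connection_string: str) -> Dict[str, str]:
    """Return Kafka client settings that point at an Event Hubs namespace.

    `namespace` is the Event Hubs namespace name (hypothetical in this sketch).
    `connection_string` is the namespace connection string; load it from a
    secret store rather than hard-coding it.
    """
    return {
        # The Kafka endpoint of Event Hubs listens on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal user name "$ConnectionString" tells Event Hubs that the
        # password field carries a connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }
```

The resulting dictionary could be passed to a Kafka producer or consumer constructor that accepts librdkafka-style properties. Other clients use different property names for the same values.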

Storage

  • Azure Storage is a set of massively scalable and secure cloud services for data, apps, and workloads. It includes Azure Files, Azure Table Storage, and Azure Queue Storage. Azure Files is often an effective tool for migrating mainframe workloads.
  • Azure Cosmos DB is a fully managed NoSQL database service with open-source APIs for MongoDB and Cassandra. A possible application is to migrate mainframe non-tabular data to Azure.

Monitoring

  • Azure Monitor delivers a comprehensive solution for collecting, analyzing, and acting on telemetry from cloud and on-premises environments. It includes:
    • Application Insights, for analyzing and presenting telemetry.
    • Monitor Logs, which collects and organizes log and performance data from monitored resources. Data from different sources such as platform logs from Azure services, log and performance data from virtual machine agents, and usage and performance data from applications can be consolidated into a single workspace to be analyzed together. Analysis uses a sophisticated query language that's capable of quickly analyzing millions of records.
    • Log Analytics, which can query Monitor logs. A powerful query language lets you join data from multiple tables, aggregate large sets of data, and perform complex operations with minimal code.

Alternatives

  • The diagram shows Qlik installed on-premises, a recommended best practice to keep it close to the on-premises data sources. An alternative is to install Qlik in the cloud on an Azure virtual machine.
  • Qlik Data Integration can deliver directly to Databricks without going through Kafka or an event hub.
  • Qlik Data Integration can't replicate directly to Azure Cosmos DB, but you can integrate Azure Cosmos DB with an event hub by using an event-sourcing architecture.
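As a rough illustration of the event-sourcing alternative, the following Python sketch maps one change event to a document that could be stored as an append-only item. It assumes a hypothetical event shape and a hypothetical partition key field named `entityKey`. The only Azure Cosmos DB fact that it relies on is that each item carries a string `id`. Writing the document to a container is left out.

```python
from typing import Any, Dict, Optional


def to_event_document(
    source_table: str,
    key: str,
    sequence: int,
    op: str,
    row: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Map one change event to an append-only event document."""
    return {
        # One document per event: the sequence number keeps ids unique,
        # so storing a new event never overwrites an earlier one.
        "id": f"{source_table}:{key}:{sequence}",
        # Hypothetical partition key: groups all events for one source row.
        "entityKey": f"{source_table}:{key}",
        "sequence": sequence,
        "op": op,
        "row": row,
    }
```

Because every event for a source row shares the same `entityKey`, a consumer can read one partition in `sequence` order to rebuild the current state of that row.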

Scenario details

Many organizations use mainframe and midrange systems to run demanding and critical workloads. Most applications use one or more databases, and most databases are shared by many applications, often on multiple systems. In such an environment, modernizing to the cloud means that on-premises data must be provided to cloud-based applications. Therefore, data replication becomes an important modernization tactic.

The Qlik Data Integration platform includes Qlik Replication, which performs data replication. It uses change data capture (CDC) to replicate on-premises data stores in real time to Azure. The change data can come from Db2, IMS, and VSAM change logs. Because only the changes are shipped as they occur, this replication technique eliminates inconvenient batch bulk loads.
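The difference between CDC and a bulk load can be sketched in a few lines of Python. This is a conceptual model only, not a description of how Qlik reads Db2, IMS, or VSAM logs: the `ChangeEntry` tuple, the `read_changes_since` helper, and the integer checkpoint are all hypothetical. The point is that each pass reads only the log entries after the last checkpoint instead of copying the whole data store again.

```python
from typing import List, Tuple

# Hypothetical change log entry: (log sequence number, description of the change).
ChangeEntry = Tuple[int, str]


def read_changes_since(
    change_log: List[ChangeEntry],
    checkpoint: int,
) -> Tuple[List[ChangeEntry], int]:
    """Return the entries recorded after `checkpoint` and the new checkpoint."""
    pending = [entry for entry in change_log if entry[0] > checkpoint]
    # If nothing is pending, the checkpoint stays where it was.
    new_checkpoint = max((seq for seq, _ in pending), default=checkpoint)
    return pending, new_checkpoint
```

Calling the helper repeatedly with the returned checkpoint yields each change once, which is why the amount of data moved tracks the change rate rather than the size of the source.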

Potential use cases

This solution might be appropriate for:

  • Hybrid environments that require replication of data changes from a mainframe or midrange system to Azure databases.
  • Online database migration from Db2 to an Azure SQL database with little downtime.
  • Data replication from various on-premises data stores to Azure for consolidation and analysis.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.

  • Qlik Data Integration can be configured in a high-availability cluster.
  • The Azure database services support zone redundancy and can be designed to fail over to a secondary node in case of an outage or during a maintenance window.

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

  • ExpressRoute provides a private and efficient connection to Azure from on-premises, but you could instead use a site-to-site VPN.
  • Azure resources can be authenticated by using Microsoft Entra ID. Permissions can be managed by role-based access control.
  • Database services in Azure support various security options, such as:
    • Data encryption at rest.
    • Dynamic data masking.
    • Always Encrypted databases.
  • For general guidance on designing secure solutions, see the Azure Security Documentation.

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

Use the Azure Pricing Calculator to estimate costs for your implementation.

Operational excellence

Operational excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Overview of the operational excellence pillar.

  • You can combine Monitor's Application Insights and Log Analytics features to monitor the health of Azure resources. You can set alerts so that you can respond to problems proactively.
  • For guidance on resiliency in Azure, see Designing reliable Azure applications.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.

Databricks, Data Lake Storage, and the Azure database services have autoscaling capabilities. For more information, see Autoscaling.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

Next steps