Choose the right integration runtime configuration for your scenario

The integration runtime (IR) is the compute infrastructure that Microsoft Purview uses to power data scans across different network environments. This article introduces the types of integration runtime available in Microsoft Purview and provides guidance on choosing the right integration runtime configuration for your scenario.

Types of integration runtimes

Microsoft Purview provides the following types of integration runtimes:

  • Azure integration runtime: The Azure integration runtime is a fully managed, elastic compute that you can use to scan Azure or non-Azure data sources. The Azure IR supports connections to data stores and compute services with publicly accessible endpoints. It's the default integration runtime, so you don't need to create anything to get started.
  • Managed Virtual Network (VNet) integration runtime: You can create a Managed VNet integration runtime, which resides in a Microsoft Purview Managed Virtual Network. It can use private endpoints to securely connect to and scan the supported data sources. Learn more from Managed Virtual Network and managed private endpoints.
  • Self-hosted integration runtime: The self-hosted integration runtime can be used to scan data sources in an on-premises network or a virtual network. You can install it on an on-premises machine or a virtual machine inside your private network. Learn more from Create and manage Self-hosted Integration Runtimes.
  • Kubernetes supported self-hosted integration runtime (Preview): This integration runtime is hosted on a Kubernetes cluster and can be used to scan data sources in an on-premises network or a virtual network. Kubernetes support improves overall performance and allows the integration runtime to scale with the job. Learn more from Create and manage Kubernetes supported self-hosted integration runtimes.
  • AWS integration runtime: The AWS integration runtime is a fully managed, elastic compute hosted by Microsoft Purview in AWS. It applies when you scan Amazon data sources such as S3 and RDS.
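
As a rough aid for comparing these options, the descriptions above can be captured in a small lookup structure. This is a minimal Python sketch, not a Microsoft Purview API: the `IntegrationRuntime` class, its field names, and the `runtime_names` helper are hypothetical names chosen for illustration, and the values only restate what this article says about each runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationRuntime:
    """Hypothetical summary record; not part of any Microsoft Purview SDK."""
    name: str
    managed_by_purview: bool  # True when the Purview service maintains the compute
    runs_in: str              # where the compute lives, paraphrased from this article
    preview: bool = False


# Values paraphrase the list above; they are illustrative, not authoritative.
RUNTIMES = [
    IntegrationRuntime("Azure IR", True, "Purview-managed compute"),
    IntegrationRuntime("Managed VNet IR", True, "Microsoft Purview Managed Virtual Network"),
    IntegrationRuntime("Self-hosted IR", False, "your on-premises machine or virtual machine"),
    IntegrationRuntime("Kubernetes supported self-hosted IR", False,
                       "your Kubernetes cluster", preview=True),
    IntegrationRuntime("AWS IR", True, "Purview-managed compute in AWS"),
]


def runtime_names(managed_by_purview: bool) -> list:
    """Return the names of runtimes that match the given management model."""
    return [r.name for r in RUNTIMES if r.managed_by_purview == managed_by_purview]


if __name__ == "__main__":
    # List the runtimes that you would have to install and maintain yourself.
    print(runtime_names(managed_by_purview=False))
```

Keeping the traits in one place makes it easy to filter by a single requirement, such as whether you are prepared to maintain the compute yourself.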

Choose the right integration runtime

It's important to choose an appropriate type of integration runtime. Not only must it be suitable for your existing architecture and requirements for data integration, but you also need to consider how to further meet growing business needs and any future increase in workload.

The following considerations can help you navigate the decision:

  1. What data source types do you want to scan?

    Check the supported data sources section to learn about the supported IR types for the data sources you want to scan.

  2. What’s the network access control on your data source?

    Different data sources may have different network firewall settings to protect them from arbitrary access over the internet, whether they are on-premises or cloud / SaaS data stores. The following table lists some common firewall options. You can choose the supported IR type according to your scenario.

    Data source firewall Azure IR Managed VNet IR SHIR Kubernetes supported SHIR
    Allow public access
    Allow Azure service or trusted service
    Allow access from specific Azure virtual network ✓ (with managed private endpoint support)
    Allow specific IP / IP range
    Other on-premises or private network access
  3. What’s the firewall setting of your Microsoft Purview account?

    Microsoft Purview provides different network firewall options. Learn more from Configure Microsoft Purview firewall. You can choose the supported IR type according to your scenario.

    Purview firewall Azure IR Managed VNet IR SHIR Kubernetes supported SHIR
    Enabled from all networks
    Disabled from all networks ✓ (managed private endpoint required) ✓ (need to create private endpoint from your network) ✓ (need to create private endpoint from your network)
  4. What level of security do you require during data transmission?

    The integration runtime location defines the location of its back-end compute and where the scan operations are performed. For data residency considerations:

    • When you use Azure IR, Microsoft Purview automatically detects the data source's location and uses the IR in that region. If Microsoft Purview can't detect the region, it uses the Purview account's region.
    • When you use Managed VNet IR, it runs in the region you configure for the managed virtual network.
    • When you use SHIR, you decide the location yourself, because it runs on your on-premises machines or Azure virtual machines.

    To defend against threats such as man-in-the-middle attacks during data transmission, you can use private endpoints and Private Link to secure the connection.

    • You can create managed private endpoints to your data stores when using Managed VNet IR. The private endpoints are maintained by the Microsoft Purview service within the managed virtual network.
    • You can also create private endpoints in your virtual network and the SHIR can use them to access data stores.
  5. What level of maintenance are you able to provide?

    Maintaining infrastructure, servers, and equipment is one of the important tasks of the IT department of an enterprise. It usually takes a lot of time and effort.

    • When you use Azure IR or Managed VNet IR, you don’t need to worry about maintenance tasks such as updates, patches, and version upgrades. The Microsoft Purview service takes care of all the maintenance.
    • Because the SHIR is installed on your machines and the Kubernetes supported SHIR is on your Kubernetes clusters, you need to manage the maintenance.
  6. Performance and scalability

    We recommend that you use the fully managed and autoscaled Azure IR, the Managed VNet IR, or the Kubernetes supported self-hosted integration runtime whenever applicable. Their elasticity gives you better performance and scalability, especially when you scan large-scale data systems.
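
The data residency rules under consideration 4 can be sketched as a small function. This is an illustrative Python sketch only: `resolve_scan_region`, its parameters, and the `"azure"`, `"managed_vnet"`, and `"shir"` labels are hypothetical names, not part of any Microsoft Purview SDK, and the region strings in the usage lines are sample values.

```python
from typing import Optional


def resolve_scan_region(
    ir_type: str,
    account_region: str,
    detected_source_region: Optional[str] = None,
    managed_vnet_region: Optional[str] = None,
    shir_location: Optional[str] = None,
) -> str:
    """Return where scan compute runs, following the rules stated in this article."""
    if ir_type == "azure":
        # Azure IR: use the detected data source region, else the account's region.
        return detected_source_region or account_region
    if ir_type == "managed_vnet":
        # Managed VNet IR: runs in the region configured for the managed virtual network.
        if managed_vnet_region is None:
            raise ValueError("managed_vnet_region is required for a Managed VNet IR")
        return managed_vnet_region
    if ir_type == "shir":
        # SHIR: runs wherever you host it (on-premises machine or Azure VM).
        if shir_location is None:
            raise ValueError("shir_location is required for a self-hosted IR")
        return shir_location
    raise ValueError(f"unknown integration runtime type: {ir_type!r}")


if __name__ == "__main__":
    # Sample values only; substitute your own regions or host locations.
    print(resolve_scan_region("azure", "westeurope", detected_source_region="eastus"))
    print(resolve_scan_region("azure", "westeurope"))
    print(resolve_scan_region("managed_vnet", "westeurope", managed_vnet_region="northeurope"))
```

Writing the rules out this way highlights that only the Azure IR has a fallback: the other runtime types run exactly where you place them.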

Supported data sources

The following table lists the data sources that Microsoft Purview can scan and the integration runtime types supported for each.

Category Supported data store Azure IR/AWS IR Managed VNet IR SHIR Kubernetes SHIR
Azure Multiple sources
Azure Blob Storage ✓ (including managed private endpoint)
Azure Cosmos DB (API for NoSQL) ✓ (including managed private endpoint)
Azure Data Explorer ✓ (v2 only)
Azure Data Lake Storage Gen1 ✓ (v2 only)
Azure Data Lake Storage Gen2 ✓ (including managed private endpoint)
Azure Database for MySQL ✓ (including managed private endpoint)
Azure Database for PostgreSQL ✓ (including managed private endpoint)
Azure Databricks Hive Metastore
Azure Databricks Unity Catalog ✓ (v2 only, including managed private endpoint)
Azure Dedicated SQL pool (formerly SQL DW) ✓ (including managed private endpoint)
Azure Files ✓ (including managed private endpoint)
Azure SQL Database ✓ (including managed private endpoint)
Azure SQL Managed Instance ✓ (including managed private endpoint)
Azure Synapse Analytics (Workspace) ✓ (including managed private endpoint)
Database Amazon RDS
Amazon Redshift
Cassandra ✓ (v2 only)
Db2
Google BigQuery
Hive Metastore Database
MongoDB
MySQL ✓ (v2 only, including managed private endpoint)
Oracle
PostgreSQL ✓ (v2 only)
SAP Business Warehouse
SAP HANA
Snowflake ✓ (v2 only)
SQL Server
SQL Server on Azure-Arc
Teradata
File Amazon S3
HDFS
Services and apps Dataverse ✓ (v2 only)
Erwin
Looker ✓ (v2 only)
Fabric ✓ (v2 only)
Power BI ✓ (v2 only)
Qlik Sense ✓ (v2 only)
Salesforce ✓ (v2 only)
SAP ECC
SAP S/4HANA
Tableau ✓ (v2 only)