Choose the right integration runtime configuration for your scenario

The integration runtime (IR) is the compute infrastructure that Microsoft Purview uses to power data scans across different network environments. This article introduces the types of integration runtime available in Microsoft Purview and provides guidance on how to choose the right integration runtime configuration for your scenario.

Types of integration runtimes

Purview provides the following types of integration runtimes:

  • Azure integration runtime: The Azure integration runtime is a fully managed, elastic compute resource that you can use to scan Azure or non-Azure data sources. The Azure IR supports connections to data stores and compute services with publicly accessible endpoints. It's the default integration runtime, so you don't need to create anything to get started.

  • Managed Virtual Network integration runtime: You can create a Managed Virtual Network integration runtime, which resides in a Purview Managed Virtual Network. It can use private endpoints to securely connect to and scan the supported data sources. Learn more at Managed Virtual Network and managed private endpoints.

  • Self-hosted integration runtime: The self-hosted integration runtime can be used to scan data sources in an on-premises network or a virtual network. You can install it on an on-premises machine or a virtual machine inside your private network. Learn more at Create and manage self-hosted integration runtimes.

  • Kubernetes supported self-hosted integration runtime: This integration runtime is hosted on a Kubernetes cluster and can be used to scan data sources in an on-premises network or a virtual network. Kubernetes support improves overall performance and allows the integration runtime to scale with the job. Learn more at Create and manage Kubernetes supported self-hosted integration runtimes.

  • AWS integration runtime: The AWS integration runtime is a fully managed, elastic compute resource hosted by Microsoft Purview in AWS. Use it to scan Amazon data sources such as S3 and RDS.

Choose the right integration runtime

Choose the right integration runtime for your needs. Consider your existing architecture and requirements for data integration. Also think about how to meet growing business needs and any future increase in workload.

The following considerations can help you make your decision:

  1. What data source types do you want to scan?

    Check the supported data sources section to learn about the supported IR types for the data sources you want to scan.

  2. What’s the network access control on your data source?

    Different data sources have different network firewall settings to protect them from random access over the internet. These settings apply to on-premises, cloud, and SaaS data stores. The following table lists some common firewall options. Choose the supported IR type according to your scenario.

    Data source firewall Azure IR Managed Virtual Network IR SHIR Kubernetes supported SHIR
    Allow public access
    Allow Azure service or trusted service
    Allow access from specific Azure virtual network ✓ (with managed private endpoint support)
    Allow specific IP / IP range
    Other on-premises or private network access
  3. What’s the firewall setting of your Microsoft Purview?

    Purview provides different network firewall options. Learn more from Configure Microsoft Purview firewall. Choose the supported IR type according to your scenario.

    Purview firewall Azure IR Managed Virtual Network IR SHIR Kubernetes supported SHIR
    Enabled from all networks
    Disabled from all networks ✓ (managed private endpoint required) ✓ (need to create private endpoint from your network) ✓ (need to create private endpoint from your network)
  4. What level of security do you require during data transmission?

    The integration runtime location defines the location of its back-end compute and where the scan operations are performed. For data residency, consider the following:

    • When you use Azure IR, Purview automatically detects the data source's location and uses the IR in that region. If Purview can't detect the region, it uses the Purview account's region.

    • When you use Managed Virtual Network IR, it runs in the region you configure for the managed virtual network.

    • When you use SHIR, you can fully decide the location in your on-premises or Azure virtual machines.

    To protect against threats such as man-in-the-middle attacks during data transmission, use private endpoints and Private Link:

    • You can create managed private endpoints to your data stores when using Managed Virtual Network IR. The Purview service maintains the private endpoints within the managed virtual network.

    • You can also create private endpoints in your virtual network and the SHIR can use them to access data stores.

  5. What level of maintenance are you able to provide?

    Maintaining infrastructure, servers, and equipment is an important task for an enterprise IT department, and it usually takes considerable time and effort.

    • When using Azure IR and Managed Virtual Network IR, you don't need to worry about maintenance such as updates, patches, and versions. The Purview service takes care of all the maintenance efforts.
    • Because the SHIR is installed on your machines and the Kubernetes supported SHIR is on your Kubernetes clusters, you need to manage the maintenance.
  6. Performance and scalability

    Use the fully managed and autoscaled Azure IR, Managed Virtual Network IR, or the Kubernetes supported self-hosted integration runtime whenever applicable. Because they scale elastically, they can provide better performance and scalability, especially when scanning large-scale data systems.
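Considerations 4 through 6 above can be condensed into a small lookup. The following Python sketch is illustrative only: `IRProfile`, `PROFILES`, and `scan_region` are hypothetical names, not part of any Purview SDK. It assumes that the Kubernetes supported SHIR follows the same location rule as the SHIR, which the text above states only for the SHIR, and it leaves out the AWS IR because the considerations don't cover it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IRProfile:
    managed_by_purview: bool  # True: Purview handles updates, patches, versions
    listed_as_elastic: bool   # True: named under "Performance and scalability"


# Values transcribed from considerations 5 and 6 (hypothetical keys).
PROFILES = {
    "azure": IRProfile(managed_by_purview=True, listed_as_elastic=True),
    "managed_vnet": IRProfile(managed_by_purview=True, listed_as_elastic=True),
    "shir": IRProfile(managed_by_purview=False, listed_as_elastic=False),
    "k8s_shir": IRProfile(managed_by_purview=False, listed_as_elastic=True),
}


def scan_region(
    ir_type: str,
    *,
    purview_region: str,
    detected_source_region: Optional[str] = None,
    managed_vnet_region: Optional[str] = None,
    host_location: Optional[str] = None,
) -> str:
    """Return where scan operations run, following consideration 4."""
    if ir_type == "azure":
        # Azure IR: the data source's region if detected, else the account's.
        return detected_source_region or purview_region
    if ir_type == "managed_vnet":
        if managed_vnet_region is None:
            raise ValueError("managed_vnet_region is required")
        return managed_vnet_region
    if ir_type in ("shir", "k8s_shir"):
        # Assumption: the Kubernetes variant is treated like the SHIR here.
        if host_location is None:
            raise ValueError("host_location is required")
        return host_location
    raise ValueError(f"unknown IR type: {ir_type}")
```

For example, `scan_region("azure", purview_region="westeurope")` falls back to the account region because no data source region was detected.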

Hibernation of managed virtual network integration runtime

If a Managed Virtual Network integration runtime is inactive (no scans have run on it for more than 90 days), it automatically goes into hibernation. Its status shows as Hibernated when you select the integration runtime.
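The 90-day rule can be sketched as a simple date comparison. This Python example is a minimal illustration, assuming you track the time of the last scan yourself; `is_expected_hibernated` and `HIBERNATION_THRESHOLD` are hypothetical names, and the status shown in Purview remains the authoritative indicator.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# "More than 90 days" without scans, per the rule above.
HIBERNATION_THRESHOLD = timedelta(days=90)


def is_expected_hibernated(
    last_scan_at: datetime, now: Optional[datetime] = None
) -> bool:
    """Estimate whether a Managed Virtual Network IR has hibernated."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Strictly greater than: exactly 90 days does not count as hibernated.
    return now - last_scan_at > HIBERNATION_THRESHOLD
```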

What this change means for you

  1. When you run Test Connection on a hibernated integration runtime, the test fails, and you see a message to retry Test Connection after 15 minutes. By then, your Managed Virtual Network integration runtime has returned to a normal state, and you can run Test Connection and scans normally.

  2. When you run a scan on a hibernated integration runtime without running Test Connection first, whether through Run scan now, the Edit Scan options, or the API, you see a message that the scan takes up to 15 extra minutes. This extra time lets the hibernated integration runtime wake up before the scan begins. The scan status shows as Queued_Waking Up IR instead of the Queued state that a normal scan shows. After this first scan, all following scans run normally.
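If you automate scans, you might account for the wake-up delay when you set timeouts. The following Python sketch shows one way to do that. `scan_timeout` and both constants are hypothetical names, and the status string is the display text quoted above, so the value an API returns might differ.

```python
from datetime import timedelta

# Up to 15 extra minutes for a hibernated IR to wake up (from the text above).
WAKE_UP_ALLOWANCE = timedelta(minutes=15)
# Display status quoted in this article; an API might report another value.
WAKING_STATUS = "Queued_Waking Up IR"


def scan_timeout(base_timeout: timedelta, status: str) -> timedelta:
    """Extend the caller's timeout while the integration runtime wakes up."""
    if status == WAKING_STATUS:
        return base_timeout + WAKE_UP_ALLOWANCE
    return base_timeout
```

A caller that normally waits 30 minutes would wait 45 minutes for a scan in the Queued_Waking Up IR state and keep the 30-minute limit for a scan in the Queued state.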

Supported data sources

The following table shows all the data sources that Purview can scan and the integration runtime types that each one supports.

Category Supported data store Azure IR/AWS IR Managed Virtual Network IR SHIR Kubernetes SHIR
Azure Multiple sources
Azure Blob Storage ✓ (including managed private endpoint)
Azure Cosmos DB (API for NoSQL) ✓ (including managed private endpoint)
Azure Data Explorer ✓ (v2 only)
Azure Data Lake Storage Gen1 ✓ (v2 only)
Azure Data Lake Storage Gen2 ✓ (including managed private endpoint)
Azure Database for MySQL ✓ (including managed private endpoint)
Azure Database for PostgreSQL ✓ (including managed private endpoint)
Azure Databricks Hive Metastore
Azure Databricks Unity Catalog ✓ (v2 only, including managed private endpoint)
Azure Dedicated SQL pool (formerly SQL DW) ✓ (including managed private endpoint)
Azure Files ✓ (including managed private endpoint)
Azure SQL Database ✓ (including managed private endpoint)
Azure SQL Managed Instance ✓ (including managed private endpoint)
Azure Synapse Analytics (Workspace) ✓ (including managed private endpoint)
Database Amazon RDS
Amazon Redshift
Cassandra ✓ (v2 only)
Db2
Google BigQuery
Hive Metastore Database
MongoDB
MySQL ✓ (v2 only)
Oracle
PostgreSQL ✓ (v2 only)
SAP Business Warehouse
SAP HANA
Snowflake ✓ (v2 only, including managed private endpoint)
SQL Server
SQL Server on Azure-Arc
Teradata
File Amazon S3
HDFS
Services and apps Dataverse ✓ (v2 only)
Erwin
Looker ✓ (v2 only)
Fabric ✓ (v2 only)
Power BI ✓ (v2 only)
Qlik Sense ✓ (v2 only)
Salesforce ✓ (v2 only)
SAP ECC
SAP S/4HANA
Tableau ✓ (v2 only)