Set up data source connections for data quality assessment
Data source connections set up the authentication needed to profile your data for a statistical snapshot, or to scan your data for data quality anomalies and scoring.
Setting up data source connections is the fourth step in the data quality life cycle for a data asset. Previous steps are:
You need at least read access to the data source for which you are setting up the connection.
Supported multicloud data sources
Azure Data Lake Storage Gen2
File Types: Delta Parquet and Parquet
Azure SQL Database
Fabric data estate in OneLake, including shortcut and mirroring data estates. Data quality scanning is supported only for Lakehouse Delta tables and Parquet files.
Mirroring data estate: Cosmos DB, Snowflake, Azure SQL
Shortcut data estate: AWS S3, GCS, ADLS Gen2
Azure Synapse serverless and data warehouse
Azure Databricks Unity Catalog
Snowflake
Google BigQuery (Private Preview)
Currently, Microsoft Purview can run data quality scans only with managed identity as the authentication option. Data quality services run on Apache Spark 3.4 and Delta Lake 2.4.
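To make these constraints concrete, here is a minimal sketch that checks a proposed connection against the supported source list, the managed-identity requirement, and the file-format limits for ADLS Gen2 and Fabric Lakehouse. It's a local model only. The ConnectionRequest class, the validate function, the source-type strings, and the "managed_identity" label are hypothetical and aren't part of any Microsoft Purview API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical local model of the documented constraints; not a Purview API.
SUPPORTED_SOURCES = {
    "Azure Data Lake Storage Gen2",
    "Azure SQL Database",
    "Fabric OneLake",
    "Azure Synapse",
    "Azure Databricks Unity Catalog",
    "Snowflake",
    "Google BigQuery",
}

# File-based sources limit which formats a data quality scan accepts.
SCANNABLE_FORMATS = {
    "Azure Data Lake Storage Gen2": {"delta", "parquet"},
    "Fabric OneLake": {"delta", "parquet"},  # Lakehouse Delta tables, Parquet files
}


@dataclass
class ConnectionRequest:
    display_name: str
    source_type: str
    auth_method: str  # hypothetical label, e.g. "managed_identity"
    file_format: Optional[str] = None


def validate(req: ConnectionRequest) -> List[str]:
    """Return the constraint violations found (an empty list means none)."""
    problems: List[str] = []
    if req.source_type not in SUPPORTED_SOURCES:
        problems.append(f"unsupported source type: {req.source_type}")
    # Data quality scans currently support managed identity only.
    if req.auth_method != "managed_identity":
        problems.append("data quality scans currently require managed identity")
    # Only file-based sources have a format restriction in this model.
    formats = SCANNABLE_FORMATS.get(req.source_type)
    if formats is not None and req.file_format is not None:
        if req.file_format.lower() not in formats:
            problems.append(
                f"format {req.file_format} is not scannable for {req.source_type}"
            )
    return problems
```

Returning a list of problems instead of raising on the first one lets you see every reason a proposed connection would not be scannable at once.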
In Unified Catalog, select Health management, then select Data quality.
Select a governance domain from the list.
From the Manage dropdown list, select Connections to open the Connections page.
Select New to create a new connection for the data products and data assets of your governance domain.
In the right panel, enter the following information:
Display name
Description
Select Source type, and select one of the data sources.
Depending on the data source, enter the access details.
Select Test connection. If the test connection is successful, select Submit to save the connection configuration and complete the connection setup.
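The rule in these steps, where you submit a connection only after the test connection succeeds, can be sketched as a small guard function. This is an illustrative model, not Purview code. ConnectionDraft, create_connection, and the test_connection and submit callables are hypothetical stand-ins for the fields in the right panel and the portal's test and submit actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ConnectionDraft:
    """Fields entered in the right panel (names are illustrative)."""
    display_name: str
    description: str
    source_type: str
    access_details: Dict[str, str] = field(default_factory=dict)


def create_connection(
    draft: ConnectionDraft,
    test_connection: Callable[[ConnectionDraft], bool],
    submit: Callable[[ConnectionDraft], str],
) -> str:
    """Submit the draft only if the connection test succeeds.

    test_connection and submit are hypothetical callables supplied by the
    caller; they stand in for the portal's test and submit steps.
    """
    if not test_connection(draft):
        # Fail early so a broken configuration is never saved.
        raise ConnectionError(
            f"Test connection failed for {draft.display_name!r}; "
            "check the access details before submitting."
        )
    return submit(draft)
```

Passing the test and submit actions in as callables keeps the ordering rule separate from any particular transport, so the sketch can be exercised with simple stubs.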
Tip
You can also create a connection to your resources using private endpoints and a Microsoft Purview Data Quality managed virtual network. For more information, see the managed virtual network article.
Connection setup steps vary for native connectors. For Azure Databricks, Snowflake, Google BigQuery, and Azure Synapse connections, follow the setup steps in the corresponding native connector documentation.
Grant Microsoft Purview permissions on the source
Now that the connection is created, your Microsoft Purview managed identity needs permissions on your data sources before it can scan them: