Data quality for Databricks Unity Catalog databases
To use Unity Catalog, your Azure Databricks workspace must be enabled for Unity Catalog, which means that the workspace is attached to a Unity Catalog metastore. All new workspaces are enabled for Unity Catalog automatically upon creation, but older workspaces might require that an account admin enable Unity Catalog manually. Whether or not your workspace was enabled for Unity Catalog automatically, the following steps are also required to get started with Unity Catalog:
Create catalogs and schemas to contain database objects like tables and volumes.
Create managed storage locations to store the managed tables and volumes in these catalogs and schemas.
Grant user access to catalogs, schemas, and database objects.
Workspaces that are automatically enabled for Unity Catalog provision a workspace catalog with broad privileges granted to all workspace users. This catalog is a convenient starting point for trying out Unity Catalog.
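The three setup steps above can be sketched as SQL statements generated from Python. This is a minimal illustration, not the only way to do it: the catalog, schema, storage path, and group names are hypothetical placeholders, and the statements would still need to be run in a Databricks SQL editor or notebook by a principal with sufficient privileges.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def setup_statements(catalog: str, schema: str, location: str, group: str) -> list:
    """Build SQL for the setup steps: catalog with a managed location, schema, grants.

    All argument values are placeholders chosen for illustration.
    """
    if "'" in location:
        # Keep the sketch simple: refuse paths that would need string escaping.
        raise ValueError("location must not contain single quotes")
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    who = quote_ident(group)
    return [
        # Steps 1 and 2: a catalog whose managed tables and volumes live under `location`.
        f"CREATE CATALOG IF NOT EXISTS {cat} MANAGED LOCATION '{location}'",
        f"CREATE SCHEMA IF NOT EXISTS {sch}",
        # Step 3: let the group see the catalog and read objects in the schema.
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {sch} TO {who}",
    ]


if __name__ == "__main__":
    for stmt in setup_statements(
        "dq_demo",  # hypothetical catalog name
        "sales",    # hypothetical schema name
        "abfss://managed@examplestorage.dfs.core.windows.net/dq_demo",  # hypothetical path
        "data-stewards",  # hypothetical group
    ):
        print(stmt + ";")
```

Generating the statements as strings keeps the sketch independent of any particular Databricks client library; in practice they can be pasted into a SQL editor or passed to whichever client you already use.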
Specify the HTTP path of the Databricks SQL Warehouse that Microsoft Purview will connect to in order to perform the scan.
On the Scope your scan page, select the catalogs you want to scan.
Select a scan rule set for classification. You can choose between the system default, existing custom rule sets, or create a new rule set inline. Check the Classification article to learn more.
For Scan trigger, choose whether to set up a schedule or run the scan once.
Review your scan and select Save and Run.
View your scans and scan runs to complete cataloging your data.
Once scanned, the data assets in Unity Catalog (UC) are available in Microsoft Purview Unified Catalog search. For more details about how to connect and manage Azure Databricks Unity Catalog in Microsoft Purview, follow this document.
Important
Select Access Token Authentication while creating a credential.
Store the access token in your Azure Key Vault and connect the key vault to the connection manager.
Make sure to grant the product (service) managed identity (MSI) read access to secrets in the Key Vault.
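The access token stored in Key Vault is later referenced by secret name and secret version. As a small sketch of how those two values relate to the secret identifier shown in the Azure portal, the following helper splits an identifier of the form `https://<vault>.vault.azure.net/secrets/<name>/<version>`. The vault and secret names in the usage example are hypothetical, and the host suffix check assumes the Azure public cloud.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class SecretRef:
    vault_name: str
    secret_name: str
    version: Optional[str]  # None when the identifier has no version segment


def parse_secret_id(secret_id: str) -> SecretRef:
    """Split a Key Vault secret identifier into vault, secret name, and version."""
    parsed = urlparse(secret_id)
    host = parsed.hostname or ""
    parts = [p for p in parsed.path.split("/") if p]
    # Assumption: public-cloud vaults; sovereign clouds use other host suffixes.
    if parsed.scheme != "https" or not host.endswith(".vault.azure.net"):
        raise ValueError(f"not a Key Vault URL: {secret_id!r}")
    if len(parts) not in (2, 3) or parts[0] != "secrets":
        raise ValueError(f"not a secret identifier: {secret_id!r}")
    version = parts[2] if len(parts) == 3 else None
    return SecretRef(host.split(".")[0], parts[1], version)


if __name__ == "__main__":
    # Hypothetical vault, secret name, and version.
    ref = parse_secret_id(
        "https://example-kv.vault.azure.net/secrets/databricks-pat/0123456789abcdef0123456789abcdef"
    )
    print(ref.vault_name, ref.secret_name, ref.version)
```

The helper only parses the identifier; it does not contact Key Vault or check that the managed identity can read the secret.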
Set up a connection to Databricks UC for data quality scans
At this point, the scanned asset is ready for cataloging and governance. Associate the scanned asset with a data product in a governance domain. On the Data quality tab, add a new Azure Databricks connection and enter the database name manually.
Select Data quality > Governance Domain > Manage tab to create a connection.
Configure the connection on the connection page:
Add a connection name and description.
Select the source type: Azure Databricks.
Select the workspace URL.
Select Unity Catalog as the extraction method.
Select the HTTP path.
Select the Unity Catalog name.
Select the schema name.
Select the table name.
Select the authentication method: Access Token.
Add the Azure subscription.
Add the Key Vault connection.
Add the secret name.
Add the secret version.
Test the connection.
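Before you fill in the connection page, it can help to collect the values in one place and sanity-check them. The sketch below is a hypothetical local helper, not part of Microsoft Purview: the class, its field names, and the sample values are assumptions. It assumes workspace URLs end in `.azuredatabricks.net` and that SQL Warehouse HTTP paths look like `/sql/1.0/warehouses/<warehouse-id>`, which is how they are commonly displayed in the Databricks UI.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumed shape of a SQL Warehouse HTTP path, e.g. /sql/1.0/warehouses/abcdef1234567890
WAREHOUSE_PATH = re.compile(r"^/?sql/1\.0/warehouses/[0-9A-Za-z]+/?$")


@dataclass(frozen=True)
class DatabricksDQConnection:
    """Hypothetical container mirroring the fields on the connection page."""
    name: str
    workspace_url: str
    http_path: str
    catalog: str
    schema: str
    table: str
    secret_name: str
    secret_version: str = ""

    def problems(self) -> list:
        """Return human-readable issues; an empty list means nothing was flagged."""
        issues = []
        host = urlparse(self.workspace_url).hostname or ""
        if not host.endswith(".azuredatabricks.net"):
            issues.append("workspace_url does not look like an Azure Databricks URL")
        if not WAREHOUSE_PATH.match(self.http_path):
            issues.append("http_path does not look like a SQL Warehouse path")
        for field_name in ("name", "catalog", "schema", "table", "secret_name"):
            if not getattr(self, field_name).strip():
                issues.append(f"{field_name} is empty")
        return issues


if __name__ == "__main__":
    conn = DatabricksDQConnection(
        name="dq-uc",  # all values below are placeholders
        workspace_url="https://adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/abcdef1234567890",
        catalog="dq_demo",
        schema="sales",
        table="orders",
        secret_name="databricks-pat",
    )
    print(conn.problems() or "no problems found")
```

These checks only catch formatting mistakes; the Test connection step remains the real verification that the workspace, warehouse, and token work together.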
Important
Data quality stewards need read-only access to Azure Databricks Unity Catalog to set up a data quality connection.
Virtual networks (VNet) are not supported yet.
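One way to check the read-only requirement is to compare the privileges a steward holds against a small allow-list. The sketch below assumes that `USE CATALOG`, `USE SCHEMA`, and `SELECT` are enough for read-only access to a table; that set is an assumption for illustration, and the list of granted privileges would come from your own inspection, for example a `SHOW GRANTS` result.

```python
# Assumed read-only privilege set for a data quality steward (illustrative only).
READ_ONLY_PRIVILEGES = frozenset({"USE CATALOG", "USE SCHEMA", "SELECT"})


def excess_privileges(granted) -> list:
    """Return the granted privileges that go beyond the read-only allow-list.

    Names are upper-cased and underscores are treated as spaces, so
    'use_schema' and 'USE SCHEMA' compare equal.
    """
    normalized = {" ".join(p.upper().replace("_", " ").split()) for p in granted}
    return sorted(normalized - READ_ONLY_PRIVILEGES)


if __name__ == "__main__":
    # Hypothetical privileges held by a steward account.
    print(excess_privileges(["select", "USE_SCHEMA", "MODIFY", "use catalog"]))
```

An empty result means nothing beyond the allow-list was found; it does not confirm that all three read privileges are actually present.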
Profiling and data quality scanning for data in Azure Databricks Unity Catalog databases
After you set up the connection successfully, you can profile your data, create and apply rules, and run data quality (DQ) scans on your data in Azure Databricks Unity Catalog databases. For step-by-step guidance, see the Microsoft Purview documentation on data profiling, data quality rules, and data quality scans.