Data quality for Azure Databricks Unity Catalog databases
To use Unity Catalog, your Azure Databricks workspace must be enabled for Unity Catalog, which means that the workspace is attached to a Unity Catalog metastore. All new workspaces are enabled for Unity Catalog automatically upon creation, but older workspaces might require that an account admin enable Unity Catalog manually. Whether or not your workspace was enabled for Unity Catalog automatically, the following steps are also required to get started with Unity Catalog:
Create catalogs and schemas to contain database objects like tables and volumes.
Create managed storage locations to store the managed tables and volumes in these catalogs and schemas.
Grant user access to catalogs, schemas, and database objects.
Workspaces that are automatically enabled for Unity Catalog provision a workspace catalog with broad privileges granted to all workspace users. This catalog is a convenient starting point for trying out Unity Catalog.
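The three setup steps above map to a handful of SQL statements. The following is a minimal sketch that only builds the statement text. The function names, the catalog and schema names, and the principal are hypothetical. The optional managed location is assumed to be a cloud storage path that is already registered as an external location. Running the statements requires a SQL warehouse or cluster, which is outside this sketch.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def setup_statements(
    catalog: str,
    schema: str,
    principal: str,
    managed_location: Optional[str] = None,
) -> List[str]:
    """Return SQL text for: create catalog, create schema, grant read access."""
    create_catalog = f"CREATE CATALOG IF NOT EXISTS {quote_ident(catalog)}"
    if managed_location:
        # Assumption: the path contains no single quotes and is already
        # registered as an external location in the metastore.
        create_catalog += f" MANAGED LOCATION '{managed_location}'"

    full_schema = f"{quote_ident(catalog)}.{quote_ident(schema)}"
    grantee = quote_ident(principal)
    return [
        create_catalog,
        f"CREATE SCHEMA IF NOT EXISTS {full_schema}",
        f"GRANT USE CATALOG ON CATALOG {quote_ident(catalog)} TO {grantee}",
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {full_schema} TO {grantee}",
    ]


# Hypothetical names, for illustration only.
for statement in setup_statements("sales", "raw", "data-stewards"):
    print(statement)
```

Granting SELECT at the schema level lets the privilege apply to the tables in that schema, which keeps the sketch short. A narrower setup could grant SELECT on individual tables instead.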
Specify the HTTP path of the Databricks SQL warehouse that Microsoft Purview will connect to and use to perform the scan.
On the Scope your scan page, select the catalogs you want to scan.
Select a scan rule set for classification. You can choose between the system default, existing custom rule sets, or create a new rule set inline. Check the Classification article to learn more.
For Scan trigger, choose whether to set up a schedule or run the scan once.
Review your scan and select Save and Run.
View your scans and scan runs to complete cataloging your data.
Once scanned, the data assets in Unity Catalog (UC) are available in Microsoft Purview Unified Catalog search. For more details about how to connect and manage Azure Databricks Unity Catalog in Microsoft Purview, follow this document.
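A mistyped HTTP path is an easy way for a scan to fail, so it can help to check the value before entering it. The sketch below assumes that a SQL warehouse HTTP path has the shape /sql/1.0/warehouses/<warehouse-id>. The function name and the sample ID are hypothetical, and the check covers only the format, not whether the warehouse exists.

```python
import re

# Assumed shape of a SQL warehouse HTTP path: /sql/1.0/warehouses/<warehouse-id>
_WAREHOUSE_PATH = re.compile(r"/sql/1\.0/warehouses/([0-9A-Za-z]+)")


def parse_warehouse_http_path(http_path: str) -> str:
    """Return the warehouse ID from an HTTP path, or raise ValueError."""
    match = _WAREHOUSE_PATH.fullmatch(http_path.strip())
    if match is None:
        raise ValueError(
            f"Unexpected HTTP path {http_path!r}; "
            "expected /sql/1.0/warehouses/<warehouse-id>"
        )
    return match.group(1)


# Hypothetical warehouse ID, for illustration only.
print(parse_warehouse_http_path("/sql/1.0/warehouses/abc123def456"))
```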
Important
Select Access Token Authentication when you create a credential.
Store the access token in your Azure Key Vault and connect the key vault to the connection manager.
Make sure to grant the product (service) managed identity (MSI) read access to secrets in the key vault.
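The access token stored in Key Vault is addressed by a secret identifier made up of the vault name, the secret name, and optionally a version. The sketch below builds that identifier, assuming the Azure public cloud host suffix vault.azure.net. The function name and sample values are hypothetical. The name check reflects the Key Vault rule that secret names are 1 to 127 characters of letters, digits, and hyphens.

```python
import re
from typing import Optional

# Key Vault secret names: 1-127 characters, letters, digits, and hyphens only.
_SECRET_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def secret_identifier(
    vault_name: str, secret_name: str, secret_version: Optional[str] = None
) -> str:
    """Build a Key Vault secret identifier (Azure public cloud assumed)."""
    if _SECRET_NAME.fullmatch(secret_name) is None:
        raise ValueError(f"Invalid secret name: {secret_name!r}")
    url = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
    if secret_version:
        # Without a version, the identifier refers to the latest version.
        url += f"/{secret_version}"
    return url


# Hypothetical vault and secret names, for illustration only.
print(secret_identifier("contoso-kv", "databricks-pat"))
```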
Set up a connection to Azure Databricks Unity Catalog for data quality scans
At this point, the scanned asset is ready for cataloging and governance. Associate the scanned asset with a data product in a governance domain. On the Data Quality tab, add a new Azure Databricks connection and enter the database name manually.
Select Data quality > Governance Domain > Manage tab to create a connection.
Configure the connection on the connection page:
Add a connection name and description.
Select the source type Azure Databricks.
Select the workspace URL.
Select Unity Catalog as the extraction method.
Select the HTTP path.
Select the Unity Catalog name.
Select the schema name.
Select the table name.
Select the authentication method: Access Token.
Add the Azure subscription.
Add the Key Vault connection.
Add the secret name.
Add the secret version.
Test the connection.
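The connection page collects quite a few values, so it can be useful to gather them in one place and check that nothing is missing before testing the connection. The sketch below is a hypothetical checklist structure. The class name, field names, and sample values are assumptions for illustration and are not part of any Microsoft Purview API.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatabricksDQConnection:
    """Hypothetical checklist of the values entered on the connection page."""

    name: str
    workspace_url: str
    http_path: str
    catalog: str
    schema: str
    table: str
    azure_subscription: str
    key_vault_connection: str
    secret_name: str
    secret_version: str
    description: str = ""
    source_type: str = "Azure Databricks"
    extraction_method: str = "Unity Catalog"
    authentication_method: str = "Access Token"

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty (description is not checked)."""
        return [
            f.name
            for f in fields(self)
            if f.name != "description" and not str(getattr(self, f.name)).strip()
        ]


# Hypothetical values; the secret version is left empty on purpose.
conn = DatabricksDQConnection(
    name="dq-databricks-sales",
    workspace_url="https://adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123def456",
    catalog="sales",
    schema="raw",
    table="orders",
    azure_subscription="my-subscription",
    key_vault_connection="contoso-kv-connection",
    secret_name="databricks-pat",
    secret_version="",
)
print(conn.missing_fields())
```

An empty result from missing_fields() means every checked value has been supplied. It says nothing about whether the values are correct, which is what the Test connection step verifies.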
Important
Data quality stewards need read-only access to Azure Databricks Unity Catalog to set up a data quality connection.
Virtual networks (VNet) are not supported yet.
Profiling and data quality scanning for data in Azure Databricks Unity Catalog databases
After the connection is set up successfully, you can profile your data, create and apply rules, and run data quality (DQ) scans on your data in Azure Databricks Unity Catalog databases. Follow the step-by-step guidance in the documents below: