Connect to Azure Cosmos DB for SQL API in Microsoft Purview
This article outlines the process to register and scan an Azure Cosmos DB for SQL API instance in Microsoft Purview, including instructions to authenticate and interact with the Azure Cosmos DB source.
Supported capabilities
Metadata Extraction | Full Scan | Incremental Scan | Scoped Scan | Classification | Labeling | Access Policy | Lineage | Data Sharing | Live view |
---|---|---|---|---|---|---|---|---|---|
Yes | Yes | No | Yes | Yes | Yes | No | No** | No | No |
** Lineage is supported if the dataset is used as a source or sink in a Data Factory Copy activity.
Prerequisites
An Azure account with an active subscription. Create an account for free.
An active Microsoft Purview account.
You will need to be a Data Source Administrator and Data Reader to register a source and manage it in the Microsoft Purview governance portal. See our Microsoft Purview Permissions page for details.
Register
This section will enable you to register the Azure Cosmos DB for SQL API instance and set up an appropriate authentication mechanism to ensure successful scanning of the data source.
Steps to register
It is important to register the data source in Microsoft Purview prior to setting up a scan for the data source.
Open the Microsoft Purview governance portal by:
- Browsing directly to https://web.purview.azure.com and selecting your Microsoft Purview account.
- Opening the Azure portal, then searching for and selecting the Microsoft Purview account. Select the Microsoft Purview governance portal button.
Navigate to the Data Map --> Collections
Create the Collection hierarchy using the Collections menu and assign permissions to individual subcollections, as required
Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Azure Cosmos DB database
Select the Azure Cosmos DB for SQL API data source and select Continue
Provide a suitable Name for the data source, select the relevant Azure subscription, Cosmos DB account name, and collection, and then select Apply
The Azure Cosmos DB account will be shown under the selected Collection
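If you prefer to script the registration, the values entered in the portal (name, subscription, Cosmos DB account, collection) can be assembled into a request. The following is a minimal sketch that only builds the request and sends nothing. The endpoint path, the `api-version` value, the `AzureCosmosDb` kind, and the property names are assumptions modeled on the Microsoft Purview scanning REST API, so verify them against the current API reference. The function name and all sample values are hypothetical.

```python
from urllib.parse import quote

# Assumption: API version string. Check the current Purview scanning API reference.
API_VERSION = "2022-02-01-preview"


def build_register_request(purview_account, source_name, cosmos_account,
                           subscription_id, resource_group, collection_name):
    """Return (method, url, body) for registering a Cosmos DB source.

    Nothing is sent; the caller decides how to authenticate and issue the call.
    """
    url = (
        f"https://{purview_account}.purview.azure.com"
        f"/scan/datasources/{quote(source_name, safe='')}"
        f"?api-version={API_VERSION}"
    )
    body = {
        "kind": "AzureCosmosDb",  # assumed kind name for this source type
        "properties": {
            # Public-cloud Cosmos DB account endpoint.
            "accountUri": f"https://{cosmos_account}.documents.azure.com:443/",
            "subscriptionId": subscription_id,
            "resourceGroup": resource_group,
            "resourceName": cosmos_account,
            # The Purview collection the source is registered under.
            "collection": {
                "type": "CollectionReference",
                "referenceName": collection_name,
            },
        },
    }
    return "PUT", url, body
```

The returned body can be serialized with `json.dumps` and sent with any HTTP client, using a Microsoft Entra token for an identity that holds the Data Source Administrator role.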
Scan
Authentication for a scan
To scan the data source, you need to configure an authentication method for the Azure Cosmos DB account.
There is only one way to set up authentication for Azure Cosmos DB Database:
Account Key - Secrets can be created inside an Azure Key Vault to store credentials so that Microsoft Purview can scan data sources securely using those secrets. A secret can be a storage account key, a SQL login password, or another password; for this source, it is the Azure Cosmos DB account key.
Note
You need to deploy an Azure Key Vault resource in your subscription and grant the Microsoft Purview account’s managed identity (MSI) the required access permissions to secrets in that key vault.
Using Account Key for scanning
You need to get your access key and store it in the key vault:
Navigate to your Azure Cosmos DB account
Select Settings > Keys
Copy your key and save it separately for the next steps
Navigate to your key vault
Select Settings > Secrets and select + Generate/Import
Enter the Name, paste the key from your Azure Cosmos DB account as the Value, and select Create to complete
If your key vault is not connected to Microsoft Purview yet, you will need to create a new key vault connection
Finally, create a new credential using the key to set up your scan.
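The secret-creation step above can also be scripted. The sketch below builds a Key Vault "set secret" request without sending it, and validates the secret name first, because Key Vault accepts only letters, digits, and hyphens in secret names. The function name, the vault name, and the `api-version` value shown are illustrative assumptions; the public-cloud vault URL suffix is also assumed.

```python
import json
import re

# Key Vault secret names: 1-127 characters, letters, digits, and hyphens only.
_SECRET_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def build_set_secret_request(vault_name, secret_name, account_key):
    """Return (method, url, json_body) for storing the Cosmos DB account key.

    Nothing is sent. The key is never printed or logged here.
    """
    if not _SECRET_NAME.fullmatch(secret_name):
        raise ValueError(f"invalid Key Vault secret name: {secret_name!r}")
    if not account_key:
        raise ValueError("account key must not be empty")
    url = (
        f"https://{vault_name}.vault.azure.net"  # assumed public-cloud suffix
        f"/secrets/{secret_name}?api-version=7.4"  # assumed API version
    )
    return "PUT", url, json.dumps({"value": account_key})
```

If the `azure-keyvault-secrets` and `azure-identity` packages are available, `SecretClient.set_secret(name, value)` performs the same operation and handles authentication for you.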
Creating the scan
Open your Microsoft Purview account and select Open Microsoft Purview governance portal
Navigate to the Data map --> Sources to view the collection hierarchy
Select the New Scan icon under the Azure Cosmos DB source registered earlier
Provide a Name for the scan.
Choose either the Azure integration runtime if your source is publicly accessible, a managed virtual network integration runtime if using a managed virtual network, or a self-hosted integration runtime if your source is in a private virtual network. For more information about which integration runtime to use, see the choose the right integration runtime configuration article.
Choose the appropriate collection for the scan and select + New under Credential
Select the appropriate Key vault connection and the Secret name that was used while creating the Account Key. Choose Authentication method as Account Key
Select Test connection. On a successful connection, select Continue
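The integration runtime guidance in the steps above amounts to a three-way lookup on how the source is reachable. A small sketch of that decision follows; the enum and function names are hypothetical, and the mapping restates this article's guidance rather than any API.

```python
from enum import Enum


class SourceNetwork(Enum):
    """Hypothetical labels for how the Cosmos DB source can be reached."""
    PUBLIC = "public"
    MANAGED_VNET = "managed_vnet"
    PRIVATE_VNET = "private_vnet"


# Mapping taken from the guidance in this article.
_RUNTIME_FOR = {
    SourceNetwork.PUBLIC: "Azure integration runtime",
    SourceNetwork.MANAGED_VNET: "Managed virtual network integration runtime",
    SourceNetwork.PRIVATE_VNET: "Self-hosted integration runtime",
}


def choose_integration_runtime(network: SourceNetwork) -> str:
    """Return the runtime this article recommends for the given network setup."""
    return _RUNTIME_FOR[network]
```

Encoding the choice this way keeps scripted scan setups consistent with the portal guidance when many sources are onboarded at once.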
Scoping and running the scan
You can scope your scan to specific folders and subfolders by choosing the appropriate items in the list.
Then select a scan rule set. You can choose between the system default, existing custom rule sets, or create a new rule set inline.
You can select the classification rules to be included in the scan rule set.
Choose your scan trigger. You can set up a schedule or run the scan once.
Review your scan and select Save and run.
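The choices made in the steps above (collection, credential, scan rule set) can be captured as a scan definition for scripted setups. This is a sketch under stated assumptions: the `AzureCosmosDbCredential` kind, the `AccountKey` credential type string, the default rule set name `AzureCosmosDb`, the property names, and the URL shape are modeled on the Microsoft Purview scanning REST API and are not confirmed by this article. All function names and sample values are hypothetical, and nothing is sent.

```python
from urllib.parse import quote

# Assumption: API version string. Check the current Purview scanning API reference.
API_VERSION = "2022-02-01-preview"


def build_scan_url(purview_account, source_name, scan_name):
    """Return the assumed URL for creating or updating a scan definition."""
    return (
        f"https://{purview_account}.purview.azure.com"
        f"/scan/datasources/{quote(source_name, safe='')}"
        f"/scans/{quote(scan_name, safe='')}"
        f"?api-version={API_VERSION}"
    )


def build_scan_definition(collection_name, credential_name,
                          ruleset_name="AzureCosmosDb", ruleset_type="System"):
    """Return the assumed request body for an account-key scan of Cosmos DB."""
    return {
        "kind": "AzureCosmosDbCredential",  # assumed kind name
        "properties": {
            # The Purview credential that points at the Key Vault secret.
            "credential": {
                "referenceName": credential_name,
                "credentialType": "AccountKey",
            },
            "collection": {
                "type": "CollectionReference",
                "referenceName": collection_name,
            },
            # "System" selects the default rule set; use "Custom" for your own.
            "scanRulesetName": ruleset_name,
            "scanRulesetType": ruleset_type,
        },
    }
```

Passing a different `ruleset_name` with `ruleset_type="Custom"` corresponds to choosing an existing custom rule set in the portal.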
Viewing Scan
Navigate to the data source in the Collection and select View Details to check the status of the scan
The scan details indicate the progress of the scan in the Last run status and the number of assets scanned and classified
The Last run status will be updated to In progress and then Completed once the entire scan has run successfully
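When a scan is monitored from a script instead of the portal, the status progression described above (In progress, then Completed) maps onto a simple polling loop. The sketch below takes the status lookup as a caller-supplied function, so it assumes nothing about how the status is fetched. The `Failed` and `Canceled` labels are assumed terminal states that this article does not list, and the function name is hypothetical.

```python
import time

# "Completed" comes from this article; "Failed" and "Canceled" are assumed labels.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_scan(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status, then return it.

    fetch_status: zero-argument callable returning the current status string.
    Raises TimeoutError if no terminal status is seen before the deadline.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"scan still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real delays.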
Managing Scan
Scans can be managed or run again on completion.
Select the Scan name to manage the scan
You can run the scan again, edit the scan, or delete the scan.
You can run a Full Scan again
Next steps
Now that you have registered your source, follow the guides below to learn more about Microsoft Purview and your data.