Connect to Amazon Redshift in Microsoft Purview
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include additional legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability.
This document introduces the preview for scanning Amazon Redshift in Microsoft Purview.
Supported capabilities
Metadata Extraction | Full Scan | Incremental Scan | Scoped Scan | Classification | Labeling | Access Policy | Lineage | Data Sharing | Live view |
---|---|---|---|---|---|---|---|---|---|
Yes | Yes | No | Yes | No | No | No | No | No | No |
When scanning Amazon Redshift, Microsoft Purview supports extracting technical metadata including:
- Server
- Databases
- Schemas
- Tables including the columns, foreign keys and unique constraints
- Views including the columns
- Stored procedures including the parameter dataset
- Functions including the parameter dataset
When setting up a scan, you can choose to scan an entire Amazon Redshift database, or scope the scan to a subset of schemas matching the given name(s) or name pattern(s).
Known limitations
- When an object is deleted from the data source, subsequent scans currently don't automatically remove the corresponding asset from Microsoft Purview.
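Because deleted objects are not cleaned up automatically, it can be useful to work out which catalog assets no longer exist in the source. The following is a minimal sketch of that comparison, not a Microsoft Purview feature. It assumes you have already collected two lists of qualified object names by your own means; the function name `stale_assets` and the sample names are hypothetical.

```python
from typing import Iterable, Set


def stale_assets(catalog_names: Iterable[str], source_names: Iterable[str]) -> Set[str]:
    """Return names present in the catalog but no longer present in the source.

    Both inputs are assumed to use the same naming scheme
    (for example "schema.table"); how they are gathered is out of scope here.
    """
    return set(catalog_names) - set(source_names)


# Hypothetical sample data for illustration only.
in_catalog = ["public.orders", "public.customers", "public.old_orders"]
in_redshift = ["public.orders", "public.customers"]
print(sorted(stale_assets(in_catalog, in_redshift)))
```

The resulting names are candidates for manual review and deletion in the catalog.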
Prerequisites
An Azure account with an active subscription. Create an account for free.
An active Microsoft Purview account.
You'll need to be a Data Source Administrator and Data Reader to register a source and manage it in the Microsoft Purview governance portal. See our Microsoft Purview Permissions page for details.
Set up the right integration runtime for your scenario. If your data source isn't publicly accessible, set up the latest Kubernetes-supported self-hosted integration runtime.
Register
This section describes how to register the Amazon Redshift data source for scanning in Microsoft Purview.
Prerequisites for registration
- You'll need to be a Data Source Administrator and have one of the other Microsoft Purview roles (for example, Data Reader or Data Share Contributor) to register a source and manage it in the Microsoft Purview governance portal. See our Microsoft Purview Permissions page for details.
Steps to register
It's important to register the data source in Microsoft Purview prior to setting up a scan for the data source.
Go to the Microsoft Purview governance portal by:
- Browsing directly to https://web.purview.azure.com and selecting your Microsoft Purview account.
- Opening the Azure portal, searching for and selecting the Microsoft Purview account. Select the Microsoft Purview governance portal button.
Navigate to Data Map --> Sources.
Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Amazon Redshift data source.
Select the Amazon Redshift data source and select Continue.
Provide a suitable Name for the data source, and provide these details:
- Endpoint - The endpoint of your Amazon Redshift cluster. For example:
examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com
- Port - The port number that you specified when you launched the cluster. The default is 5439.
The Amazon Redshift cluster is now shown under the selected collection.
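If a later scan or connection test fails, one quick thing to rule out is basic network reachability of the registered endpoint and port. The following is a minimal sketch using only the Python standard library. It checks that a TCP connection can be opened and nothing more: it does not authenticate or speak the Redshift protocol. The helper name `can_reach` is hypothetical, and the check is only meaningful when run from the machine that hosts your integration runtime.

```python
import socket


def can_reach(endpoint: str, port: int = 5439, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to endpoint:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((endpoint, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

For example, `can_reach("examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", 5439)` uses the sample endpoint and default port from the registration step; substitute your own values.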
Scan
Tip
To troubleshoot any issues with scanning:
- Confirm you have properly set up authentication for scanning.
- Review our scan troubleshooting documentation.
Authentication for a scan
Microsoft Purview supports basic authentication (username and password) for scanning Amazon Redshift.
The user must have SELECT permission on each of the following system tables, which Microsoft Purview queries for metadata:
- svv_external_tables
- svv_external_columns
- svv_table_info
- information_schema.routines
- information_schema.parameters
- pg_views
- pg_database
- pg_description
The user must also have EXECUTE permission on the following system function, which Microsoft Purview calls to query metadata:
- pg_get_late_binding_view_cols
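The permissions listed above can be turned into a script for a database administrator to review. The following is a minimal sketch that only builds SQL strings; it does not connect to Redshift. The user name `purview_scanner` and the helper `grant_statements` are hypothetical, the user name is assumed to be a plain identifier that needs no quoting, and the exact GRANT syntax accepted for each system object should be verified against the Amazon Redshift documentation before running anything.

```python
from typing import List

# System tables and the system function named in the authentication requirements.
SYSTEM_TABLES = [
    "svv_external_tables",
    "svv_external_columns",
    "svv_table_info",
    "information_schema.routines",
    "information_schema.parameters",
    "pg_views",
    "pg_database",
    "pg_description",
]
SYSTEM_FUNCTIONS = ["pg_get_late_binding_view_cols()"]


def grant_statements(user: str) -> List[str]:
    """Build one GRANT statement per required table and function for `user`."""
    statements = [f"GRANT SELECT ON {table} TO {user};" for table in SYSTEM_TABLES]
    statements += [
        f"GRANT EXECUTE ON FUNCTION {func} TO {user};" for func in SYSTEM_FUNCTIONS
    ]
    return statements


# "purview_scanner" is a placeholder; use the account stored in your credential.
for statement in grant_statements("purview_scanner"):
    print(statement)
```

Generating the statements from the two lists keeps the script in step with the documented requirements if the lists change.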
Create a credential
Microsoft Purview uses Azure Key Vault to safely store the credentials it uses to authenticate with sources.
- If you haven't set up an Azure Key Vault to store credentials, first create an Azure Key Vault and then follow these steps to connect your Microsoft Purview account to your Azure Key Vault.
- Once you have an Azure Key Vault, store your Amazon Redshift password as a secret in the Key Vault, and then follow these steps to create a credential with this information:
- Basic authentication credential
- Add your user name in the input field
- Add your Key Vault connection and the name of the secret where your password is stored
Create the scan
Open your Microsoft Purview account and select Open Microsoft Purview governance portal.
Navigate to Data map --> Sources to view the collection hierarchy.
Select the New Scan icon under the Amazon Redshift data source registered earlier.
Select your integration runtime.
Provide these details:
- Name - a name for your scan
- Credential - choose the credential you created earlier
- Database - name of the database instance to scan
- Schema - The subset of schemas to import, expressed as a semicolon-separated list. For example: "schema1;schema2". If the list is empty, all user schemas are imported. All system schemas and objects are ignored by default. A schema name pattern can be a static name or contain the wildcard %. For example, "A%;%B;%C%;D" matches schema names that:
- Start with A or
- End with B or
- Contain C or
- Equal D
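The pattern rules above can be sketched in a few lines to preview which schema names a given list would select. This is an illustration of the documented semantics, not the matching code Microsoft Purview runs. The function name `schema_matches` is hypothetical, and case-sensitive matching is an assumption that the source does not confirm.

```python
import re


def schema_matches(name: str, pattern_list: str) -> bool:
    """Return True if `name` matches any pattern in a semicolon-separated list.

    "%" stands for any run of characters (including none). An empty list
    matches every name, mirroring "all user schemas are imported".
    """
    patterns = [p.strip() for p in pattern_list.split(";") if p.strip()]
    if not patterns:
        return True
    for pattern in patterns:
        # Escape literal text and turn each "%" into the regex ".*".
        regex = ".*".join(re.escape(part) for part in pattern.split("%"))
        if re.fullmatch(regex, name):
            return True
    return False


# Try the example pattern list against a few hypothetical schema names.
for candidate in ["Alpha", "subB", "xCy", "D", "Dx"]:
    print(candidate, schema_matches(candidate, "A%;%B;%C%;D"))
```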
Select Test connection to validate the settings.
Select Continue.
For Scan trigger choose whether to set up a schedule or run the scan once.
Review your scan and select Save and Run.
View your scans and scan runs
To view existing scans:
- Go to the Microsoft Purview portal. On the left pane, select Data map.
- Select the data source. You can view a list of existing scans on that data source under Recent scans, or you can view all scans on the Scans tab.
- Select the scan that has results you want to view. The pane shows you all the previous scan runs, along with the status and metrics for each scan run.
- Select the run ID to check the scan run details.
Manage your scans
To edit, cancel, or delete a scan:
Go to the Microsoft Purview portal. On the left pane, select Data Map.
Select the data source. You can view a list of existing scans on that data source under Recent scans, or you can view all scans on the Scans tab.
Select the scan that you want to manage. You can then:
- Edit the scan by selecting Edit scan.
- Cancel an in-progress scan by selecting Cancel scan run.
- Delete your scan by selecting Delete scan.
Note
- Deleting your scan does not delete catalog assets created from previous scans.
Browse, search, and view assets
After successfully scanning your Amazon Redshift clusters, you can browse or search the data catalog to view the asset details.
Next steps
Follow the guides below to learn more about Microsoft Purview and your data.
- Browse or Search the data catalog.
- Data Estate Insights in Microsoft Purview