Connect to Amazon Redshift in Microsoft Purview

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include additional legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability.

This article describes how to register and scan Amazon Redshift in Microsoft Purview.

Supported capabilities

Capability             Supported
Metadata Extraction    Yes
Full Scan              Yes
Incremental Scan       No
Scoped Scan            Yes
Classification         No
Labeling               No
Access Policy          No
Lineage                No
Data Sharing           No
Live view              No

When scanning Amazon Redshift, Microsoft Purview supports extracting technical metadata including:

  • Server
  • Databases 
  • Schemas
  • Tables including the columns, foreign keys and unique constraints
  • Views including the columns 
  • Stored procedures including the parameter dataset
  • Functions including the parameter dataset

When setting up a scan, you can choose to scan an entire Amazon Redshift database, or scope the scan to a subset of schemas matching the given name(s) or name pattern(s).

Known limitations

  • Currently, when an object is deleted from the data source, subsequent scans don't automatically remove the corresponding asset from Microsoft Purview.

Prerequisites

Register

This section describes how to register an Amazon Redshift data source in Microsoft Purview so that it can be scanned.

Prerequisites for registration

  • You'll need to be a Data Source Admin and one of the other Purview roles (for example, Data Reader or Data Share Contributor) to register a source and manage it in the Microsoft Purview governance portal. See our Microsoft Purview Permissions page for details.

Steps to register

You must register the data source in Microsoft Purview before you can set up a scan for it.

  1. Open the Microsoft Purview governance portal.

    Screenshot that shows the link to open Microsoft Purview governance portal

  2. Navigate to Data Map --> Sources.

    Screenshot that navigates to the Sources link in the Data Map

  3. Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Amazon Redshift data source.

  4. Select the Amazon Redshift data source and select Continue.

  5. Enter a Name for the data source and provide these details:

    1. Endpoint - The endpoint of your Amazon Redshift cluster. For example: examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com
    2. Port - The port number that you specified when you launched the cluster. The default is 5439.

    Screenshot that shows the register menu for Amazon Redshift.

  6. The Amazon Redshift cluster now appears under the selected collection.
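The Endpoint and Port fields take the host name and the port separately. The Amazon Redshift console typically displays the cluster endpoint with the port and default database appended (for example, `examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev`). The following Python sketch assumes that `host[:port][/database]` shape and splits such a string into the values to enter. `split_console_endpoint` and `RedshiftEndpoint` are hypothetical helpers, not part of Microsoft Purview or any AWS SDK.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_REDSHIFT_PORT = 5439  # default port stated in the registration step


@dataclass
class RedshiftEndpoint:
    host: str                # value for the Endpoint field
    port: int                # value for the Port field
    database: Optional[str]  # not entered at registration; the scan asks for it


def split_console_endpoint(raw: str) -> RedshiftEndpoint:
    """Split a copied endpoint into host, port, and database.

    Assumption: the string looks like host[:port][/database], optionally
    prefixed with a URL scheme such as jdbc:redshift://. Missing parts fall
    back to the default port or None.
    """
    text = raw.strip()
    # Drop a scheme prefix if one was pasted along with the endpoint.
    if "://" in text:
        text = text.split("://", 1)[1]
    host_port, _, database = text.partition("/")
    host, sep, port_text = host_port.partition(":")
    port = int(port_text) if sep and port_text else DEFAULT_REDSHIFT_PORT
    return RedshiftEndpoint(host=host, port=port, database=database or None)
```

Only the host goes into the Endpoint field. The database part, if present, can serve as the Database value when you create the scan, provided that is the database you want to scan.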

Scan

Tip

To troubleshoot any issues with scanning:

  1. Confirm you have properly set up authentication for scanning.
  2. Review our scan troubleshooting documentation.
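If Test connection fails and you use a self-hosted integration runtime, one way to separate network problems from authentication problems is a plain TCP check run on the integration runtime machine. The Python sketch below uses only the standard library. `can_reach` is a hypothetical helper name, and a successful result shows only that the endpoint and port accept connections, not that the credential is valid.

```python
import socket


def can_reach(host: str, port: int = 5439, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Checks network reachability only (DNS, routing, security groups,
    firewalls); it does not validate the username or password.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections,
        # and timeouts; all are subclasses of OSError.
        return False
```

For example, `can_reach("examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", 5439)` returning False points to DNS, security group, or firewall settings rather than to the credential.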

Authentication for a scan

Microsoft Purview supports basic authentication (username and password) for scanning Amazon Redshift.

For Microsoft Purview to query metadata, the user should be granted SELECT permission on each of the following system tables:

  • svv_external_tables
  • svv_external_columns
  • svv_table_info
  • information_schema.routines
  • information_schema.parameters
  • pg_views
  • pg_database
  • pg_description

The user should also be granted EXECUTE permission on the following system function:

  • pg_get_late_binding_view_cols
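As a sketch, the following Python generates GRANT statements for the objects listed above so that a Redshift administrator can review and run them. The user name `purview_scanner` is hypothetical. The statements follow Amazon Redshift's `GRANT SELECT ON ...` and `GRANT EXECUTE ON FUNCTION ...` forms, but whether each grant is needed or permitted on system objects depends on your cluster and on the privileges of the user running them, so check them against the Redshift documentation before use.

```python
# System tables and functions listed in this article for metadata queries.
SYSTEM_TABLES = [
    "svv_external_tables",
    "svv_external_columns",
    "svv_table_info",
    "information_schema.routines",
    "information_schema.parameters",
    "pg_views",
    "pg_database",
    "pg_description",
]
SYSTEM_FUNCTIONS = ["pg_get_late_binding_view_cols()"]


def grant_statements(user: str) -> list[str]:
    """Build GRANT statements for the scanning user (review before running)."""
    # Crude guard so arbitrary SQL can't be smuggled in through the user name.
    if not user.isidentifier():
        raise ValueError(f"unexpected user name: {user!r}")
    stmts = [f"GRANT SELECT ON {table} TO {user};" for table in SYSTEM_TABLES]
    stmts += [f"GRANT EXECUTE ON FUNCTION {fn} TO {user};" for fn in SYSTEM_FUNCTIONS]
    return stmts


# Hypothetical user name; replace with the user stored in your credential.
for stmt in grant_statements("purview_scanner"):
    print(stmt)
```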

Create a credential

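Microsoft Purview uses Azure Key Vault to securely store the credentials it uses to authenticate with sources. For Amazon Redshift, create a basic authentication credential with the username of the user described above and the password stored as a Key Vault secret.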

Create the scan

  1. Open your Microsoft Purview account and select Open Microsoft Purview governance portal.

  2. Navigate to Data Map --> Sources to view the collection hierarchy.

  3. Select the New Scan icon under the Amazon Redshift data source registered earlier.

  4. Select your integration runtime.

  5. Provide these details:

    1. Name - A name for your scan.
    2. Credential - The credential you created earlier.
    3. Database - The name of the database to scan.
    4. Schema - The subset of schemas to import, expressed as a semicolon-separated list (for example, "schema1;schema2"). If the list is empty, all user schemas are imported. System schemas and objects are ignored by default. Each entry can be a static name or contain the wildcard %. For example, "A%;%B;%C%;D" matches schemas whose names:
      • Start with A, or
      • End with B, or
      • Contain C, or
      • Equal D

    Screenshot that shows the scan menu for Amazon Redshift.

  6. Select Test connection to validate the settings.

  7. Select Continue.

  8. For Scan trigger, choose whether to set up a schedule or run the scan once.

  9. Review your scan and select Save and Run.
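The Schema filter from step 5 can be modeled as follows: split the value on semicolons, treat % as "any sequence of characters", and select a schema if any pattern matches its whole name. This Python sketch illustrates the documented rules and is not Microsoft Purview's implementation. Two assumptions are labeled in the code: which schemas count as system schemas, and exact (case-sensitive) name comparison, since this article doesn't state how Purview handles case.

```python
import re

# Assumption: treat these, plus any name starting with "pg_", as system
# schemas. The exact list Purview skips isn't documented in this article.
SYSTEM_SCHEMAS = {"pg_catalog", "information_schema"}


def pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape everything literally, then join the pieces with ".*" so that
    # each % stands for any sequence of characters.
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile(".*".join(parts))


def schemas_to_scan(schema_filter: str, schemas: list[str]) -> list[str]:
    """Return the schemas a semicolon-separated % pattern list would select."""
    patterns = [p.strip() for p in schema_filter.split(";") if p.strip()]
    regexes = [pattern_to_regex(p) for p in patterns]
    selected = []
    for name in schemas:
        if name in SYSTEM_SCHEMAS or name.startswith("pg_"):
            continue  # system schemas are ignored by default
        # An empty filter selects every user schema. Matching is exact and
        # case-sensitive here (an assumption).
        if not regexes or any(r.fullmatch(name) for r in regexes):
            selected.append(name)
    return selected
```

With the article's example filter, `schemas_to_scan("A%;%B;%C%;D", ["Alpha", "xB", "aCb", "D", "DD"])` returns `["Alpha", "xB", "aCb", "D"]`. Using `fullmatch` rather than `search` is what makes a static entry such as "D" mean "equal D" instead of "contains D".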

View your scans and scan runs

To view existing scans:

  1. Go to the Microsoft Purview governance portal. On the left pane, select Data Map.
  2. Select the data source. You can view a list of existing scans on that data source under Recent scans, or you can view all scans on the Scans tab.
  3. Select the scan that has results you want to view. The pane shows you all the previous scan runs, along with the status and metrics for each scan run.
  4. Select the run ID to check the scan run details.

Manage your scans

To edit, cancel, or delete a scan:

  1. Go to the Microsoft Purview governance portal. On the left pane, select Data Map.

  2. Select the data source. You can view a list of existing scans on that data source under Recent scans, or you can view all scans on the Scans tab.

  3. Select the scan that you want to manage. You can then:

    • Edit the scan by selecting Edit scan.
    • Cancel an in-progress scan by selecting Cancel scan run.
    • Delete your scan by selecting Delete scan.

Note

  • Deleting your scan does not delete catalog assets created from previous scans.

Browse, search, and view assets

After your Amazon Redshift cluster is scanned successfully, you can browse or search the data catalog to view the asset details.

Next steps

Follow the below guides to learn more about Microsoft Purview and your data.