Create a Unity Catalog metastore

This article shows how to create a metastore in Unity Catalog and link it to workspaces.

Note

In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.

Requirements

  • You must be an Azure Databricks account admin.
  • Your Azure Databricks account must be on the Premium plan.
  • In your Azure tenant, you must have permission to create:

Create the metastore

To create a Unity Catalog metastore, you do the following:

  • Create a storage container where the metastore’s metadata and managed tables will be stored.

    This storage container must be in the same region as the workspaces you want to use to access the data.

    This metastore-level storage location can be overridden at the catalog and schema levels.

  • Create an identity that Azure Databricks uses to give access to that storage container.

    You can use either an Azure managed identity or a service principal as the identity that gives access to the metastore’s storage container.

    Unlike service principals, managed identities do not require you to maintain credentials or rotate secrets, and they let you connect to an Azure Data Lake Storage Gen2 account that is protected by a storage firewall.

  • Provide Azure Databricks with the storage container path and identity.
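The region constraint above can be checked before you create anything. The following is a minimal sketch, not part of any Databricks tooling: the `check_regions` function, the workspace names, and the region strings are all hypothetical. It assumes you already know the region of the storage container and of each workspace.

```python
def check_regions(storage_region: str, workspace_regions: dict[str, str]) -> list[str]:
    """Return the names of workspaces whose region differs from the storage region."""
    wanted = storage_region.strip().lower()
    return [
        name
        for name, region in workspace_regions.items()
        if region.strip().lower() != wanted
    ]


# Hypothetical workspace names and regions, for illustration only.
mismatched = check_regions(
    "eastus2",
    {"analytics-ws": "eastus2", "ml-ws": "westeurope"},
)
print(mismatched)  # ['ml-ws']
```

An empty list means every listed workspace is in the same region as the storage container.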

To create a Unity Catalog metastore that is accessed by an Azure managed identity:

  1. Create an Azure Databricks access connector and assign it permissions to the storage container where you want the metastore’s metadata and managed tables to be stored, using the instructions in Configure a managed identity for Unity Catalog.

    An Azure Databricks access connector is a first-party Azure resource that lets you connect a system-assigned managed identity to an Azure Databricks account.

    Make note of the access connector’s resource ID.

  2. Log in to the Azure Databricks account console.

  3. Click Data.

  4. Click Create Metastore.

  5. Enter values for the following fields:

    • Name for the metastore.

    • Region where the metastore will be deployed.

      This must be the same region as the workspaces you want to use to access the data. Make sure that it matches the region of the access connector and storage container that you created earlier.

    • ADLS Gen 2 path: Enter the path to the storage container that you will use as root storage for the metastore.

      The abfss:// prefix is added automatically.

    • Access Connector ID: Enter the Azure Databricks access connector’s resource ID in the format:

      /subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
      
  6. Click Create.

    If the request fails, retry using a different metastore name.

  7. When prompted, select workspaces to link to the metastore.

    For more information about linking workspaces to metastores, see Enable a workspace for Unity Catalog.

The user who creates a metastore is its original metastore admin. Databricks recommends that you reassign the metastore admin to a group. See (Recommended) Transfer ownership of your metastore to a group.
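Two of the form values above follow fixed formats, so they can be sanity-checked before you paste them in. The sketch below is illustrative only: `parse_access_connector_id` and `strip_abfss_prefix` are hypothetical helper names, the subscription ID is matched loosely rather than validated as a GUID, and the match is case-sensitive, which may be stricter than Azure itself.

```python
import re

# Pattern for the resource ID format shown for the Access Connector ID field.
# The subscription ID is matched as any non-slash text, not validated as a GUID.
_CONNECTOR_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks"
    r"/accessConnectors/(?P<connector>[^/]+)$"
)


def parse_access_connector_id(resource_id: str) -> dict[str, str]:
    """Split an access connector resource ID into its named parts."""
    match = _CONNECTOR_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not an access connector resource ID: {resource_id!r}")
    return match.groupdict()


def strip_abfss_prefix(path: str) -> str:
    """Drop a leading abfss:// so the value suits a field that adds the prefix itself."""
    prefix = "abfss://"
    return path[len(prefix):] if path.startswith(prefix) else path


# Hypothetical values, for illustration only.
parts = parse_access_connector_id(
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg/providers/Microsoft.Databricks"
    "/accessConnectors/my-connector"
)
print(parts["resource_group"], parts["connector"])
```

A `ValueError` here usually means a segment is missing or misspelled, which is cheaper to catch before you click Create.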

Create a metastore that is accessed using a service principal

To create a Unity Catalog metastore that is accessed by a service principal:

  1. Create a storage account for Azure Data Lake Storage Gen2.

    This storage account will contain metadata related to Unity Catalog metastores and their objects, as well as the data for managed tables in Unity Catalog. See Create a storage account to use with Azure Data Lake Storage Gen2. Make a note of the region where you created the storage account.

  2. Create a container in the new storage account.

    Make a note of the Azure Data Lake Storage Gen2 URI for the container, which is in the following format:

    abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<metastore-name>
    

    In the steps that follow, replace <storage-container> with this URI.

  3. In Azure Active Directory, create a service principal.

    Unity Catalog will use this service principal to access containers in the storage account on behalf of Unity Catalog users. Generate a client secret for the service principal. See Provision a service principal in Azure portal. Make a note of the client secret for the service principal, the client application ID, and directory ID where you created this service principal. In the following steps, replace <client-secret>, <client-application-id>, and <directory-id> with these values.

  4. In the storage account, go to Access Control (IAM) and grant the new service principal the Storage Blob Data Contributor role.

  5. Make note of these properties, which you will use when you create a metastore:

    • <aad-application-id>
    • The storage account region
    • <storage-container>
    • The service principal’s <client-secret>, <client-application-id>, and <directory-id>

  6. Log in to the account console.

  7. Click Data.

  8. Click Create Metastore.

    1. Enter a name for the metastore.

    2. Enter the region where the metastore will be deployed.

      This must be the same region as the root storage account and the workspaces you want to use to access the data.

    3. For ADLS Gen 2 path, enter the value of <storage-container>. Because the abfss:// prefix is added automatically, omit it from the value you enter.

  9. Click Create.

    The user who creates a metastore is its owner. Databricks recommends that you reassign ownership of the metastore to a group. See (Recommended) Transfer ownership of your metastore to a group.

  10. Make a note of the metastore’s ID. When you view the metastore’s properties, the metastore’s ID is the portion of the URL after /data and before /configuration.

  11. The metastore has been created, but Unity Catalog cannot yet write data to it. To finish setting up the metastore:

    1. In a separate browser, log in as a workspace admin to a workspace that is assigned to the metastore.

    2. Make a note of the workspace URL, which is the first portion of the URL, after https:// and inclusive of azuredatabricks.net.

    3. Generate a personal access token. See Generate a personal access token.

    4. Add the personal access token to the .netrc file in your home directory. This improves security by preventing the personal access token from appearing in your shell’s command history. See Store tokens in a .netrc file and use them in curl.

    5. Run the following cURL command to create the root storage credential for the metastore. Replace the placeholder values:

      • <workspace-url>: The URL of the workspace where the personal access token was generated.
      • <credential-name>: A name for the storage credential.
      • <directory-id>: The directory ID for the service principal you created.
      • <application-id>: The application ID for the service principal you created (noted earlier as <client-application-id>).
      • <client-secret>: The value of the client secret you generated for the service principal (not the client secret ID).

      curl -n -X POST --header 'Content-Type: application/json' \
        https://<workspace-url>/api/2.0/unity-catalog/storage-credentials \
        --data "{
        \"name\": \"<credential-name>\",
        \"azure_service_principal\": {
          \"directory_id\": \"<directory-id>\",
          \"application_id\": \"<application-id>\",
          \"client_secret\": \"<client-secret>\"
        }
      }"
      

      Make a note of the storage credential ID, which is the value of id from the cURL command’s response.

  12. Run the following cURL command to update the metastore with the new root storage credential. Replace the placeholder values:

    • <workspace-url>: The URL of the workspace where the personal access token was generated.
    • <metastore-id>: The metastore’s ID.
    • <storage-credential-id>: The storage credential’s ID from the previous command.

    curl -n -X PATCH --header 'Content-Type: application/json' \
      https://<workspace-url>/api/2.0/unity-catalog/metastores/<metastore-id> \
      --data "{\"storage_root_credential_id\": \"<storage-credential-id>\"}"
    

You can now add workspaces to the metastore.
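If you prefer to script the values and API calls from steps 2, 10, 11, and 12, the following Python sketch shows one way to assemble them using only the standard library. The endpoint paths and JSON field names are taken from the cURL commands above. Everything else is an assumption made for illustration: the function names are hypothetical, the sample values are invented, and the token is sent in an `Authorization: Bearer` header rather than read from `.netrc`. The code only builds the requests and does not send them.

```python
import json
import urllib.request
from urllib.parse import urlparse


def abfss_uri(container: str, storage_account: str, metastore_name: str) -> str:
    """Build the container URI in the format noted in step 2."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{metastore_name}"


def metastore_id_from_url(url: str) -> str:
    """Return the URL path segment between /data and /configuration (step 10)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    start = segments.index("data")  # raises ValueError if the segment is absent
    end = segments.index("configuration")
    if end != start + 2:
        raise ValueError(f"unexpected metastore URL: {url!r}")
    return segments[start + 1]


def _json_request(method: str, url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request authenticated with a bearer token."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def storage_credential_request(workspace_url, token, name,
                               directory_id, application_id, client_secret):
    """Request that creates the root storage credential (step 11)."""
    payload = {
        "name": name,
        "azure_service_principal": {
            "directory_id": directory_id,
            "application_id": application_id,
            "client_secret": client_secret,
        },
    }
    url = f"https://{workspace_url}/api/2.0/unity-catalog/storage-credentials"
    return _json_request("POST", url, token, payload)


def metastore_update_request(workspace_url, token, metastore_id, storage_credential_id):
    """Request that sets the metastore's root storage credential (step 12)."""
    payload = {"storage_root_credential_id": storage_credential_id}
    url = f"https://{workspace_url}/api/2.0/unity-catalog/metastores/{metastore_id}"
    return _json_request("PATCH", url, token, payload)


# Invented sample values; nothing is sent over the network here.
request = metastore_update_request(
    "adb-1234.5.azuredatabricks.net", "example-token", "metastore-abc", "credential-xyz"
)
print(request.get_method(), request.full_url)
```

Passing a built request to `urllib.request.urlopen` would send it. Keeping construction separate from sending lets you inspect the URL and body first, and keeps the client secret out of your shell history in the same spirit as the `.netrc` approach.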

Enable Azure Databricks management for personal staging locations

Azure Databricks uses cross-origin resource sharing (CORS) to upload data to personal staging locations in Unity Catalog. See Configure Unity Catalog storage account for CORS.

Delete a metastore

If you are closing your Azure Databricks account or have another reason to remove access to data managed by your Unity Catalog metastore, you can delete the metastore.

Warning

All objects managed by the metastore will become inaccessible from Azure Databricks workspaces. This action cannot be undone.

Managed table data and metadata will be automatically deleted after 30 days. External table data in your cloud storage is not affected by metastore deletion.
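To plan around the 30-day window, you can compute the earliest date on which managed data might be removed. This is a small sketch under stated assumptions: `purge_date` is a hypothetical helper, and it treats "after 30 days" as 30 calendar days from the deletion date, since the exact time of day is not specified here.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention window stated in the warning above


def purge_date(deleted_on: date) -> date:
    """Earliest calendar date on which managed data may be removed."""
    return deleted_on + timedelta(days=RETENTION_DAYS)


print(purge_date(date(2024, 1, 15)))  # 2024-02-14
```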

To delete a metastore:

  1. As a metastore admin, log in to the account console.
  2. Click Data.
  3. Click the metastore name.
  4. On the Configuration tab, click the three-button menu at the far upper right and select Delete.
  5. On the confirmation dialog, enter the name of the metastore and click Delete.