Create a Unity Catalog metastore

This article shows how to create a metastore in Unity Catalog and link it to workspaces. A metastore is the top-level container of objects in Unity Catalog. It stores metadata about data assets (tables and views) and the permissions that govern access to them. You must create one metastore for each region in which your organization operates.

Note

In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.

Requirements

To create a metastore:

  • You must be an Azure Databricks account admin.

    The first Azure Databricks account admin must be an Azure Active Directory Global Administrator at the time that they first log in to the Azure Databricks account console. Upon first login, that user becomes an Azure Databricks account admin and no longer needs the Azure Active Directory Global Administrator role to access the Azure Databricks account. The first account admin can assign users in the Azure Active Directory tenant as additional account admins (who can themselves assign more account admins). Additional account admins do not require specific roles in Azure Active Directory.

  • The workspaces that you attach to the metastore must be on the Azure Databricks Premium plan.

  • In your Azure tenant, you must have permission to create:

Create the metastore

To create a Unity Catalog metastore, you do the following:

  • Create a storage container where the metastore’s managed table data will be stored.

    This storage container must be in a Premium performance Azure Data Lake Storage Gen2 account in the same region as the workspaces you want to use to access the data.

    This default storage location can be overridden at the catalog and schema levels.

  • Create an identity that Azure Databricks uses to give access to that storage container.

    You can use either an Azure managed identity or a service principal as the identity that gives access to the metastore’s storage container.

    Unlike service principals, managed identities do not require you to maintain credentials or rotate secrets, and they let you connect to an Azure Data Lake Storage Gen2 account that is protected by a storage firewall.

  • Provide Azure Databricks with the storage container path and identity.

To create a Unity Catalog metastore that is accessed by an Azure managed identity:

  1. Create an Azure Databricks access connector and assign it permissions to the storage container where you want the metastore’s managed tables to be stored, using the instructions in Configure a managed identity for Unity Catalog.

    An Azure Databricks access connector is a first-party Azure resource that lets you connect a system-assigned managed identity to an Azure Databricks account.

    Make a note of the access connector’s resource ID.

    This storage container must be in a Premium performance Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces.

  2. Log in to the Azure Databricks account console.

  3. Click Data.

  4. Click Create Metastore.

  5. Enter values for the following fields:

    • Name for the metastore.

    • Region where the metastore will be deployed.

      This must be the same region as the workspaces you want to use to access the data. Make sure that it matches the region of the access connector and storage container that you created earlier.

    • ADLS Gen 2 path: Enter the path to the storage container that you will use as the default root storage for managed table data.

      If a managed table is not in a schema or catalog that has its own managed storage location, it will be stored in this location.

      The abfss:// prefix is added automatically.

    • Access Connector ID: Enter the Azure Databricks access connector’s resource ID in the format:

      /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
      
  6. Click Create.

    If the request fails, retry using a different metastore name.

  7. When prompted, select workspaces to link to the metastore.

    For more information about linking workspaces to metastores, see Enable a workspace for Unity Catalog.

The user who creates a metastore is its original metastore admin. Databricks recommends that you reassign the metastore admin to a group. See (Recommended) Transfer ownership of your metastore to a group.
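
The resource ID format shown in step 5 can be sanity-checked before it is pasted into the Access Connector ID field. The following is a minimal sketch, not part of any Azure Databricks tooling: the function name is_access_connector_id is hypothetical, and the check looks only at the overall shape of the string, not at whether the subscription, resource group, or connector actually exists.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): succeeds when the argument has the
# general shape of an access connector resource ID. It does not contact Azure.
is_access_connector_id() {
  case "$1" in
    # "?*" requires at least one character in each variable segment.
    /subscriptions/?*/resourceGroups/?*/providers/Microsoft.Databricks/accessConnectors/?*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A call such as `is_access_connector_id "$CONNECTOR_ID" || echo "unexpected format"` would flag a storage path or an empty value entered by mistake. The pattern is deliberately loose, so passing the check does not guarantee that the ID is valid.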

Create a metastore that is accessed using a service principal

To create a Unity Catalog metastore that is accessed by a service principal:

  1. Create a storage account for Azure Data Lake Storage Gen2.

    A storage container in this account will store all of the metastore’s managed tables, except those that are in a catalog or schema with their own managed storage location.

    See Create a storage account to use with Azure Data Lake Storage Gen2. This must be a Premium performance Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces.

  2. Create a container in the new storage account.

    Make a note of the ADLSv2 URI for the container, which is in the following format:

    abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<metastore-name>
    

    In the steps that follow, replace <storage-container> with this URI.

  3. In Azure Active Directory, create a service principal.

    Unity Catalog will use this service principal to access containers in the storage account on behalf of Unity Catalog users. Generate a client secret for the service principal. See Azure service principal authentication.

    Make a note of the client secret, the client application ID, and the ID of the directory where you created the service principal. In the following steps, replace <client-secret>, <client-application-id>, and <directory-id> with these values.

  4. In the storage account, go to Access Control (IAM) and grant the new service principal the Storage Blob Data Contributor role.

  5. Make a note of these properties, which you will use when you create a metastore:

    • The storage account region
    • <storage-container>
    • The service principal’s <client-secret>, <client-application-id>, and <directory-id>

  6. Log in to the account console.

  7. Click Data.

  8. Click Create Metastore.

    1. Enter a name for the metastore.

    2. Enter the region where the metastore will be deployed.

      This must be the same region as the root storage account and the workspaces you want to use to access the data.

    3. For ADLS Gen 2 path, enter the value of <storage-container> without the leading abfss://, because that prefix is added automatically.

  9. Click Create.

    The user who creates a metastore is its owner. Databricks recommends that you reassign ownership of the metastore to a group. See (Recommended) Transfer ownership of your metastore to a group.

  10. Make a note of the metastore’s ID. When you view the metastore’s properties, the metastore’s ID is the portion of the URL after /data and before /configuration.

  11. The metastore has been created, but Unity Catalog cannot yet write data to it. To finish setting up the metastore:

    1. In a separate browser, log in as a workspace admin to a workspace that is assigned to the metastore.

    2. Make a note of the workspace URL, which is the first portion of the URL, after https:// and inclusive of azuredatabricks.net.

    3. Generate a personal access token. See the Token management API.

    4. Add the personal access token to the .netrc file in your home directory. This improves security by preventing the personal access token from appearing in your shell’s command history. The -n option in the cURL commands that follow tells cURL to read credentials from this file.

    5. Run the following cURL command to create the root storage credential for the metastore. Replace the placeholder values:

      • <workspace-url>: The URL of the workspace where the personal access token was generated.
      • <credential-name>: A name for the storage credential.
      • <directory-id>: The directory ID for the service principal you created.
      • <application-id>: The application ID for the service principal you created (noted earlier as <client-application-id>).
      • <client-secret>: The value of the client secret you generated for the service principal (not the client secret ID).

      curl -n -X POST --header 'Content-Type: application/json' https://<workspace-url>/api/2.0/unity-catalog/storage-credentials --data "{
        \"name\": \"<credential-name>\",
        \"azure_service_principal\": {
          \"directory_id\": \"<directory-id>\",
          \"application_id\": \"<application-id>\",
          \"client_secret\": \"<client-secret>\"
        }
      }"
      

      Make a note of the storage credential ID, which is the value of id from the cURL command’s response.

  12. Run the following cURL command to update the metastore with the new root storage credential. Replace the placeholder values:

    • <workspace-url>: The URL of the workspace where the personal access token was generated.
    • <metastore-id>: The metastore’s ID.
    • <storage-credential-id>: The storage credential’s ID from the previous command.

    curl -n -X PATCH --header 'Content-Type: application/json' https://<workspace-url>/api/2.0/unity-catalog/metastores/<metastore-id> --data \
    "{\"storage_root_credential_id\": \"<storage-credential-id>\"}"
    

You can now add workspaces to the metastore.
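
Several values in the service principal procedure are derived from longer strings: the container URI (step 2), the metastore ID (step 10), the workspace URL (step 11), and the .netrc entry that cURL reads when called with -n. The helpers below are a rough sketch of those derivations in POSIX shell. All function names are hypothetical, the URL layout follows the descriptions in the steps above, and the .netrc layout (login token with the personal access token as the password) is an assumption based on common Azure Databricks usage.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of any Databricks tool.
# Note: POSIX sh has no local variables, so "id" and "host" below are global.

# Step 2: assemble the container URI from its parts.
# $1 = container name, $2 = storage account name, $3 = metastore name (path)
adls_uri() {
  printf 'abfss://%s@%s.dfs.core.windows.net/%s\n' "$1" "$2" "$3"
}

# Step 10: the metastore ID is the URL portion after /data/ and before /configuration.
# $1 = URL of the metastore's properties page
metastore_id_from_url() {
  id=${1#*/data/}                         # drop everything through "/data/"
  printf '%s\n' "${id%%/configuration*}"  # drop "/configuration" and the rest
}

# Step 11: the workspace URL is the host part, without the https:// scheme.
# $1 = any URL copied from the workspace
workspace_host() {
  host=${1#https://}                      # strip the scheme
  printf '%s\n' "${host%%/*}"             # strip any path or query after the host
}

# Step 11: print a .netrc entry for the workspace (assumed layout).
# $1 = workspace host, $2 = personal access token
netrc_entry() {
  printf 'machine %s\nlogin token\npassword %s\n' "$1" "$2"
}
```

For example, `adls_uri mycontainer mystorageacct mymetastore` prints `abfss://mycontainer@mystorageacct.dfs.core.windows.net/mymetastore`. The output of netrc_entry is meant to be appended to ~/.netrc by hand or with a redirect, and the file should be readable only by its owner (for example, chmod 600 ~/.netrc).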

Enable Azure Databricks management for personal staging locations

Azure Databricks uses cross-origin resource sharing (CORS) to upload data to personal staging locations in Unity Catalog. See Configure Unity Catalog storage account for CORS.

Next steps

Delete a metastore

If you are closing your Azure Databricks account or have another reason to delete access to data managed by your Unity Catalog metastore, you can delete the metastore.

Warning

All objects managed by the metastore will become inaccessible from Azure Databricks workspaces. This action cannot be undone.

Managed table data and metadata will be auto-deleted after 30 days. External table data in your cloud storage is not affected by metastore deletion.

To delete a metastore:

  1. As a metastore admin, log in to the account console.
  2. Click Data.
  3. Click the metastore name.
  4. On the Configuration tab, click the three-button menu at the far upper right and select Delete.
  5. On the confirmation dialog, enter the name of the metastore and click Delete.