Configure Delta Lake catalog

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

This article describes how to configure a Delta Lake catalog in your Trino cluster with HDInsight on AKS. You add a new catalog by updating your cluster ARM template. The exception is the Hive catalog, which you can add during Trino cluster creation in the Azure portal.

Prerequisites

Steps to configure Delta Lake catalog

  1. Update your cluster ARM template to add a new Delta Lake catalog config file. Define this configuration in serviceConfigsProfiles under the clusterProfile property of the ARM template.

    Property                                 Value             Description
    fileName                                 delta.properties  Name of the catalog file. If the file is called delta.properties, delta becomes the catalog name.
    connector.name                           delta-lake        The type of the catalog. For Delta Lake, the catalog type must be delta-lake.
    delta.register-table-procedure.enabled   true              Required to allow external tables to be registered.

    See the Trino documentation for other Delta Lake configuration options.

    "serviceConfigsProfiles": [
    {
        "serviceName": "trino",
        "configs": [
            {
                "component": "catalogs",
                "files": [
                    {
                        "fileName": "delta.properties",
                        "values": {
                            "connector.name": "delta-lake",
                            "delta.register-table-procedure.enabled": "true"
                        }
                    }
                ]
    
    ...
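
The fragment above can also be generated rather than typed by hand, which avoids bracket and indentation slips. The following is a minimal Python sketch, not part of the product tooling: the helper name `delta_catalog_profile` and its `catalog_name` parameter are hypothetical, and the output covers only the Trino entry shown above, not the rest of the template.

```python
import json


def delta_catalog_profile(catalog_name="delta"):
    """Build the Trino entry of serviceConfigsProfiles with one Delta Lake catalog file.

    Hypothetical helper for illustration; it mirrors the fragment shown in this article.
    """
    return {
        "serviceName": "trino",
        "configs": [
            {
                "component": "catalogs",
                "files": [
                    {
                        # The file name without ".properties" becomes the catalog name.
                        "fileName": f"{catalog_name}.properties",
                        "values": {
                            "connector.name": "delta-lake",
                            # Written as the string "true", as in the article's fragment.
                            "delta.register-table-procedure.enabled": "true",
                        },
                    }
                ],
            }
        ],
    }


profile = delta_catalog_profile()
# Print the entry wrapped in a list, ready to paste as the serviceConfigsProfiles value.
print(json.dumps([profile], indent=4))
```

Passing a different `catalog_name` changes only the file name, and therefore the catalog name that Trino exposes.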
    
  2. Configure a Hive metastore for table definitions and locations if you don't have a metastore already configured.

    • Configure the Hive metastore for the Delta catalog.

      The catalogOptions section of the ARM template defines the Hive metastore connection details. It can set up:

      • Metastore config.
      • Metastore instance.
      • Link from the catalog to the metastore (catalogName).

      Add this catalogOptions configuration under the trinoProfile property of your cluster ARM template:

      Note

      If Hive catalog options are already present, duplicate your Hive config and specify the delta catalog name.

      "trinoProfile": {
         "catalogOptions": {
             "hive": [
                 {
                     "catalogName": "delta",
                     "metastoreDbConnectionURL": "jdbc:sqlserver://{{DATABASE_SERVER}}.database.windows.net:1433;database={{DATABASE_NAME}};encrypt=true;trustServerCertificate=true;loginTimeout=30;",
                     "metastoreDbConnectionUserName": "{{DATABASE_USER_NAME}}",
                     "metastoreDbConnectionPasswordSecret": "hms-db-pwd-ref",
                     "metastoreWarehouseDir": "abfss://{{AZURE_STORAGE_CONTAINER}}@{{AZURE_STORAGE_ACCOUNT_NAME}}.dfs.core.windows.net/"
                 }  
             ]
         }
      } ...
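
Because the connection URL and warehouse path are assembled from several placeholders, it is easy to leave a stray brace behind. The sketch below, in Python, builds one catalogOptions.hive entry from plain values and flags any value that still contains a brace. Both function names and all example values (contoso-sql, hmsdb, and so on) are hypothetical; only the key names and URL shapes come from the fragment above.

```python
import re


def hive_catalog_option(catalog_name, server, database, user,
                        password_secret, container, account):
    """Build one entry for trinoProfile.catalogOptions.hive (illustrative helper)."""
    jdbc_url = (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;trustServerCertificate=true;loginTimeout=30;"
    )
    return {
        "catalogName": catalog_name,
        "metastoreDbConnectionURL": jdbc_url,
        "metastoreDbConnectionUserName": user,
        "metastoreDbConnectionPasswordSecret": password_secret,
        "metastoreWarehouseDir": f"abfss://{container}@{account}.dfs.core.windows.net/",
    }


def unresolved_placeholders(entry):
    """Return the keys whose values still contain '{' or '}'."""
    return [key for key, value in entry.items() if re.search(r"[{}]", value)]


# All values below are made-up examples.
entry = hive_catalog_option(
    catalog_name="delta",
    server="contoso-sql",
    database="hmsdb",
    user="hmsadmin",
    password_secret="hms-db-pwd-ref",
    container="warehouse",
    account="contosostore",
)
print(unresolved_placeholders(entry))  # []
```

An empty list means every placeholder was substituted; a non-empty list names the keys to recheck before deployment.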
      
  3. Assign the Storage Blob Data Owner role to your cluster's user-assigned managed identity (MSI) on the storage account that contains the Delta tables. Learn how to assign a role.

    • The user-assigned MSI name is listed in the msiResourceId property in the cluster's resource JSON.
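
The msiResourceId value is a full Azure resource ID, and the identity name is its last path segment. A small Python sketch of that extraction follows; the function name and the example subscription, resource group, and identity names are hypothetical, and the sketch assumes the standard resource ID layout ending in userAssignedIdentities/<name>.

```python
def msi_name_from_resource_id(msi_resource_id):
    """Return the identity name, the last segment of a user-assigned identity resource ID."""
    segments = [s for s in msi_resource_id.split("/") if s]
    # Assumption: the segment before the name is "userAssignedIdentities".
    if len(segments) < 2 or segments[-2].lower() != "userassignedidentities":
        raise ValueError(f"not a user-assigned identity resource ID: {msi_resource_id!r}")
    return segments[-1]


# Made-up resource ID for illustration.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg"
    "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my-cluster-msi"
)
print(msi_name_from_resource_id(example_id))  # my-cluster-msi
```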

Deploy the updated ARM template to apply the changes to your cluster. Learn how to deploy an ARM template.
After a successful deployment, the "delta" catalog appears in your Trino cluster.
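
If the catalog does not appear, one thing worth checking in the template is that the catalog file and the metastore entry use the same name. The Python sketch below is a hypothetical pre-deployment check: it assumes every Delta Lake catalog file should have a catalogOptions.hive entry with a matching catalogName, and it takes the serviceConfigsProfiles list and the trinoProfile object as separate arguments, so it makes no assumption about where they sit in the full template.

```python
def catalogs_missing_metastore(service_configs_profiles, trino_profile):
    """List Delta Lake catalog names that have no matching catalogOptions.hive entry."""
    linked = {
        option.get("catalogName")
        for option in trino_profile.get("catalogOptions", {}).get("hive", [])
    }
    missing = []
    for profile in service_configs_profiles:
        if profile.get("serviceName") != "trino":
            continue
        for config in profile.get("configs", []):
            if config.get("component") != "catalogs":
                continue
            for file in config.get("files", []):
                name = file.get("fileName", "")
                is_delta = file.get("values", {}).get("connector.name") == "delta-lake"
                if is_delta and name.endswith(".properties"):
                    # The catalog name is the file name without ".properties".
                    catalog = name[: -len(".properties")]
                    if catalog not in linked:
                        missing.append(catalog)
    return missing


# Made-up template fragments: "lake2" has a catalog file but no metastore link.
service_configs_profiles = [
    {
        "serviceName": "trino",
        "configs": [
            {
                "component": "catalogs",
                "files": [
                    {"fileName": "delta.properties", "values": {"connector.name": "delta-lake"}},
                    {"fileName": "lake2.properties", "values": {"connector.name": "delta-lake"}},
                ],
            }
        ],
    }
]
trino_profile = {"catalogOptions": {"hive": [{"catalogName": "delta"}]}}

print(catalogs_missing_metastore(service_configs_profiles, trino_profile))  # ['lake2']
```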

Next steps

Read Delta Lake tables (Synapse or External Location)