Use external Hive metastore database

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

Hive metastore is used as a central repository for storing metadata about the data. This article describes how you can add a Hive metastore database to your Trino cluster with HDInsight on AKS. There are two ways:

  • You can add a Hive catalog and link it to an external Hive metastore database during Trino cluster creation.

  • You can add a Hive catalog and attach an external Hive metastore database to your cluster using ARM template update.

The following example covers the addition of Hive catalog and metastore database to your cluster using ARM template.

Prerequisites

Note

  • Currently, we support Azure SQL Database as in-built metastore.
  • Due to Hive limitation, "-" (hyphen) character in the metastore database name is not supported.
  • Only single metastore database connection is supported, all catalogs listed in clusterProfile.trinoProfile.catalogOptions.hive section will be configured to use one and the same database parameters which are specified first.

Add external Hive metastore database

There are few important sections you need to add to your cluster ARM template to configure the Hive catalog and Hive metastore database:

Metastore configuration

Configure external Hive metastore database in config.properties file:

{
    "fileName": "config.properties",
    "values": {
        "hive.metastore.hdi.metastoreDbConnectionURL": "jdbc:sqlserver://mysqlserver1.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30",
        "hive.metastore.hdi.metastoreDbConnectionUserName": "trinoadmin",
        "hive.metastore.hdi.metastoreDbConnectionPasswordSecret": "hms-db-pwd",
        "hive.metastore.hdi.metastoreWarehouseDir": "abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse"
    }
}
Property Description Example
hive.metastore.hdi.metastoreDbConnectionURL JDBC connection string to database. jdbc:sqlserver://mysqlserver1.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30
hive.metastore.hdi.metastoreDbConnectionUserName SQL user name to connect to database. trinoadmin
hive.metastore.hdi.metastoreDbConnectionPasswordSecret Secret referenceName configured in secretsProfile with password. hms-db-pwd
hive.metastore.hdi.metastoreWarehouseDir ABFS URI to location in storage where data is stored. abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse

Metastore authentication

Configure authentication to external Hive metastore database specifying Azure Key Vault secrets.

Note

referenceName should match value provided in hive.metastore.hdi.metastoreDbConnectionPasswordSecret

"secretsProfile": {
    "keyVaultResourceId": "/subscriptions/{USER_SUBSCRIPTION_ID}/resourceGroups/{USER_RESOURCE_GROUP}/providers/Microsoft.KeyVault/vaults/{USER_KEYVAULT_NAME}",
    "secrets": [
        {
            "referenceName": "hms-db-pwd",
            "type": "Secret",
            "keyVaultObjectName": "hms-db-pwd"
        }                        ]
},
Property Description Example
secretsProfile.keyVaultResourceId Azure resource ID string to Azure Key Vault where secrets for Hive metastore are stored. /subscriptions/0000000-0000-0000-0000-000000000000/resourceGroups/trino-rg/providers/Microsoft.KeyVault/vaults/trinoakv
secretsProfile.secrets[*].referenceName Unique reference name of the secret to use later in clusterProfile. Secret1_ref
secretsProfile.secrets[*].type Type of object in Azure Key Vault, only “Secret” is supported. Secret
secretsProfile.secrets[*].keyVaultObjectName Name of secret object in Azure Key Vault containing actual secret value. secret1

Catalog configuration

In order for a Trino catalog to use external Hive metastore it should specify hive.metastore=hdi property. For more information, see Add catalogs to existing cluster:

{
    "fileName": "hive1.properties",
    "values": {
        "connector.name": "hive",
        "hive.metastore": "hdi"
    }
}

Complete example

To configure external Hive metastore to an existing Trino cluster, add the required sections in your cluster ARM template by referring to the following example:

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {},
    "resources": [
        {
            "type": "microsoft.hdinsight/clusterpools/clusters",
            "apiVersion": "<api-version>",
            "name": "<cluster-pool-name>/<cluster-name>",
            "location": "<region, e.g. westeurope>",
            "tags": {},
            "properties": {
                "clusterType": "Trino",

                "clusterProfile": {
                    "secretsProfile": {
                        "keyVaultResourceId": "/subscriptions/{USER_SUBSCRIPTION_ID}/resourceGroups/{USER_RESOURCE_GROUP}/providers/Microsoft.KeyVault/vaults/{USER_KEYVAULT_NAME}",
                        "secrets": [
                            {
                                "referenceName": "hms-db-pwd",
                                "type": "Secret",
                                "keyVaultObjectName": "hms-db-pwd"
                            }                        ]
                    },
                    "serviceConfigsProfiles": [
                        {
                            "serviceName": "trino",
                            "configs": [
                                {
                                    "component": "common",
                                    "files": [
                                        {
                                            "fileName": "config.properties",
                                            "values": {
                                                "hive.metastore.hdi.metastoreDbConnectionURL": "jdbc:sqlserver://mysqlserver1.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30",
                                                "hive.metastore.hdi.metastoreDbConnectionUserName": "trinoadmin",
                                                "hive.metastore.hdi.metastoreDbConnectionPasswordSecret": "hms-db-pwd",
                                                "hive.metastore.hdi.metastoreWarehouseDir": "abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse"
                                            }
                                        }
                                    ]
                                },
                                {
                                    "component": "catalogs",
                                    "files": [
                                        {
                                            "fileName": "hive1.properties",
                                            "values": {
                                                "connector.name": "hive",
                                                "hive.metastore": "hdi"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            }
        }
    ]
}

Deploy the updated ARM template to reflect the changes in your cluster. Learn how to deploy an ARM template. Once successfully deployed, you can see the "hive1" catalog in your Trino cluster.

You can run a few simple queries to try the Hive catalog.

Check if Hive catalog is created successfully.

show catalogs;

Query a table (In this example, "hive1" is the name of hive catalog specified).

create schema hive1.schema1;
create table hive1.schema1.tpchorders as select * from tpch.tiny.orders;
select * from hive1.schema1.tpchorders limit 100;

Alternative configuration

Alternatively external Hive metastore database parameters can be specified in trinoProfile.catalogOptions.hive together with hive.metastore=hdi catalog property:

Property Description Example
trinoProfile.catalogOptions.hive List of Hive or iceberg or delta catalogs with parameters of external Hive metastore database, require parameters for each. To use external metastore database, catalog must be present in this list.
trinoProfile.catalogOptions.hive[*].catalogName Name of Trino catalog configured in serviceConfigsProfiles, which configured to use external Hive metastore database. hive1
trinoProfile.catalogOptions.hive[*].metastoreDbConnectionURL JDBC connection string to database. jdbc:sqlserver://mysqlserver1.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30
trinoProfile.catalogOptions.hive[*].metastoreDbConnectionUserName SQL user name to connect to database. trinoadmin
trinoProfile.catalogOptions.hive[*].metastoreDbConnectionPasswordSecret Secret referenceName configured in secretsProfile with password. hms-db-pwd
trinoProfile.catalogOptions.hive[*].metastoreWarehouseDir ABFS URI to location in storage where data is stored. abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse

Complete example

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {},
    "resources": [
        {
            "type": "microsoft.hdinsight/clusterpools/clusters",
            "apiVersion": "<api-version>",
            "name": "<cluster-pool-name>/<cluster-name>",
            "location": "<region, e.g. westeurope>",
            "tags": {},
            "properties": {
                "clusterType": "Trino",

                "clusterProfile": {
                    "secretsProfile": {
                        "keyVaultResourceId": "/subscriptions/{USER_SUBSCRIPTION_ID}/resourceGroups/{USER_RESOURCE_GROUP}/providers/Microsoft.KeyVault/vaults/{USER_KEYVAULT_NAME}",
                        "secrets": [
                            {
                                "referenceName": "hms-db-pwd",
                                "type": "Secret",
                                "keyVaultObjectName": "hms-db-pwd"
                            }                        ]
                    },
                    "serviceConfigsProfiles": [
                        {
                            "serviceName": "trino",
                            "configs": [
                                {
                                    "component": "catalogs",
                                    "files": [
                                        {
                                            "fileName": "hive1.properties",
                                            "values": {
                                                "connector.name": "hive",
                                                "hive.metastore": "hdi"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ],
                    "trinoProfile": {
                        "catalogOptions": {
                            "hive": [
                                {
                                    "catalogName": "hive1",
                                    "metastoreDbConnectionURL": "jdbc:sqlserver://mysqlserver1.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30",
                                    "metastoreDbConnectionUserName": "trinoadmin",
                                    "metastoreDbConnectionPasswordSecret": "hms-db-pwd",
                                    "metastoreWarehouseDir": "abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse"
                                }
                            ]
                        }
                    }
                }
            }
        }
    ]
}