Use external Hive metastore database
Note
We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.
Only basic support will be available until the retirement date.
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.
Hive metastore is used as a central repository for storing metadata about the data. This article describes how you can add a Hive metastore database to your Trino cluster with HDInsight on AKS. There are two ways:
You can add a Hive catalog and link it to an external Hive metastore database during Trino cluster creation.
You can add a Hive catalog and attach an external Hive metastore database to your cluster using ARM template update.
The following example covers the addition of Hive catalog and metastore database to your cluster using ARM template.
Prerequisites
- An operational Trino cluster with HDInsight on AKS.
- Create ARM template for your cluster.
- Review complete cluster ARM template sample.
- Familiarity with ARM template authoring and deployment.
Note
- Currently, we support Azure SQL Database as in-built metastore.
- Due to Hive limitation, "-" (hyphen) character in the metastore database name is not supported.
- Only single metastore database connection is supported, all catalogs listed in
clusterProfile.trinoProfile.catalogOptions.hive
section will be configured to use one and the same database parameters which are specified first.
Add external Hive metastore database
There are few important sections you need to add to your cluster ARM template to configure the Hive catalog and Hive metastore database:
Metastore configuration
Configure external Hive metastore database in config.properties
file:
{
"fileName": "config.properties",
"values": {
"hive.metastore.hdi.metastoreDbConnectionURL": "jdbc:sqlserver://server-name.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30",
"hive.metastore.hdi.metastoreDbConnectionUserName": "trinoadmin",
"hive.metastore.hdi.metastoreDbConnectionPasswordSecret": "hms-db-pwd",
"hive.metastore.hdi.metastoreWarehouseDir": "abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse"
}
}
Property | Description | Example |
---|---|---|
hive.metastore.hdi.metastoreDbConnectionURL | JDBC connection string to database. | jdbc:sqlserver://server-name.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30 |
hive.metastore.hdi.metastoreDbConnectionUserName | SQL user name to connect to database. | trinoadmin |
hive.metastore.hdi.metastoreDbConnectionPasswordSecret | Secret referenceName configured in secretsProfile with password. | hms-db-pwd |
hive.metastore.hdi.metastoreWarehouseDir | ABFS URI to location in storage where data is stored. | abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse |
Metastore authentication
Configure authentication to external Hive metastore database specifying Azure Key Vault secrets.
Note
referenceName
should match value provided in hive.metastore.hdi.metastoreDbConnectionPasswordSecret
"secretsProfile": {
"keyVaultResourceId": "/subscriptions/{USER_SUBSCRIPTION_ID}/resourceGroups/{USER_RESOURCE_GROUP}/providers/Microsoft.KeyVault/vaults/{USER_KEYVAULT_NAME}",
"secrets": [
{
"referenceName": "hms-db-pwd",
"type": "Secret",
"keyVaultObjectName": "hms-db-pwd"
} ]
},
Property | Description | Example |
---|---|---|
secretsProfile.keyVaultResourceId | Azure resource ID string to Azure Key Vault where secrets for Hive metastore are stored. | /subscriptions/0000000-0000-0000-0000-000000000000/resourceGroups/trino-rg/providers/Microsoft.KeyVault/vaults/trinoakv |
secretsProfile.secrets[*].referenceName | Unique reference name of the secret to use later in clusterProfile. | Secret1_ref |
secretsProfile.secrets[*].type | Type of object in Azure Key Vault, only “Secret” is supported. | Secret |
secretsProfile.secrets[*].keyVaultObjectName | Name of secret object in Azure Key Vault containing actual secret value. | secret1 |
Catalog configuration
In order for a Trino catalog to use external Hive metastore it should specify hive.metastore=hdi
property. For more information, see Add catalogs to existing cluster:
{
"fileName": "hive1.properties",
"values": {
"connector.name": "hive",
"hive.metastore": "hdi"
}
}
Complete example
To configure external Hive metastore to an existing Trino cluster, add the required sections in your cluster ARM template by referring to the following example:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {},
"resources": [
{
"type": "microsoft.hdinsight/clusterpools/clusters",
"apiVersion": "<api-version>",
"name": "<cluster-pool-name>/<cluster-name>",
"location": "<region, e.g. westeurope>",
"tags": {},
"properties": {
"clusterType": "Trino",
"clusterProfile": {
"secretsProfile": {
"keyVaultResourceId": "/subscriptions/{USER_SUBSCRIPTION_ID}/resourceGroups/{USER_RESOURCE_GROUP}/providers/Microsoft.KeyVault/vaults/{USER_KEYVAULT_NAME}",
"secrets": [
{
"referenceName": "hms-db-pwd",
"type": "Secret",
"keyVaultObjectName": "hms-db-pwd"
} ]
},
"serviceConfigsProfiles": [
{
"serviceName": "trino",
"configs": [
{
"component": "common",
"files": [
{
"fileName": "config.properties",
"values": {
"hive.metastore.hdi.metastoreDbConnectionURL": "jdbc:sqlserver://server-name.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30",
"hive.metastore.hdi.metastoreDbConnectionUserName": "trinoadmin",
"hive.metastore.hdi.metastoreDbConnectionPasswordSecret": "hms-db-pwd",
"hive.metastore.hdi.metastoreWarehouseDir": "abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse"
}
}
]
},
{
"component": "catalogs",
"files": [
{
"fileName": "hive1.properties",
"values": {
"connector.name": "hive",
"hive.metastore": "hdi"
}
}
]
}
]
}
]
}
}
}
]
}
Deploy the updated ARM template to reflect the changes in your cluster. Learn how to deploy an ARM template. Once successfully deployed, you can see the "hive1" catalog in your Trino cluster.
You can run a few simple queries to try the Hive catalog.
Check if Hive catalog is created successfully.
show catalogs;
Query a table (In this example, "hive1" is the name of hive catalog specified).
create schema hive1.schema1;
create table hive1.schema1.tpchorders as select * from tpch.tiny.orders;
select * from hive1.schema1.tpchorders limit 100;
Alternative configuration
Alternatively external Hive metastore database parameters can be specified in trinoProfile.catalogOptions.hive
together with hive.metastore=hdi
catalog property:
Property | Description | Example |
---|---|---|
trinoProfile.catalogOptions.hive | List of Hive or iceberg or delta catalogs with parameters of external Hive metastore database, require parameters for each. To use external metastore database, catalog must be present in this list. | |
trinoProfile.catalogOptions.hive[*].catalogName | Name of Trino catalog configured in serviceConfigsProfiles , which configured to use external Hive metastore database. |
hive1 |
trinoProfile.catalogOptions.hive[*].metastoreDbConnectionURL | JDBC connection string to database. | jdbc:sqlserver://server-name.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30 |
trinoProfile.catalogOptions.hive[*].metastoreDbConnectionUserName | SQL user name to connect to database. | trinoadmin |
trinoProfile.catalogOptions.hive[*].metastoreDbConnectionPasswordSecret | Secret referenceName configured in secretsProfile with password. | hms-db-pwd |
trinoProfile.catalogOptions.hive[*].metastoreWarehouseDir | ABFS URI to location in storage where data is stored. | abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse |
Complete example
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {},
"resources": [
{
"type": "microsoft.hdinsight/clusterpools/clusters",
"apiVersion": "<api-version>",
"name": "<cluster-pool-name>/<cluster-name>",
"location": "<region, e.g. westeurope>",
"tags": {},
"properties": {
"clusterType": "Trino",
"clusterProfile": {
"secretsProfile": {
"keyVaultResourceId": "/subscriptions/{USER_SUBSCRIPTION_ID}/resourceGroups/{USER_RESOURCE_GROUP}/providers/Microsoft.KeyVault/vaults/{USER_KEYVAULT_NAME}",
"secrets": [
{
"referenceName": "hms-db-pwd",
"type": "Secret",
"keyVaultObjectName": "hms-db-pwd"
} ]
},
"serviceConfigsProfiles": [
{
"serviceName": "trino",
"configs": [
{
"component": "catalogs",
"files": [
{
"fileName": "hive1.properties",
"values": {
"connector.name": "hive",
"hive.metastore": "hdi"
}
}
]
}
]
}
],
"trinoProfile": {
"catalogOptions": {
"hive": [
{
"catalogName": "hive1",
"metastoreDbConnectionURL": "jdbc:sqlserver://server-name.database.windows.net;database=myhmsdb1;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30",
"metastoreDbConnectionUserName": "trinoadmin",
"metastoreDbConnectionPasswordSecret": "hms-db-pwd",
"metastoreWarehouseDir": "abfs://container1@myadlsgen2account1.dfs.core.windows.net/hive/warehouse"
}
]
}
}
}
}
}
]
}