Query data from AWS S3 using AWS Glue
Note
We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.
Only basic support will be available until the retirement date.
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.
This article provides examples of how to add catalogs to a Trino cluster with HDInsight on AKS, where the catalogs use AWS Glue as the metastore and AWS S3 as storage.
Prerequisites
- Understanding of Trino cluster configurations for HDInsight on AKS.
- Familiarity with how to add catalogs to an existing cluster.
- An AWS account with Glue and S3.
Trino catalogs with AWS S3 and AWS Glue as metastore
Several Trino connectors support AWS Glue. For details on the Glue configuration properties for catalogs, see the Trino documentation.
Refer to Quickstart with AWS Glue and S3 for setting up AWS resources.
Note
Securely store Glue and S3 access keys in Azure Key Vault, and configure secretsProfile to use secrets in catalogs instead of specifying them in plain text in the ARM template.
Add Hive catalog
Add the following sample JSON to the clusterProfile section of your Trino cluster's ARM template.
Update the values as required.
"serviceConfigsProfiles": [
    {
        "serviceName": "trino",
        "configs": [
            {
                "component": "catalogs",
                "files": [
                    {
                        "fileName": "hiveglue.properties",
                        "values": {
                            "connector.name": "hive",
                            "hive.metastore": "glue",
                            "hive.metastore.glue.region": "us-west-2",
                            "hive.metastore.glue.endpoint-url": "glue.us-west-2.amazonaws.com",
                            "hive.metastore.glue.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                            "hive.metastore.glue.aws-secret-key": "${SECRET_REF:aws-user-access-secret-ref}",
                            "hive.metastore.glue.catalogid": "<AWS account ID>",
                            "hive.s3.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                            "hive.s3.aws-secret-key": "${SECRET_REF:aws-user-access-secret-ref}",
                            "hive.temporary-staging-directory-enabled": "false"
                        }
                    }
                ]
            }
        ]
    }
]
Add Delta Lake catalog
Add the following sample JSON to the clusterProfile section of your Trino cluster's ARM template.
Update the values as required.
"serviceConfigsProfiles": [
    {
        "serviceName": "trino",
        "configs": [
            {
                "component": "catalogs",
                "files": [
                    {
                        "fileName": "deltaglue.properties",
                        "values": {
                            "connector.name": "delta_lake",
                            "hive.metastore": "glue",
                            "hive.metastore.glue.region": "us-west-2",
                            "hive.metastore.glue.endpoint-url": "glue.us-west-2.amazonaws.com",
                            "hive.metastore.glue.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                            "hive.metastore.glue.aws-secret-key": "${SECRET_REF:aws-user-access-secret-ref}",
                            "hive.metastore.glue.catalogid": "<AWS account ID>",
                            "hive.s3.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                            "hive.s3.aws-secret-key": "${SECRET_REF:aws-user-access-secret-ref}"
                        }
                    }
                ]
            }
        ]
    }
]
Add Iceberg catalog
Add the following sample JSON to the clusterProfile section of your Trino cluster's ARM template.
Update the values as required.
"serviceConfigsProfiles": [
    {
        "serviceName": "trino",
        "configs": [
            {
                "component": "catalogs",
                "files": [
                    {
                        "fileName": "iceglue.properties",
                        "values": {
                            "connector.name": "iceberg",
                            "iceberg.catalog.type": "glue",
                            "hive.metastore.glue.region": "us-west-2",
                            "hive.metastore.glue.endpoint-url": "glue.us-west-2.amazonaws.com",
                            "hive.metastore.glue.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                            "hive.metastore.glue.aws-secret-key": "${SECRET_REF:aws-user-access-secret-ref}",
                            "hive.metastore.glue.catalogid": "<AWS account ID>",
                            "hive.s3.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                            "hive.s3.aws-secret-key": "${SECRET_REF:aws-user-access-secret-ref}"
                        }
                    }
                ]
            }
        ]
    }
]
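The three catalog samples above share most of their properties and differ mainly in the connector name and in how Glue is selected as the metastore. The following Python sketch generates the same fragments from a few inputs, which can help keep the catalogs consistent. The function names glue_catalog_values and service_configs_profiles are hypothetical helpers, not part of any HDInsight or Trino tooling, and the endpoint pattern is simply copied from the samples.

```python
import json


def glue_catalog_values(connector, region, account_id,
                        key_ref="aws-user-access-key-ref",
                        secret_ref="aws-user-access-secret-ref"):
    """Build the "values" map for one Glue-backed catalog file."""
    if connector in ("hive", "delta_lake"):
        metastore = {"hive.metastore": "glue"}
    elif connector == "iceberg":
        metastore = {"iceberg.catalog.type": "glue"}
    else:
        raise ValueError(f"unsupported connector: {connector}")

    # Secret references use the ${SECRET_REF:<referenceName>} form from the samples.
    access_key = "${SECRET_REF:" + key_ref + "}"
    secret_key = "${SECRET_REF:" + secret_ref + "}"
    values = {
        "connector.name": connector,
        **metastore,
        "hive.metastore.glue.region": region,
        # Assumption: endpoint pattern copied from the samples; adjust if yours differs.
        "hive.metastore.glue.endpoint-url": f"glue.{region}.amazonaws.com",
        "hive.metastore.glue.aws-access-key": access_key,
        "hive.metastore.glue.aws-secret-key": secret_key,
        "hive.metastore.glue.catalogid": account_id,
        "hive.s3.aws-access-key": access_key,
        "hive.s3.aws-secret-key": secret_key,
    }
    if connector == "hive":
        # Only the Hive sample sets this property.
        values["hive.temporary-staging-directory-enabled"] = "false"
    return values


def service_configs_profiles(catalogs):
    """Wrap {file name: values} pairs in the serviceConfigsProfiles shape."""
    files = [{"fileName": name, "values": values}
             for name, values in catalogs.items()]
    return [{"serviceName": "trino",
             "configs": [{"component": "catalogs", "files": files}]}]


if __name__ == "__main__":
    catalogs = {
        "hiveglue.properties": glue_catalog_values("hive", "us-west-2", "<AWS account ID>"),
        "deltaglue.properties": glue_catalog_values("delta_lake", "us-west-2", "<AWS account ID>"),
        "iceglue.properties": glue_catalog_values("iceberg", "us-west-2", "<AWS account ID>"),
    }
    fragment = {"serviceConfigsProfiles": service_configs_profiles(catalogs)}
    print(json.dumps(fragment, indent=4))
```

Running the script prints a single serviceConfigsProfiles fragment that contains all three catalog files, which you can review and then paste into the template.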
AWS access keys from Azure Key Vault
The catalog examples above refer to access keys stored as secrets in Azure Key Vault. The following secretsProfile sample shows how to configure those references.
"secretsProfile": {
    "keyVaultResourceId": "/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/trino-rp/providers/Microsoft.KeyVault/vaults/trinoakv",
    "secrets": [
        {
            "referenceName": "aws-user-access-key-ref",
            "keyVaultObjectName": "aws-user-access-key",
            "type": "secret"
        },
        {
            "referenceName": "aws-user-access-secret-ref",
            "keyVaultObjectName": "aws-user-access-secret",
            "type": "secret"
        }
    ]
},
Deploy the updated ARM template to reflect the changes in your cluster. Learn how to deploy an ARM template.
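Before deploying, it can help to confirm that every secret reference in the catalogs is well formed and matches a referenceName in secretsProfile. The following Python sketch does that for a clusterProfile loaded as a dictionary. The function check_secret_refs is a hypothetical helper, and the sketch assumes that secretsProfile and serviceConfigsProfiles sit side by side in the same object and that references use the ${SECRET_REF:<referenceName>} form shown in the samples.

```python
import re

# Matches both the expected ${SECRET_REF:name} form and a variant missing the '$'.
ANY_REF = re.compile(r"\$?\{SECRET_REF:([^}]*)\}")


def check_secret_refs(cluster_profile):
    """Return a list of problems found in the catalog secret references."""
    secrets = cluster_profile.get("secretsProfile", {}).get("secrets", [])
    defined = {secret["referenceName"] for secret in secrets}

    problems = []
    for service in cluster_profile.get("serviceConfigsProfiles", []):
        for config in service.get("configs", []):
            for catalog_file in config.get("files", []):
                file_name = catalog_file.get("fileName", "<unnamed>")
                for key, value in catalog_file.get("values", {}).items():
                    for match in ANY_REF.finditer(str(value)):
                        token, name = match.group(0), match.group(1)
                        if not token.startswith("$"):
                            problems.append(
                                f"{file_name}: {key} uses {token!r} without the leading '$'")
                        if name not in defined:
                            problems.append(
                                f"{file_name}: {key} refers to undefined secret {name!r}")
    return problems


if __name__ == "__main__":
    profile = {
        "secretsProfile": {
            "secrets": [
                {"referenceName": "aws-user-access-key-ref",
                 "keyVaultObjectName": "aws-user-access-key",
                 "type": "secret"},
            ]
        },
        "serviceConfigsProfiles": [{
            "serviceName": "trino",
            "configs": [{
                "component": "catalogs",
                "files": [{
                    "fileName": "hiveglue.properties",
                    "values": {
                        "hive.s3.aws-access-key": "${SECRET_REF:aws-user-access-key-ref}",
                        "hive.s3.aws-secret-key": "{SECRET_REF:aws-user-access-secret-ref}",
                    },
                }],
            }],
        }],
    }
    for problem in check_secret_refs(profile):
        print(problem)
```

In this example the second property is reported twice: once for the missing '$' and once because its reference name has no entry in secretsProfile.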
Quickstart with AWS Glue and S3
1. Create AWS user and save access keys to Azure Key Vault.
Use an existing user or create a new one in AWS IAM. The Trino connector uses this user to read data from Glue and S3. Create and retrieve access keys on the Security Credentials tab, and save them as secrets in the Azure Key Vault linked to your Trino cluster. Refer to Add catalogs to existing cluster for details on how to link Key Vault to your Trino cluster.
2. Create AWS S3 bucket
Use an existing S3 bucket or create a new one. The Glue database uses it as the location to store data.
3. Set up AWS Glue database
In AWS Glue, create a new database, for example, "trinodb", and configure its location to point to your S3 bucket from the previous step, for example, s3://trinoglues3/
4. Configure Trino catalog
Configure a Trino catalog using the examples in Trino catalogs with AWS S3 and AWS Glue as metastore.
5. Create and query sample table
Here are a few sample queries to test connectivity to AWS by writing and reading data. The schema name is the name of the AWS Glue database that you created earlier.
create table iceglue.trinodb.tpch_orders_ice as select * from tpch.sf1.orders;
select * from iceglue.trinodb.tpch_orders_ice;
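Steps 2 and 3 of this quickstart can also be scripted instead of done in the AWS console. The following Python sketch is one possible approach, assuming the third-party boto3 package is installed and AWS credentials with S3 and Glue permissions are available through the default credential chain. The helper names are hypothetical, and the bucket and database names are the examples used above.

```python
def bucket_request(bucket, region):
    """Build keyword arguments for the S3 create_bucket call."""
    request = {"Bucket": bucket}
    if region != "us-east-1":
        # S3 expects a LocationConstraint for every Region except us-east-1.
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def database_input(database, bucket):
    """Build the DatabaseInput for Glue, pointing the database at the bucket."""
    return {"Name": database, "LocationUri": f"s3://{bucket}/"}


def create_resources(bucket, database, region):
    """Create the S3 bucket (step 2) and the Glue database (step 3)."""
    # Imported here so the helpers above work without boto3 installed.
    import boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**bucket_request(bucket, region))

    glue = boto3.client("glue", region_name=region)
    glue.create_database(DatabaseInput=database_input(database, bucket))


if __name__ == "__main__":
    # Example values from this article; bucket names must be globally unique.
    create_resources("trinoglues3", "trinodb", "us-west-2")
```

Use the same Region here as in the hive.metastore.glue.region property of your catalog, so that Trino looks for the database where it was created.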