Query logging

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

Trino supports custom event listeners that can be used to listen for query lifecycle events. You can author your own event listener or use the built-in plugin provided by HDInsight on AKS that logs events to Azure Blob Storage.

You can enable built-in query logging in two ways:

  • You can enable built-in query logging during Trino cluster creation by adding a Hive catalog.

  • You can enable built-in query logging in an existing cluster using an ARM template.

This article covers how to add query logging to your cluster using an ARM template.

Prerequisites

Enable query logging

To enable the built-in query logging plugin in your Trino cluster, add or update the clusterProfile.trinoProfile.userTelemetrySpec section of your cluster ARM template with the following properties.

  • path: Fully qualified path to a directory used as the root for capturing the different query logs.
  • hivecatalogName: Catalog used to mount external tables on the files written to the storage account. This catalog must be added to your cluster; see Add hive catalog.
  • hivecatalogSchema: Schema the plugin uses to mount the external tables for the logs. The plugin creates this schema if it doesn't already exist. Default value: trinologs.
  • partitionRetentionInDays: The plugin prunes partitions in the log tables that are older than this value. Default value: 365.

The following example demonstrates how query logging is enabled in a Trino cluster. Add this sample JSON under [*].properties.clusterProfile in the ARM template.

       "trinoProfile": {
           "userTelemetrySpec": {
               "storage": {
                   "path": "https://querylogstorageaccount.blob.core.windows.net/logs/trinoquerylogs",
                   "hivecatalogName": "hive",
                   "hivecatalogSchema": "trinologs",
                   "partitionRetentionInDays": 365
               }
           }
       }

Deploy the updated ARM template to reflect the changes in your cluster. Learn how to deploy an ARM template.
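One way to deploy the updated template is with the Azure CLI. This is a minimal sketch: the resource group and template file names are placeholders, and the local JSON check before deployment is an optional safeguard, not part of the documented procedure.

```shell
# Placeholder file name; substitute the path to your edited cluster template.
TEMPLATE=trino-cluster.json

# Optional: fail fast if the edited template is no longer valid JSON.
python3 -m json.tool "$TEMPLATE" > /dev/null && echo "template OK"

# Deploy the template into the cluster's resource group (placeholder name).
az deployment group create \
  --resource-group my-hdinsight-rg \
  --template-file "$TEMPLATE"
```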

Note

  • The plugin uses the user-assigned managed identity (MSI) tied to the cluster to authenticate against the storage account. Grant the MSI the Contributor and Storage Blob Data Owner roles so that the plugin can write logs to the storage account.
    The user-assigned MSI name is listed in the msiResourceId property in the cluster's resource JSON. Learn how to assign a role.

  • partitionRetentionInDays only removes the partition metadata from the mounted table; it doesn't delete the underlying data. Clean up the data yourself once it's no longer needed.
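The role assignments described in the first note can be made with the Azure CLI. This is a sketch with placeholder IDs and names (subscription, resource group, storage account, and the MSI's principal ID are all assumptions you must replace with your own values).

```shell
# Placeholder values; substitute your own.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-hdinsight-rg"
STORAGE_ACCOUNT="querylogstorageaccount"
MSI_PRINCIPAL_ID="11111111-1111-1111-1111-111111111111"

# Scope the assignments to the log storage account.
SCOPE="/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/$STORAGE_ACCOUNT"

# Assign both roles called out in the note above.
for ROLE in "Contributor" "Storage Blob Data Owner"; do
  az role assignment create \
    --assignee "$MSI_PRINCIPAL_ID" \
    --role "$ROLE" \
    --scope "$SCOPE"
done
```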

Metadata management

If you specify a catalog name in the hivecatalogName property, the plugin mounts the log files written to the storage account as external tables and views, which you can query through Trino.

The plugin creates three tables and three views, which can be used to query the lifecycle events (QueryCompletedEvent, QueryCreatedEvent, and SplitCompletedEvent). These tables and views are created under the catalog and schema provided as user input.

Tables:

  • querycompleted: Contains QueryCompleted events fired by Trino.
  • querycreated: Contains QueryCreated events fired by Trino.
  • splitcompleted: Contains SplitCompleted events fired by Trino.

Views:

  • vquerycompleted
  • vquerycreated
  • vsplitcompleted
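As a sketch, the views can be queried with the Trino CLI once the catalog and schema from the example above are in place. The server endpoint is a placeholder, and `SELECT *` is used because the exact column layout of the views isn't documented here.

```shell
# Placeholder endpoint; substitute your cluster's Trino coordinator URL.
# Catalog and schema match the sample configuration above.
trino --server https://<cluster-endpoint> \
      --catalog hive \
      --schema trinologs \
      --execute "SELECT * FROM vquerycompleted LIMIT 10"
```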

Note

Users are encouraged to query the views rather than the tables, as the views are immune to underlying schema changes and account for the table archival described in the next section.

Table archival

The plugin supports archiving the (N-1)th table when you change the path or external location of the logs. In that case, the plugin renames the table pointing to the old path to <table_name>_archived, and the corresponding view unions the results of the current and archived tables.

Create your custom plugin

You can also author a custom event listener plugin. Follow the plugin deployment steps in Deploy custom plugins.