Read Delta Lake tables (Synapse or external location)

Note

We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.

Only basic support will be available until the retirement date.

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

This article describes how to read a Delta Lake table in Trino when you have no access to the metastore that owns it (Synapse, or another metastore without public access).

You can perform the following operations on the tables using Trino with HDInsight on AKS.

  • DELETE
  • UPDATE
  • INSERT
  • MERGE

Prerequisites

  • A Delta Lake catalog configured in your Trino cluster.

Create Delta Lake schemas and tables

This section shows how to create a Delta table over a pre-existing storage location, assuming you already have a Delta Lake catalog configured.

  1. In the Azure portal, open the Storage browser for the storage account and navigate to the base directory of your table. If the table originates in Synapse, the directory is likely under a synapse/workspaces/.../warehouse/ path; it is named after your table and contains a _delta_log directory. Select the three dots next to the folder, then select Copy URL.

    Convert this HTTP path into an ABFS (Azure Blob File System) path:

    The storage HTTP path is structured like this: https://{{AZURE_STORAGE_ACCOUNT}}.blob.core.windows.net/{{AZURE_STORAGE_CONTAINER}}/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/

    The ABFS path must be structured like this: abfss://{{AZURE_STORAGE_CONTAINER}}@{{AZURE_STORAGE_ACCOUNT}}.dfs.core.windows.net/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/

    Example: abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/

  2. Create a Delta Lake schema in Trino.

    CREATE SCHEMA delta.default;
    

    Alternatively, create the schema in a specific storage account:

    CREATE SCHEMA delta.default WITH (location = 'abfss://container@storageaccount.dfs.core.windows.net/trino/');
    
  3. Use the register_table procedure to create the table.

    CALL delta.system.register_table(schema_name => 'default', table_name => 'table_name', table_location => 'abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/');
    
  4. Query the table to verify.

    SELECT * FROM delta.default.table_name;
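
The path conversion in step 1 is mechanical, so it can be scripted. The following is a minimal Python sketch, not part of the product: the helper name http_to_abfss is hypothetical, and it assumes the copied URL has the shape shown in step 1 (account name as the first host label, container as the first path segment). If the portal returns a URL with percent-encoded characters, decode it first.

```python
from urllib.parse import urlparse


def http_to_abfss(url: str) -> str:
    """Convert a storage HTTPS URL into an abfss:// path (hypothetical helper)."""
    parsed = urlparse(url)
    # Host looks like <account>.blob.core.windows.net; keep only the account name.
    account = parsed.netloc.split(".")[0]
    # The first path segment is the container; the rest is the path inside it.
    container, _, rest = parsed.path.lstrip("/").partition("/")
    if not account or not container:
        raise ValueError(f"not a storage URL: {url!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{rest}"


# Example with the placeholder names used in this article.
print(http_to_abfss(
    "https://storageaccount.blob.core.windows.net/container"
    "/synapse/workspaces/workspace_name/warehouse/table_name/"
))
```

The result can be passed as the table_location argument of register_table in step 3.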
    

Write Delta Lake tables in Synapse Spark

Use format("delta") to save a DataFrame as a Delta table. You can then use the path where you saved the DataFrame to register the table in Trino.

my_dataframe.write.format("delta").save("abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name")
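
Once the DataFrame is saved, the same path is what you pass to register_table in Trino. As a sketch of how that statement could be generated from the save path, the Python helpers below build the CALL text. The function names sql_literal and register_table_sql are hypothetical, and the catalog name delta is assumed to match the catalog used elsewhere in this article. The trailing semicolon is omitted; add it if you paste the statement into a SQL client that requires one.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def register_table_sql(schema: str, table: str, location: str,
                       catalog: str = "delta") -> str:
    """Build the register_table CALL statement (hypothetical helper)."""
    return (
        f"CALL {catalog}.system.register_table("
        f"schema_name => {sql_literal(schema)}, "
        f"table_name => {sql_literal(table)}, "
        f"table_location => {sql_literal(location)})"
    )


# The same path that was passed to save() in the Spark snippet above.
saved_path = (
    "abfss://container@storageaccount.dfs.core.windows.net"
    "/synapse/workspaces/workspace_name/warehouse/table_name"
)
print(register_table_sql("default", "table_name", saved_path))
```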

Next steps

How to configure caching in Trino