Read Delta Lake tables (Synapse or external location)
Note
We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.
Only basic support will be available until the retirement date.
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.
This article provides an overview of how to read a Delta Lake table without having any access to the metastore (Synapse or other metastores without public access).
You can perform the following operations on the tables using Trino with HDInsight on AKS.
- DELETE
- UPDATE
- INSERT
- MERGE
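As an illustration, the statements below sketch what each of these operations looks like in Trino SQL. The catalog, schema, table, and column names are placeholders, not values from this article:

```python
# Hedged sketch: example Trino SQL for each DML operation supported on Delta
# Lake tables. delta.default.table_name, the id/name columns, and the
# "updates" staging table are illustrative placeholders only.
EXAMPLES = {
    "DELETE": "DELETE FROM delta.default.table_name WHERE id = 1",
    "UPDATE": "UPDATE delta.default.table_name SET name = 'b' WHERE id = 1",
    "INSERT": "INSERT INTO delta.default.table_name (id, name) VALUES (1, 'a')",
    "MERGE": (
        "MERGE INTO delta.default.table_name t "
        "USING delta.default.updates u ON t.id = u.id "
        "WHEN MATCHED THEN UPDATE SET name = u.name "
        "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (u.id, u.name)"
    ),
}

# Print each operation with its example statement.
for op, sql in EXAMPLES.items():
    print(f"-- {op}\n{sql}\n")
```

You would submit these statements through any Trino client connected to your HDInsight on AKS cluster, after the table is registered as described below.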
Prerequisites
Create Delta Lake schemas and tables
This section shows how to create a Delta table over a pre-existing location, provided you already have a Delta Lake catalog configured.
Browse the storage account using the Storage browser in the Azure portal to locate the base directory of your table. If the table originated in Synapse, it's likely under a synapse/workspaces/.../warehouse/ path, is named after your table, and contains a _delta_log directory. Select Copy URL from the three dots next to the folder. You then need to convert this HTTP path into an ABFS (Azure Blob File System) path:
The storage http path is structured like this:
https://{{AZURE_STORAGE_ACCOUNT}}.blob.core.windows.net/{{AZURE_STORAGE_CONTAINER}}/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/
ABFS paths need to look like this:
abfss://{{AZURE_STORAGE_CONTAINER}}@{{AZURE_STORAGE_ACCOUNT}}.dfs.core.windows.net/synapse/workspaces/my_workspace/warehouse/{{TABLE_NAME}}/
Example:
abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/
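If you convert these paths often, the rewrite can be scripted. The following is a minimal sketch (the function name is my own, and it assumes the standard blob.core.windows.net / dfs.core.windows.net endpoint layout shown above):

```python
from urllib.parse import urlparse

def blob_url_to_abfss(url: str) -> str:
    """Convert an Azure Blob Storage HTTPS URL to an ABFS path.

    https://{account}.blob.core.windows.net/{container}/{path}
      -> abfss://{container}@{account}.dfs.core.windows.net/{path}
    """
    parsed = urlparse(url)
    # The storage account name is the first label of the hostname.
    account = parsed.netloc.split(".")[0]
    # The first path segment is the container; the rest is the blob path.
    container, _, path = parsed.path.lstrip("/").partition("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

print(blob_url_to_abfss(
    "https://storageaccount.blob.core.windows.net/container/"
    "synapse/workspaces/workspace_name/warehouse/table_name/"
))
```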
Create a Delta Lake schema in Trino.
CREATE SCHEMA delta.default;
Alternatively, you can also create a schema in a specific storage account:
CREATE SCHEMA delta.default WITH (location = 'abfss://container@storageaccount.dfs.core.windows.net/trino/');
Use the register_table procedure to create the table:
CALL delta.system.register_table(schema_name => 'default', table_name => 'table_name', table_location => 'abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name/');
Query the table to verify.
SELECT * FROM delta.default.table_name
Write Delta Lake tables in Synapse Spark
Use format("delta") to save a DataFrame as a Delta table. You can then register the table in Trino using the path where the DataFrame was saved.
my_dataframe.write.format("delta").save("abfss://container@storageaccount.dfs.core.windows.net/synapse/workspaces/workspace_name/warehouse/table_name")