Virtualize delta table with PolyBase
Applies to:
SQL Server 2022 (16.x) and later
SQL Server 2022 (16.x) can query data directly from a delta table folder. This concept, commonly referred to as data virtualization, allows the data to stay in its original location, but can be queried from a SQL Server instance with T-SQL commands like any other table. This feature uses PolyBase connectors, and minimizes the need for copying data via ETL processes.
In the example below, the delta table folder is stored on Azure Blob Storage and accessed via OPENROWQUERY or an external table.
For more information on data virtualization, Introducing data virtualization with PolyBase.
Pre-configuration
1. Enable PolyBase in sp_configure
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
2. Create a user database
This exercise creates a sample database with default settings and location. You'll use this empty sample database to work with the data and store the scoped credential. In this example, a new empty database named Delta_demo
will be used.
CREATE DATABASE [Delta_demo];
3. Create a master key and database scoped credential
The database master key in the user database is required to encrypt the database scoped credential secret, delta_storage_dsc
. For this example, the delta table resides on Azure Data Lake Storage Gen2.
USE [Delta_demo];
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';
CREATE DATABASE SCOPED CREDENTIAL delta_storage_dsc
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS Token>';
4. Create external data source
Database scoped credential will be used for the external data source. In this example, the delta table resides in Azure Data Lake Storage Gen2, so use prefix adls
and the SHARED ACCESS SIGNATURE
identity method. For more information about the connectors and prefixes, including new settings for SQL Server 2022 (16.x), refer to CREATE EXTERNAL DATA SOURCE.
CREATE EXTERNAL DATA SOURCE Delta_ED
WITH
(
LOCATION = 'adls://<container>@<storage_account>.dfs.core.windows.net'
,CREDENTIAL = delta_storage_dsc
);
For example, if your storage account is named delta_lake_sample
and the container is named sink
, the code would be:
CREATE EXTERNAL DATA SOURCE Delta_ED
WITH
(
LOCATION = 'adls://sink@delta_lake_sample.dfs.core.windows.net'
,CREDENTIAL = delta_storage_dsc
)
Use OPENROWQUERY to access the data
In this example the Data Table folder is named Contoso
.
Since the external data source Delta_ED
is mapped to a container level. The Contoso
delta table folder is located in a root. To query a file in a folder structure, provide a folder mapping relative to the external data source's LOCATION parameter.
SELECT *
FROM OPENROWSET
( BULK '/Contoso'
, FORMAT = 'DELTA'
, DATA_SOURCE = 'Delta_ED'
) as [result]
Query data with an external table
CREATE EXTERNAL TABLE can also be used to virtualize the delta table data in SQL Server. The columns must be defined and strongly typed. While external tables take more effort to create, they also provide additional benefits over querying an external data source with OPENROWSET. You can:
- Strengthen the definition of the data typing for a given column.
- Define nullability.
- Define COLLATION.
- Create statistics for a column to optimize the quality of the query plan.
- Create a more granular model within SQL Server for data access to enhance your security model.
For more information, see CREATE EXTERNAL TABLE.
For the following example, the same data source will be used.
1. Create external file format
To define the file's formatting, an external file format is required. External file formats are also recommended due to reusability. For more information, see CREATE EXTERNAL FILE FORMAT.
CREATE EXTERNAL FILE FORMAT DeltaTableFormat WITH(FORMAT_TYPE = DELTA);
2. Create external table
The delta table files are located at /delta/Delta_yob/
and the external data source for this example is S3-compatible object storage, previously configured under the data source s3_eds
. PolyBase can use the as LOCATION the delta table folder or the absolute file itself, which would be located at delta/Delta_yob/_delta_log/00000000000000000000.json
.
-- Create External Table using delta
CREATE EXTERNAL TABLE extCall_Center_delta
( id int,
name VARCHAR(200),
dob date
)WITH
( LOCATION = '/delta/Delta_yob/'
, FILE_FORMAT = DeltaTableFormat
, DATA_SOURCE = s3_eds
)
GO
Next steps
For more tutorials on creating external data sources and external tables to a variety of data sources, see PolyBase Transact-SQL reference.
Feedback
Submit and view feedback for