Events
Mar 31, 11 p.m. - Apr 2, 11 p.m.
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to:
SQL Server 2022 (16.x)
This article explains how to use PolyBase to query external data in S3-compatible object storage.
SQL Server 2022 (16.x) introduces the ability to connect to any S3-compatible object storage, there are two available options for authentication: basic authentication or pass-through authorization (also known as STS authorization).
Basic Authentication, also known as static credentials, requires the user to store the access key id
and secret key id
in SQL Server, it is up to the user to explicitly revoke and rotate the credentials whenever needed. Fine-grained access control would require the administrator to set up static credentials for each login, this approach can be challenging when dealing with dozens or hundreds of unique credentials.
Pass-through (STS) authorization offers a solution for these problems by enabling the use of SQL Server own user's identities to access the S3-compatible object storage. S3-compatible object storage has the ability of assigning a temporary credential by using the Secure Token Service (STS). These credentials are short termed and dynamically generated.
This article includes instructions for both Basic Authentication and pass-through authorization (STS) authorization.
To use the S3-compatible object storage integration features, you need the following tools and resources:
Access Key ID
) and the secret (Secret Key ID
) known to you. You need both to authenticate against the S3 object storage endpoint.In order for the proxy user to read the content of an S3 bucket, the user (Access Key ID
) needs to be allowed to perform the following actions against the S3 endpoint:
Msg 4860, Level 16, State 7, Line 15 Cannot bulk load. The file "s3://<ip address>:9000/bucket/*.*" does not exist or you don't have file access rights.
Tip
Your S3-compliant object storage provider might require additional API operation permissions, or use different naming for roles containing permissions to API operations. Consult your product documentation.
Enable PolyBase in sp_configure
:
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
GO
RECONFIGURE
GO
Confirm the setting:
EXEC sp_configure @configname = 'polybase enabled';
To continue, choose Basic Authentication or pass-through (STS) authorization.
Before you create a database scoped credential, the user database must have a master key to protect the credential. For more information, see CREATE MASTER KEY.
The following sample script creates a database scoped credential s3-dc
in the database_name
database in a SQL Server instance. For more information, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).
USE [database_name];
GO
IF NOT EXISTS(SELECT * FROM sys.database_scoped_credentials WHERE name = 's3_dc')
BEGIN
CREATE DATABASE SCOPED CREDENTIAL s3_dc
WITH IDENTITY = 'S3 Access Key',
SECRET = '<AccessKeyID>:<SecretKeyID>' ;
END
GO
Verify the new database-scoped credential with sys.database_scoped_credentials (Transact-SQL):
SELECT * FROM sys.database_scoped_credentials;
The following sample script creates an external data source s3_ds
in the source user database in SQL Server. The external data source references the s3_dc
database scoped credential. For more information, see CREATE EXTERNAL DATA SOURCE.
CREATE EXTERNAL DATA SOURCE s3_ds
WITH
( LOCATION = 's3://<ip_address>:<port>/'
, CREDENTIAL = s3_dc
);
GO
Verify the new external data source with sys.external_data_sources.
SELECT * FROM sys.external_data_sources;
Some S3-compatible storage systems (such as Amazon Web Services) utilize virtual_hosted
style URLs to implement folder structure in the S3 bucket. Add the following CONNECTION_OPTIONS
to allow for creation of external tables pointing to folder locations in the S3 bucket, for example CONNECTION_OPTIONS = '{"s3":{"url_style":"virtual_hosted"}}'
.
Without that CONNECTION_OPTIONS
setting, when querying external tables pointing to a folder, you might observe the following error:
Msg 13807, Level 16, State 1, Line 23
Content of directory on path '/<folder_name>/' cannot be listed.
:
character in it.s3://<hostname>/<objectkey>
shouldn't exceed 259 characters. The s3://
counts toward this limit, so the path length cannot exceed 259-5 = 254 characters.S3-compatible object storage has the ability to assign a temporary credential by using Secure Token Service (STS). These credentials are short termed and dynamically generated.
Pass-through authorization relies on Active Directory Federation Service (ADFS) acting as OpenID Connect (OIDC) identity provider, it is up to the ADFS to communicate with the S3-compatible object storage STS, request the STS and provide it back to SQL Server.
TLS must be configured with certificates between the SQL Server and the S3-compatible host server. It is assumed that all connections will be securely transmitted over HTTPS, not HTTP. The endpoint will be validated by a certificate installed on the SQL Server OS Host. Public or self-signed certificates are supported.
Create a database scoped credential to which will be used to pass the identity to the S3-compatible object storage. For more information, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL). As the following example:
CREATE DATABASE SCOPED CREDENTIAL CredName
WITH IDENTITY = 'User Identity'
Create an external data source to access the S3-compatible object storage. Use CONNECTION_OPTIONS
, as JSON format, to inform the required information for both the ADFS and STS. For more information, see CREATE EXTERNAL DATA SOURCE. As the following example:
CREATE EXTERNAL DATA SOURCE EdsName
WITH
{
LOCATION = 's3://<hostname>:<port>/<bucket_name>'
, CREDENTIAL = <CredName>
[ , CONNECTION_OPTIONS = ' {
[ , "authorization": {
"adfs": {
"endpoint": "http[s]://hostname:port/servicepath",
"relying_party": "SQL Server Relying Party Identifier"
},
"sts": {
"endpoint": "http[s]://hostname:port/stspath",
"role_arn": "Role Arn"
[ , "role_session_name": "AD user login" ] -- default value if not provided
[ , "duration_seconds": 3600 ] -- default value if not provided
[ , "version": "2011-06-15" ] -- default value if not provided
[ , "request_parameters": "In request query string format" ]
}
} ]
[ , "s3": {
"url_style": "Path"
} ]
}' ]
}
ADFS
options specify Windows transport endpoint and relying_party
identifier of SQL Server in ADFS.STS
options specify S3-compatible object storage STS endpoint and parameters for AssumeRoleWithWebIdentity
request. The AssumeRoleWithWebIdentity
is the method used to acquire the temporary security credential used to authenticate. For the complete list of parameters, including optional ones, and information about default values, refer to STS API Reference.Follow documentation provided by S3-compatible storage provider to set up external OIDC identity provider. To set up identity provider mostly following values are commonly needed.
For PolyBase on SQL Server on Linux, more configuration is needed.
Events
Mar 31, 11 p.m. - Apr 2, 11 p.m.
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayTraining
Module
Introduction to SQL Server 2022 data virtualization - Training
Learn about data virtualization, how to use Polybase to access and query external data, and enhanced Polybase features in SQL Server 2022.
Certification
Microsoft Certified: Azure Database Administrator Associate - Certifications
Administer an SQL Server database infrastructure for cloud, on-premises and hybrid relational databases using the Microsoft PaaS relational database offerings.
Documentation
Access external data: SQL Server - PolyBase - SQL Server
Learn how to use PolyBase on a SQL Server instance to query external data in another SQL Server instance. Create external tables to reference external data.
Virtualize parquet file with PolyBase - SQL Server
SQL Server can virtualize data from parquet files in S3-compatible object storage using PolyBase.
Get started with PolyBase in SQL Server 2022 - SQL Server
A tutorial for getting started with PolyBase in SQL Server 2022.