Επεξεργασία

Κοινή χρήση μέσω


Control storage account access for serverless SQL pool in Azure Synapse Analytics

A serverless SQL pool query reads files directly from Azure Storage. Permissions to access the files on Azure storage are controlled at two levels:

  • Storage level - User should have permission to access underlying storage files. Your storage administrator should allow Microsoft Entra principal to read/write files, or generate shared access signature (SAS) key that will be used to access storage.
  • SQL service level - User should have granted permission to read data using external table or to execute the OPENROWSET function. Read more about the required permissions in this section.

This article describes the types of credentials you can use and how credential lookup is enacted for SQL and Microsoft Entra users.

Storage permissions

A serverless SQL pool in Synapse Analytics workspace can read the content of files stored in Azure Data Lake storage. You need to configure permissions on storage to enable a user who executes a SQL query to read the files. There are three methods for enabling the access to the files:

  • Role based access control (RBAC) enables you to assign a role to some Microsoft Entra user in the tenant where your storage is placed. A reader must be a member of the Storage Blob Data Reader, Storage Blob Data Contributor, or Storage Blob Data Owner role on the storage account. A user who writes data in the Azure storage must be a member of the Storage Blob Data Contributor or Storage Blob Data Owner role. The Storage Owner role does not imply that a user is also Storage Data Owner.
  • Access Control Lists (ACL) enable you to define a fine grained Read(R), Write(W), and Execute(X) permissions on the files and directories in Azure storage. ACL can be assigned to Microsoft Entra users. If readers want to read a file on a path in Azure Storage, they must have Execute(X) ACL on every folder in the file path, and Read(R) ACL on the file. Learn more how to set ACL permissions in storage layer.
  • Shared access signature (SAS) enables a reader to access the files on the Azure Data Lake storage using the time-limited token. The reader doesn't even need to be authenticated as Microsoft Entra user. SAS token contains the permissions granted to the reader as well as the period when the token is valid. SAS token is good choice for time-constrained access to any user that doesn't even need to be in the same Microsoft Entra tenant. SAS token can be defined on the storage account or on specific directories. Learn more about granting limited access to Azure Storage resources using shared access signatures.

As an alternative, you can make your files publicly available by allowing anonymous access. This approach should NOT be used if you have non-public data.

Supported storage authorization types

A user that has logged into a serverless SQL pool must be authorized to access and query the files in Azure Storage if the files aren't publicly available. You can use four authorization types to access non-public storage: user identity, shared access signature, service principal, and managed identity.

Note

Microsoft Entra pass-through is the default behavior when you create a workspace.

User identity, also known as "Microsoft Entra pass-through", is an authorization type where the identity of the Microsoft Entra user that logged into serverless SQL pool is used to authorize data access. Before accessing the data, the Azure Storage administrator must grant permissions to the Microsoft Entra user. As indicated in the Supported authorization types for database users table, it's not supported for the SQL user type.

Important

A Microsoft Entra authentication token might be cached by the client applications. For example, Power BI caches Microsoft Entra tokens and reuses the same token for an hour. Long-running queries might fail if the token expires in the middle of the query execution. If you are experiencing query failures caused by the Microsoft Entra access token that expires in the middle of the query, consider switching to a service principal, managed identity or shared access signature.

You need to be a member of the Storage Blob Data Owner, Storage Blob Data Contributor, or Storage Blob Data Reader role to use your identity to access the data. As an alternative, you can specify fine-grained ACL rules to access files and folders. Even if you are an Owner of a Storage Account, you still need to add yourself into one of the Storage Blob Data roles. To learn more about access control in Azure Data Lake Store Gen2, review the Access control in Azure Data Lake Storage Gen2 article.

Cross-tenant scenarios

In cases when Azure Storage is in a different tenant from the Synapse serverless SQL pool, authorization via Service Principal is the recommended method. SAS authorization is also possible, while Managed Identity is not supported.

Authorization Type Firewall protected storage non-Firewall protected storage
SAS Supported Supported
Service Principal Not Supported Supported

Note

If Azure Storage is protected by an Azure Storage firewall, Service Principal will not be supported.

Supported authorization types for databases users

The following table provides available Azure Storage authorization types for different sign-in methods into an Azure Synapse Analytics serverless SQL endpoint:

Authorization type SQL user Microsoft Entra user Service principal
User Identity Not Supported Supported Supported
SAS Supported Supported Supported
Service principal Supported Supported Supported
Managed Identity Supported Supported Supported

Supported storages and authorization types

You can use the following combinations of authorization types and Azure Storage types:

Authorization type Blob Storage ADLS Gen1 ADLS Gen2
SAS Supported Not supported Supported
Service principal Supported Supported Supported
Managed Identity Supported Supported Supported
User Identity Supported Supported Supported

Cross-tenant scenarios

In cases when Azure Storage is in a different tenant from the Azure Synapse Analytics serverless SQL pool, authorization via service principal is the recommended method. Shared access signature authorization is also possible. Managed service identity is not supported.

Authorization Type Firewall protected storage non-Firewall protected storage
SAS Supported Supported
Service principal Not Supported Supported

Note

If Azure Storage is protected by an Azure Storage firewall and is in another tenant, service principal will not be supported. Instead, use a shared access signature (SAS).

Firewall protected storage

You can configure storage accounts to allow access to a specific serverless SQL pool by creating a resource instance rule. When accessing storage that is protected with the firewall, use User Identity or Managed Identity.

Note

The firewall feature on Azure Storage is in public preview and is available in all public cloud regions.

The following table provides available firewall-protected Azure Storage authorization types for different sign-in methods into an Azure Synapse Analytics serverless SQL endpoint:

Authorization type SQL user Microsoft Entra user Service principal
User Identity Not Supported Supported Supported
SAS Not Supported Not Supported Not Supported
Service principal Not Supported Not Supported Not Supported
Managed Identity Supported Supported Supported

To access storage that is protected with the firewall via a user identity, you can use the Azure portal or the Az.Storage PowerShell module.

Azure Storage firewall configuration via Azure portal

  1. Search for your Storage Account in Azure portal.
  2. In the main navigation menu, go to Networking under Settings.
  3. In the section Resource instances, add an exception for your Azure Synapse workspace.
  4. Select Microsoft.Synapse/workspaces as a Resource type.
  5. Select the name of your workspace as an Instance name.
  6. Select Save.

Azure Storage firewall configuration via PowerShell

Follow these steps to configure your storage account and add an exception for the Azure Synapse workspace.

  1. Open PowerShell or install PowerShell.

  2. Install the latest versions of the Az.Storage module and Az.Synapse module, for example in the following script:

    Install-Module -Name Az.Storage -RequiredVersion 3.4.0
    Install-Module -Name Az.Synapse -RequiredVersion 0.7.0
    

    Important

    Make sure that you use at least version 3.4.0. You can check your Az.Storage version by running this command:

    Get-Module -ListAvailable -Name Az.Storage | Select Version
    
  3. Connect to your Azure Tenant:

    Connect-AzAccount
    
  4. Define variables in PowerShell:

    • Resource group name - you can find this in Azure portal in the Overview of your storage account.
    • Account Name - name of the storage account that is protected by firewall rules.
    • Tenant ID - you can find this in Azure portal in Microsoft Entra ID, under Properties, in Tenant properties.
    • Workspace Name - Name of the Azure Synapse workspace.
        $resourceGroupName = "<resource group name>"
        $accountName = "<storage account name>"
        $tenantId = "<tenant id>"
        $workspaceName = "<Azure Synapse workspace name>"
    
        $workspace = Get-AzSynapseWorkspace -Name $workspaceName
        $resourceId = $workspace.Id
        $index = $resourceId.IndexOf("/resourceGroups/", 0)
        # Replace G with g - /resourceGroups/ to /resourcegroups/
        $resourceId = $resourceId.Substring(0,$index) + "/resourcegroups/" ` 
            + $resourceId.Substring($index + "/resourceGroups/".Length)
    
        $resourceId
    

    Important

    The value of the $resourceid returned by the PowerShell script should match this template: /subscriptions/{subscription-id}/resourcegroups/{resource-group}/providers/Microsoft.Synapse/workspaces/{name-of-workspace} It's important to write resourcegroups in lower case.

  5. Add an Azure storage account network rule:

        $parameters = @{
            ResourceGroupName = $resourceGroupName
            Name = $accountName
            TenantId = $tenantId 
            ResourceId = $resourceId
        }
    
        Add-AzStorageAccountNetworkRule @parameters
    
  6. Verify that storage account network rule was applied in your storage account firewall. The following PowerShell script compares the $resourceid variable from previous steps to the output of the storage account network rule.

        $parameters = @{
            ResourceGroupName = $resourceGroupName
            Name = $accountName
        }
    
        $rule = Get-AzStorageAccountNetworkRuleSet @parameters
        $rule.ResourceAccessRules | ForEach-Object { 
            if ($_.ResourceId -cmatch "\/subscriptions\/(\w\-*)+\/resourcegroups\/(.)+") { 
                Write-Host "Storage account network rule is successfully configured." -ForegroundColor Green
                $rule.ResourceAccessRules
            } else {
                Write-Host "Storage account network rule is not configured correctly. Remove this rule and follow the steps in detail." -ForegroundColor Red
                $rule.ResourceAccessRules
            }
        }
    

Credentials

To query a file located in Azure Storage, your serverless SQL pool endpoint needs a credential that contains the authentication information. Two types of credentials are used:

  • Server-level credential is used for ad-hoc queries executed using OPENROWSET function. The credential name must match the storage URL.
  • A database-scoped credential is used for external tables. External table references DATA SOURCE with the credential that should be used to access storage.

Grant permissions to manage credentials

To grant the ability manage credentials:

  • To allow a user to create or drop a server-level credential, an administrator must grant the ALTER ANY CREDENTIAL permission to its login in the master database. For example:

    GRANT ALTER ANY CREDENTIAL TO [login_name];
    
  • To allow a user to create or drop a database scoped credential, an administrator must grant the CONTROL permission on the database to the database user in the user database. For example:

    GRANT CONTROL ON DATABASE::[database_name] TO [user_name];
    

Grant permissions to use credential

Database users who access external storage must have permission to use credentials. To use the credential, a user must have the REFERENCES permission on a specific credential.

To grant the REFERENCES permission on a server-level credential for a login, use the following T-SQL query in the master database:

GRANT REFERENCES ON CREDENTIAL::[server-level_credential] TO [login_name];

To grant a REFERENCES permission on a database-scoped credential for a database user, use the following T-SQL query in the user database:

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[database-scoped_credential] TO [user_name];

Server-level credential

Server-level credentials are used when a SQL login calls OPENROWSET function without a DATA_SOURCE to read files on a storage account.

The name of server-level credential must match the base URL of Azure storage, optionally followed by a container name. A credential is added by running CREATE CREDENTIAL. You must provide the CREDENTIAL NAME argument.

Note

The FOR CRYPTOGRAPHIC PROVIDER argument is not supported.

Server-level CREDENTIAL name must match the following format: <prefix>://<storage_account_path>[/<container_name>]. Storage account paths are described in the following table:

External Data Source Prefix Storage account path
Azure Blob Storage https <storage_account>.blob.core.windows.net
Azure Data Lake Storage Gen1 https <storage_account>.azuredatalakestore.net/webhdfs/v1
Azure Data Lake Storage Gen2 https <storage_account>.dfs.core.windows.net

Server-level credentials are then able to access Azure storage using the following authentication types:

Microsoft Entra users can access any file on Azure storage if they are members of the Storage Blob Data Owner, Storage Blob Data Contributor, or Storage Blob Data Reader role. Microsoft Entra users don't need credentials to access storage.

SQL authenticated users can't use Microsoft Entra authentication to access storage. They can access storage through a database credential using Managed Identity, SAS Key, Service Principal or if there is public access to the storage.

Database-scoped credential

Database-scoped credentials are used when any principal calls OPENROWSET function with DATA_SOURCE or selects data from external table that don't access public files. The database scoped credential doesn't need to match the name of storage account, it is referenced in DATA SOURCE that defines the location of storage.

Database-scoped credentials enable access to Azure storage using the following authentication types:

Microsoft Entra users can access any file on Azure storage if they are members of the Storage Blob Data Owner, Storage Blob Data Contributor, or Storage Blob Data Reader roles. Microsoft Entra users don't need credentials to access storage.

CREATE EXTERNAL DATA SOURCE mysample
WITH (    LOCATION   = 'https://<storage_account>.dfs.core.windows.net/<container>/<path>'
)

SQL authenticated users can't use Microsoft Entra authentication to access storage. They can access storage through a database credential using Managed Identity, SAS Key, Service Principal or if there is public access to the storage.

Database scoped credentials are used in external data sources to specify what authentication method will be used to access this storage:

CREATE EXTERNAL DATA SOURCE mysample
WITH (    LOCATION   = 'https://<storage_account>.dfs.core.windows.net/<container>/<path>',
          CREDENTIAL = <name of database scoped credential> 
)

Examples

Access a publicly available data source

Use the following script to create a table that accesses publicly available data source.

CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat]
       WITH ( FORMAT_TYPE = PARQUET)
GO
CREATE EXTERNAL DATA SOURCE publicData
WITH ( LOCATION = 'https://<storage_account>.dfs.core.windows.net/<public_container>/<path>' )
GO

CREATE EXTERNAL TABLE dbo.userPublicData ( [id] int, [first_name] varchar(8000), [last_name] varchar(8000) )
WITH ( LOCATION = 'parquet/user-data/*.parquet',
       DATA_SOURCE = [publicData],
       FILE_FORMAT = [SynapseParquetFormat] )

Database user can read the content of the files from the data source using external table or OPENROWSET function that references the data source:

SELECT TOP 10 * FROM dbo.userPublicData;
GO
SELECT TOP 10 * FROM OPENROWSET(BULK 'parquet/user-data/*.parquet',
                                DATA_SOURCE = 'mysample',
                                FORMAT='PARQUET') as rows;
GO

Access a data source using credentials

Modify the following script to create an external table that accesses Azure storage using SAS token, Microsoft Entra identity of user, or managed identity of workspace.

-- Create master key in databases with some password (one-off per database)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>'
GO

-- Create databases scoped credential that use Managed Identity, SAS token or service principal. User needs to create only database-scoped credentials that should be used to access data source:

CREATE DATABASE SCOPED CREDENTIAL WorkspaceIdentity
WITH IDENTITY = 'Managed Identity'
GO
CREATE DATABASE SCOPED CREDENTIAL SasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = 'sv=2019-10-1********ZVsTOL0ltEGhf54N8KhDCRfLRI%3D'
GO
CREATE DATABASE SCOPED CREDENTIAL SPNCredential WITH
IDENTITY = '**44e*****8f6-ag44-1890-34u4-22r23r771098@https://login.microsoftonline.com/**do99dd-87f3-33da-33gf-3d3rh133ee33/oauth2/token' 
, SECRET = '.7OaaU_454azar9WWzLL.Ea9ePPZWzQee~'
GO
-- Create data source that one of the credentials above, external file format, and external tables that reference this data source and file format:

CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat] WITH ( FORMAT_TYPE = PARQUET)
GO

CREATE EXTERNAL DATA SOURCE mysample
WITH ( LOCATION = 'https://<storage_account>.dfs.core.windows.net/<container>/<path>'
-- Uncomment one of these options depending on authentication method that you want to use to access data source:
--,CREDENTIAL = WorkspaceIdentity 
--,CREDENTIAL = SasCredential 
--,CREDENTIAL = SPNCredential
)

CREATE EXTERNAL TABLE dbo.userData ( [id] int, [first_name] varchar(8000), [last_name] varchar(8000) )
WITH ( LOCATION = 'parquet/user-data/*.parquet',
       DATA_SOURCE = [mysample],
       FILE_FORMAT = [SynapseParquetFormat] );

Database user can read the content of the files from the data source using external table or OPENROWSET function that references the data source:

SELECT TOP 10 * FROM dbo.userdata;
GO
SELECT TOP 10 * FROM OPENROWSET(BULK 'parquet/user-data/*.parquet', DATA_SOURCE = 'mysample', FORMAT='PARQUET') as rows;
GO

Next steps

These articles help you learn how query different folder types, file types, and create and use views: