Securely load data using Synapse SQL

This article highlights and provides examples on the secure authentication mechanisms for the COPY statement. The COPY statement is the most flexible and secure way of bulk loading data in Synapse SQL.

Supported authentication mechanisms

The following matrix describes the supported authentication methods for each file type and storage account. This applies to the source storage location and the error file location.

CSV Parquet ORC
Azure blob storage SAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS/KEY SAS/KEY
Azure Data Lake Gen2 SAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS (blob1)/MSI (dfs2)/SERVICE PRINCIPAL/KEY/AAD SAS (blob1)/MSI (dfs2)/SERVICE PRINCIPAL/KEY/AAD

1: The .blob endpoint (.blob.core.windows.net) in your external location path is required for this authentication method.

2: The .dfs endpoint (.dfs.core.windows.net) in your external location path is required for this authentication method.

A. Storage account key with LF as the row terminator (Unix-style new line)

--Note when specifying the column list, input field numbers start from 1
COPY INTO target_table (Col_one default 'myStringDefault' 1, Col_two default 1 3)
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    ,CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_Account_Key>')
    --CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<secret>'),
    ,ROWTERMINATOR='0x0A' --0x0A specifies to use the Line Feed character (Unix based systems)
)

Important

  • Use the hexadecimal value (0x0A) to specify the Line Feed/Newline character. Note the COPY statement will interpret the \n string as \r\n (carriage return newline).

B. Shared Access Signatures (SAS) with CRLF as the row terminator (Windows style new line)

COPY INTO target_table
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    ,CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
    --CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='?sv=2018-03-28&ss=bfqt&srt=sco&sp=rl&st=2016-10-17T20%3A14%3A55Z&se=2021-10-18T20%3A19%3A00Z&sig=IEoOdmeYnE9%2FKiJDSFSYsz4AkN'),
    ,ROWTERMINATOR='\n'-- COPY command automatically prefixes the \r character when \n (newline) is specified. This results in carriage return newline (\r\n) for Windows based systems.
)

Important

Do not specify the ROWTERMINATOR as '\r\n' which will be interpreted as '\r\r\n' and can result in parsing issues. The COPY command automatically prefixes the \r character when \n (newline) is specified. This results in carriage return newline (\r\n) for Windows based systems.

C. Managed Identity

Managed Identity authentication is required when your storage account is attached to a VNet.

Prerequisites

  1. Install Azure PowerShell. Refer to Install PowerShell.
  2. If you have a general-purpose v1 or blob storage account, you must first upgrade to general-purpose v2. Refer to Upgrade to a general-purpose v2 storage account.
  3. You must have Allow trusted Microsoft services to access this storage account turned on under Azure Storage account Firewalls and Virtual networks settings menu. Refer to Configure Azure Storage firewalls and virtual networks.

Steps

  1. If you have a standalone dedicated SQL pool, register your SQL server with Azure Active Directory (Azure AD) using PowerShell:

    Connect-AzAccount
    Select-AzSubscription -SubscriptionId <subscriptionId>
    Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity
    

    This step is not required for dedicated SQL pools within a Synapse workspace. The system assigned managed identity (SA-MI) of the workspace is a member of the Synapse Administrator role and thus has elevated privileges on the dedicated SQL pools of the workspace.

  2. Create a general-purpose v2 Storage Account. For more information, see Create a storage account.

    Note

  3. Under your storage account, select Access control (IAM).

  4. Select Add > Add role assignment to open the Add role assignment page.

  5. Assign the following role. For detailed steps, see Assign Azure roles using the Azure portal.

    Setting Value
    Role Storage Blob Data Contributor
    Assign access to SERVICEPRINCIPAL
    Members server or workspace hosting your dedicated SQL pool that you've registered with Azure Active Directory (Azure AD)

    Add role assignment page in Azure portal.

    Note

    Only members with Owner privilege can perform this step. For various Azure built-in roles, refer to Azure built-in roles.

    Important

    Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    Granting Azure RBAC permission to load

  6. You can now run the COPY statement specifying "Managed Identity":

    COPY INTO dbo.target_table
    FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt'
    WITH (
        FILE_TYPE = 'CSV',
        CREDENTIAL = (IDENTITY = 'Managed Identity'),
    )
    

D. Azure Active Directory Authentication

Steps

  1. Under your storage account, select Access control (IAM).

  2. Select Add > Add role assignment to open the Add role assignment page.

  3. Assign the following role. For detailed steps, see Assign Azure roles using the Azure portal.

    Setting Value
    Role Storage Blob Data Owner, Contributor, or Reader
    Assign access to USER
    Members Azure AD user

    Add role assignment page in Azure portal.

    Important

    Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    Granting Azure RBAC permission to load

  4. Configure Azure AD authentication. Refer to Configure and manage Azure AD authentication with Azure SQL.

  5. Connect to your SQL pool using Active Directory where you can now run the COPY statement without specifying any credentials:

    COPY INTO dbo.target_table
    FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt'
    WITH (
        FILE_TYPE = 'CSV'
    )
    

E. Service Principal Authentication

Steps

  1. Create an Azure Active Directory application.

  2. Get application ID.

  3. Get the authentication key.

  4. Get the V1 OAuth 2.0 token endpoint.

  5. Assign read, write, and execution permissions to your Azure AD application on your storage account.

  6. You can now run the COPY statement:

    COPY INTO dbo.target_table
    FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt'
    WITH (
        FILE_TYPE = 'CSV'
        ,CREDENTIAL=(IDENTITY= '<application_ID>@<OAuth_2.0_Token_EndPoint>' , SECRET= '<authentication_key>')
        --CREDENTIAL should look something like this:
        --,CREDENTIAL=(IDENTITY= '92761aac-12a9-4ec3-89b8-7149aef4c35b@https://login.microsoftonline.com/72f714bf-86f1-41af-91ab-2d7cd011db47/oauth2/token', SECRET='juXi12sZ6gse]woKQNgqwSywYv]7A.M')
    )
    

Important

Use the V1 version of the OAuth 2.0 token endpoint

Next steps