This guide outlines how to use the COPY statement to load data from Azure Data Lake Storage. For quick examples on using the COPY statement across all authentication methods, visit the following documentation: Securely load data using dedicated SQL pools.
Note
To provide feedback or report issues on the COPY statement, send an email to the following distribution list: sqldwcopypreview@service.microsoft.com.
If you don't have an Azure subscription, create a free Azure account before you begin.
Before you begin this tutorial, download and install the newest version of SQL Server Management Studio (SSMS).
To run this tutorial, you need:
A dedicated SQL pool in Azure Synapse Analytics.
An Azure Data Lake Storage account with your source data staged, and credentials to access it.
Connect to your dedicated SQL pool and create the target table that you'll load data into. In this example, we create a product dimension table.
-- A: Create the target table
-- DimProduct
CREATE TABLE [dbo].[DimProduct]
(
    [ProductKey] [int] NOT NULL,
    [ProductLabel] [nvarchar](255) NULL,
    [ProductName] [nvarchar](500) NULL
)
WITH
(
    DISTRIBUTION = HASH([ProductKey]),
    CLUSTERED COLUMNSTORE INDEX
    --HEAP
);
Connect to your dedicated SQL pool and run the COPY statement. For a complete list of examples, visit the following documentation: Securely load data using dedicated SQL pools.
-- B: Create and execute the COPY statement
COPY INTO [dbo].[DimProduct]
--The column list allows you to map, omit, or reorder input file columns to target table columns.
--You can also specify a default value to use when a field in the file is NULL.
--When the column list is not specified, columns are mapped based on source and target ordinality.
(
    ProductKey default -1 1,
    ProductLabel default 'myStringDefaultWhenNull' 2,
    ProductName default 'myStringDefaultWhenNull' 3
)
--The storage account location where your data is staged
FROM 'https://storageaccount.blob.core.windows.net/container/directory/'
WITH
(
    --CREDENTIAL: Specifies the authentication method and credential used to access your storage account
    CREDENTIAL = (IDENTITY = '', SECRET = ''),
    --FILE_TYPE: Specifies the file type in your storage account location
    FILE_TYPE = 'CSV',
    --FIELDTERMINATOR: Marks the end of each field (column) in a delimited text (CSV) file
    FIELDTERMINATOR = '|',
    --ROWTERMINATOR: Marks the end of a record in the file
    ROWTERMINATOR = '0x0A',
    --FIELDQUOTE: Specifies the quote character for string data in a delimited text (CSV) file
    FIELDQUOTE = '',
    ENCODING = 'UTF8',
    DATEFORMAT = 'ymd',
    --MAXERRORS: Maximum number of reject rows allowed in the load before the COPY operation is canceled
    MAXERRORS = 10,
    --ERRORFILE: Specifies the directory where the rejected rows and the corresponding error reason should be written
    ERRORFILE = '/errorsfolder'
) OPTION (LABEL = 'COPY: ADLS tutorial');
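The OPTION (LABEL = ...) clause tags the statement so you can find it in the monitoring views. As a quick check, a query along these lines returns the status of the labeled load; the label value must match the one used in the COPY statement above.

-- Check the status of the labeled COPY statement
SELECT request_id, status, submit_time, end_time, command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'COPY: ADLS tutorial';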
By default, tables are defined as a clustered columnstore index. After a load completes, some of the data rows might not be compressed into the columnstore. There are a variety of reasons why this can happen. To learn more, see Manage columnstore indexes.
To optimize query performance and columnstore compression after a load, rebuild the table to force the columnstore index to compress all the rows.
ALTER INDEX ALL ON [dbo].[DimProduct] REBUILD;
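To confirm whether a rebuild is needed, one option is to summarize row group states first; rows in OPEN (delta-store) row groups aren't compressed yet. A minimal sketch using the sys.pdw_nodes_column_store_row_groups catalog view (this counts row groups across all tables in the pool, so scope it further if needed):

-- Summarize columnstore row groups by state; OPEN row groups hold uncompressed rows
SELECT state_description, COUNT(*) AS row_group_count, SUM(total_rows) AS total_rows
FROM sys.pdw_nodes_column_store_row_groups
GROUP BY state_description;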
It's best to create single-column statistics immediately after a load. There are some choices to weigh. For example, if you create single-column statistics on every column, it might take a long time to rebuild all the statistics. If you know certain columns aren't going to appear in query predicates, you can skip creating statistics on those columns.
If you decide to create single-column statistics on every column of every table, you can use the stored procedure code sample prc_sqldw_create_stats in the statistics article.
The following example is a good starting point for creating statistics. It creates single-column statistics on each column in the dimension table, and on each joining column in the fact tables. You can always add single or multi-column statistics to other fact table columns later on.
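For the DimProduct dimension table created in this tutorial, a minimal sketch looks like this; the statistics names are illustrative.

-- Create single-column statistics on each column of the dimension table
CREATE STATISTICS [stat_DimProduct_ProductKey] ON [dbo].[DimProduct] ([ProductKey]);
CREATE STATISTICS [stat_DimProduct_ProductLabel] ON [dbo].[DimProduct] ([ProductLabel]);
CREATE STATISTICS [stat_DimProduct_ProductName] ON [dbo].[DimProduct] ([ProductName]);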
You have successfully loaded data into your data warehouse. Great job!
Loading data is the first step to developing a data warehouse solution using Azure Synapse Analytics. Check out our development resources.
For more loading examples and references, view the following documentation:
Training
Module
Load data into a relational data warehouse - Training
Learn how to load tables in a relational data warehouse that is hosted in a dedicated SQL pool in Azure Synapse Analytics.
Certification
Microsoft Certified: Azure Data Engineer Associate - Certifications
Demonstrate understanding of common data engineering tasks to implement and manage data engineering workloads on Microsoft Azure, using a number of Azure services.
Documentation
Ingest data into a dedicated SQL pool - Azure Synapse Analytics
Learn how to ingest data into a dedicated SQL pool in Azure Synapse Analytics.
Ingest into Azure Data Lake Storage Gen2 - Azure Synapse Analytics
Learn how to ingest data into Azure Data Lake Storage Gen2 in Azure Synapse Analytics.
Tutorial: Analyze Azure Open Datasets in Synapse Studio - Azure Synapse Analytics
This tutorial shows you how to perform data analysis combining different Azure Open Datasets using serverless SQL pool, and how to visualize the results in Synapse Studio.