Quickstart: Bulk load data using the COPY statement
In this quickstart, you'll bulk load data into your dedicated SQL pool using the simple and flexible COPY statement for high-throughput data ingestion. The COPY statement is the recommended loading utility as it enables you to seamlessly and flexibly load data by providing functionality to:
- Allow lower privileged users to load without needing strict CONTROL permissions on the data warehouse
- Leverage only a single T-SQL statement without having to create any additional database objects
- Leverage a finer permission model without exposing storage account keys using Share Access Signatures (SAS)
- Specify a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
- Customize default values for each target column and specify source data fields to load into specific target columns
- Specify a custom row terminator for CSV files
- Escape string, field, and row delimiters for CSV files
- Leverage SQL Server Date formats for CSV files
- Specify wildcards and multiple files in the storage location path
Prerequisites
This quickstart assumes you already have a dedicated SQL pool. If a dedicated SQL pool hasn't been created, use the Create and Connect - portal quickstart.
Set up the required permissions
-- List the permissions for your user
select princ.name
, princ.type_desc
, perm.permission_name
, perm.state_desc
, perm.class_desc
, object_name(perm.major_id)
from sys.database_principals princ
left join
sys.database_permissions perm
on perm.grantee_principal_id = princ.principal_id
where name = '<yourusername>';
--Make sure your user has the permissions to CREATE tables in the [dbo] schema
GRANT CREATE TABLE TO <yourusername>;
GRANT ALTER ON SCHEMA::dbo TO <yourusername>;
--Make sure your user has ADMINISTER DATABASE BULK OPERATIONS permissions
GRANT ADMINISTER DATABASE BULK OPERATIONS TO <yourusername>
--Make sure your user has INSERT permissions on the target table
GRANT INSERT ON <yourtable> TO <yourusername>
Create the target table
In this example, we'll be loading data from the New York taxi dataset. We'll load a table called Trip that represents taxi trips taken within a single year. Run the following to create the table:
CREATE TABLE [dbo].[Trip]
(
[DateID] int NOT NULL,
[MedallionID] int NOT NULL,
[HackneyLicenseID] int NOT NULL,
[PickupTimeID] int NOT NULL,
[DropoffTimeID] int NOT NULL,
[PickupGeographyID] int NULL,
[DropoffGeographyID] int NULL,
[PickupLatitude] float NULL,
[PickupLongitude] float NULL,
[PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DropoffLatitude] float NULL,
[DropoffLongitude] float NULL,
[DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[PassengerCount] int NULL,
[TripDurationSeconds] int NULL,
[TripDistanceMiles] float NULL,
[PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FareAmount] money NULL,
[SurchargeAmount] money NULL,
[TaxAmount] money NULL,
[TipAmount] money NULL,
[TollsAmount] money NULL,
[TotalAmount] money NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
);
Run the COPY statement
Run the following COPY statement that will load data from the Azure blob storage account into the Trip table.
COPY INTO [dbo].[Trip] FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/'
WITH (
FIELDTERMINATOR='|',
ROWTERMINATOR='0x0A'
) OPTION (LABEL = 'COPY: dbo.trip');
Monitor the load
Check whether your load is making progress by periodically running the following query:
SELECT r.[request_id]
, r.[status]
, r.resource_class
, r.command
, sum(bytes_processed) AS bytes_processed
, sum(rows_processed) AS rows_processed
FROM sys.dm_pdw_exec_requests r
JOIN sys.dm_pdw_dms_workers w
ON r.[request_id] = w.request_id
WHERE [label] = 'COPY: dbo.trip' and session_id <> session_id() and type = 'WRITER'
GROUP BY r.[request_id]
, r.[status]
, r.resource_class
, r.command;
Next steps
- For best practices on data loading, see Best Practices for Loading Data.
- For information on how to manage the resources for your data loads, see Workload Isolation.