Tutorial: Copy data from a SQL Server database to Azure Blob storage
Article
APPLIES TO:
Azure Data Factory
Azure Synapse Analytics
Tip
Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!
In this tutorial, you use Azure PowerShell to create a data-factory pipeline that copies data from a SQL Server database to Azure Blob storage. You create and use a self-hosted integration runtime, which moves data between on-premises and cloud data stores.
Note
This article does not provide a detailed introduction to the Data Factory service. For more information, see Introduction to Azure Data Factory.
In this tutorial, you perform the following steps:
Create a data factory.
Create a self-hosted integration runtime.
Create SQL Server and Azure Storage linked services.
Create SQL Server and Azure Blob datasets.
Create a pipeline with a copy activity to move the data.
Start a pipeline run.
Monitor the pipeline run.
Prerequisites
Azure subscription
Before you begin, if you don't already have an Azure subscription, create a free account.
Azure roles
To create data factory instances, the user account you use to sign in to Azure must be assigned a Contributor or Owner role or must be an administrator of the Azure subscription.
To view the permissions you have in the subscription, go to the Azure portal, select your username at the top-right corner, and then select Permissions. If you have access to multiple subscriptions, select the appropriate subscription. For sample instructions on adding a user to a role, see the Assign Azure roles using the Azure portal article.
SQL Server 2014, 2016, and 2017
In this tutorial, you use a SQL Server database as a source data store. The pipeline in the data factory you create in this tutorial copies data from this SQL Server database (source) to Azure Blob storage (sink). You then create a table named emp in your SQL Server database, and insert a couple of sample entries into the table.
Connect to your SQL Server instance by using your credentials.
Create a sample database. In the tree view, right-click Databases, and then select New Database.
In the New Database window, enter a name for the database, and then select OK.
To create the emp table and insert some sample data into it, run the following query script against the database. In the tree view, right-click the database that you created, and then select New Query.
CREATE TABLE dbo.emp
(
ID int IDENTITY(1,1) NOT NULL,
FirstName varchar(50),
LastName varchar(50)
)
GO
INSERT INTO emp (FirstName, LastName) VALUES ('John', 'Doe')
INSERT INTO emp (FirstName, LastName) VALUES ('Jane', 'Doe')
GO
Azure Storage account
In this tutorial, you use a general-purpose Azure storage account (specifically, Azure Blob storage) as a destination/sink data store. If you don't have a general-purpose Azure storage account, see Create a storage account. The pipeline in the data factory you that create in this tutorial copies data from the SQL Server database (source) to this Azure Blob storage (sink).
Get storage account name and account key
You use the name and key of your Azure storage account in this tutorial. Get the name and key of your storage account by doing the following:
Sign in to the Azure portal with your Azure username and password.
In the left pane, select More services, filter by using the Storage keyword, and then select Storage accounts.
In the list of storage accounts, filter for your storage account (if needed), and then select your storage account.
In the Storage account window, select Access keys.
In the Storage account name and key1 boxes, copy the values, and then paste them into Notepad or another editor for later use in the tutorial.
Create the adftutorial container
In this section, you create a blob container named adftutorial in your Azure Blob storage.
In the Storage account window, switch to Overview, and then select Blobs.
In the Blob service window, select Container.
In the New container window, in the Name box, enter adftutorial, and then select OK.
In the list of containers, select adftutorial.
Keep the container window for adftutorial open. You use it verify the output at the end of the tutorial. Data Factory automatically creates the output folder in this container, so you don't need to create one.
Start PowerShell on your machine, and keep it open through completion of this quickstart tutorial. If you close and reopen it, you'll need to run these commands again.
Run the following command, and then enter the Azure username and password that you use to sign in to the Azure portal:
Connect-AzAccount
If you have multiple Azure subscriptions, run the following command to select the subscription that you want to work with. Replace SubscriptionId with the ID of your Azure subscription:
Define a variable for the resource group name that you'll use later in PowerShell commands. Copy the following command to PowerShell, specify a name for the Azure resource group (enclosed in double quotation marks; for example, "adfrg"), and then run the command.
$resourceGroupName = "ADFTutorialResourceGroup"
To create the Azure resource group, run the following command:
If the resource group already exists, you may not want to overwrite it. Assign a different value to the $resourceGroupName variable and run the command again.
Define a variable for the data factory name that you can use in PowerShell commands later. The name must start with a letter or a number, and it can contain only letters, numbers, and the dash (-) character.
Important
Update the data factory name with a globally unique name. An example is ADFTutorialFactorySP1127.
$dataFactoryName = "ADFTutorialFactory"
Define a variable for the location of the data factory:
$location = "East US"
To create the data factory, run the following Set-AzDataFactoryV2 cmdlet:
The name of the data factory must be globally unique. If you receive the following error, change the name and try again.
The specified data factory name 'ADFv2TutorialDataFactory' is already in use. Data factory names must be globally unique.
To create data-factory instances, the user account that you use to sign in to Azure must be assigned a contributor or owner role or must be an administrator of the Azure subscription.
For a list of Azure regions in which Data Factory is currently available, select the regions that interest you on the following page, and then expand Analytics to locate Data Factory: Products available by region. The data stores (Azure Storage, Azure SQL Database, and so on) and computes (Azure HDInsight and so on) used by the data factory can be in other regions.
Create a self-hosted integration runtime
In this section, you create a self-hosted integration runtime and associate it with an on-premises machine with the SQL Server database. The self-hosted integration runtime is the component that copies data from the SQL Server database on your machine to Azure Blob storage.
Create a variable for the name of integration runtime. Use a unique name, and note the name. You use it later in this tutorial.
State : NeedRegistration
Version :
CreateTime : 9/10/2019 3:24:09 AM
AutoUpdate : On
ScheduledUpdateDate :
UpdateDelayOffset :
LocalTimeZoneOffset :
InternalChannelEncryption :
Capabilities : {}
ServiceUrls : {eu.frontend.clouddatahub.net}
Nodes : {}
Links : {}
Name : <Integration Runtime name>
Type : SelfHosted
ResourceGroupName : <resourceGroup name>
DataFactoryName : <dataFactory name>
Description : selfhosted IR description
Id : /subscriptions/<subscription ID>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<dataFactoryName>/integrationruntimes/<integrationRuntimeName>
To retrieve the authentication keys for registering the self-hosted integration runtime with the Data Factory service in the cloud, run the following command. Copy one of the keys (excluding the quotation marks) for registering the self-hosted integration runtime that you install on your machine in the next step.
In the Welcome to Microsoft Integration Runtime Setup wizard, select Next.
In the End-User License Agreement window, accept the terms and license agreement, and select Next.
In the Destination Folder window, select Next.
In the Ready to install Microsoft Integration Runtime window, select Install.
In the Completed the Microsoft Integration Runtime Setup wizard, select Finish.
In the Register Integration Runtime (Self-hosted) window, paste the key you saved in the previous section, and then select Register.
In the New Integration Runtime (Self-hosted) Node window, select Finish.
When the self-hosted integration runtime is registered successfully, the following message is displayed:
In the Register Integration Runtime (Self-hosted) window, select Launch Configuration Manager.
When the node is connected to the cloud service, the following message is displayed:
Test the connectivity to your SQL Server database by doing the following:
a. In the Configuration Manager window, switch to the Diagnostics tab.
b. In the Data source type box, select SqlServer.
c. Enter the server name.
d. Enter the database name.
e. Select the authentication mode.
f. Enter the username.
g. Enter the password that's associated with the username.
h. To confirm that integration runtime can connect to the SQL Server, select Test.
If the connection is successful, a green checkmark icon is displayed. Otherwise, you'll receive an error message associated with the failure. Fix any issues, and ensure that the integration runtime can connect to your SQL Server instance.
Note all the preceding values for later use in this tutorial.
Create linked services
To link your data stores and compute services to the data factory, create linked services in the data factory. In this tutorial, you link your Azure storage account and SQL Server instance to the data store. The linked services have the connection information that the Data Factory service uses at runtime to connect to them.
Create an Azure Storage linked service (destination/sink)
In this step, you link your Azure storage account to the data factory.
Create a JSON file named AzureStorageLinkedService.json in the C:\ADFv2Tutorial folder with the following code. If the ADFv2Tutorial folder does not already exist, create it.
Important
Before you save the file, replace <accountName> and <accountKey> with the name and key of your Azure storage account. You noted them in the Prerequisites section.
If you receive a "file not found" error, confirm that the file exists by running the dir command. If the file name has a .txt extension (for example, AzureStorageLinkedService.json.txt), remove it, and then run the PowerShell command again.
Create and encrypt a SQL Server linked service (source)
In this step, you link your SQL Server instance to the data factory.
Create a JSON file named SqlServerLinkedService.json in the C:\ADFv2Tutorial folder by using the following code:
Important
Select the section that's based on the authentication that you use to connect to SQL Server.
Select the section that's based on the authentication you use to connect to your SQL Server instance.
Replace <integration runtime name> with the name of your integration runtime.
Before you save the file, replace <servername>, <databasename>, <username>, and <password> with the values of your SQL Server instance.
If you need to use a backslash (\) in your user account or server name, precede it with the escape character (\). For example, use mydomain\\myuser.
To encrypt the sensitive data (username, password, and so on), run the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet.
This encryption ensures that the credentials are encrypted using Data Protection Application Programming Interface (DPAPI). The encrypted credentials are stored locally on the self-hosted integration runtime node (local machine). The output payload can be redirected to another JSON file (in this case, encryptedLinkedService.json) that contains encrypted credentials.
In this step, you create input and output datasets. They represent input and output data for the copy operation, which copies data from the SQL Server database to Azure Blob storage.
Create a dataset for the source SQL Server database
In this step, you define a dataset that represents data in the SQL Server database instance. The dataset is of type SqlServerTable. It refers to the SQL Server linked service that you created in the preceding step. The linked service has the connection information that the Data Factory service uses to connect to your SQL Server instance at runtime. This dataset specifies the SQL table in the database that contains the data. In this tutorial, the emp table contains the source data.
Create a JSON file named SqlServerDataset.json in the C:\ADFv2Tutorial folder, with the following code:
In this step, you define a dataset that represents data that will be copied to Azure Blob storage. The dataset is of the type AzureBlob. It refers to the Azure Storage linked service that you created earlier in this tutorial.
The linked service has the connection information that the data factory uses at runtime to connect to your Azure storage account. This dataset specifies the folder in the Azure storage to which the data is copied from the SQL Server database. In this tutorial, the folder is adftutorial/fromonprem, where adftutorial is the blob container and fromonprem is the folder.
Create a JSON file named AzureBlobDataset.json in the C:\ADFv2Tutorial folder, with the following code:
In this tutorial, you create a pipeline with a copy activity. The copy activity uses SqlServerDataset as the input dataset and AzureBlobDataset as the output dataset. The source type is set to SqlSource and the sink type is set to BlobSink.
Create a JSON file named SqlServerToBlobPipeline.json in the C:\ADFv2Tutorial folder, with the following code:
The pipeline automatically creates the output folder named fromonprem in the adftutorial blob container. Confirm that you see the dbo.emp.txt file in the output folder.
In the Azure portal, in the adftutorial container window, select Refresh to see the output folder.
Select fromonprem in the list of folders.
Confirm that you see a file named dbo.emp.txt.
Related content
The pipeline in this sample copies data from one location to another in Azure Blob storage. You learned how to:
Create a data factory.
Create a self-hosted integration runtime.
Create SQL Server and Azure Storage linked services.
Create SQL Server and Azure Blob datasets.
Create a pipeline with a copy activity to move the data.
Start a pipeline run.
Monitor the pipeline run.
For a list of data stores that are supported by Data Factory, see supported data stores.
To learn about copying data in bulk from a source to a destination, advance to the following tutorial:
Demonstrate understanding of common data engineering tasks to implement and manage data engineering workloads on Microsoft Azure, using a number of Azure services.