Move data to or from Azure Blob Storage using SSIS connectors
The Azure Feature Pack for Integration Services (SSIS) provides components to connect to Azure, transfer data between Azure and on-premises data sources, and process data stored in Azure.
This menu links to technologies you can use to move data to and from Azure Blob storage:
Once customers have moved on-premises data into the cloud, they can access their data from any Azure service to leverage the full power of the suite of Azure technologies. The data may be subsequently used, for example, in Azure Machine Learning or on an HDInsight cluster.
Examples for using these Azure resources are in the SQL and HDInsight walkthroughs.
For a discussion of canonical scenarios that use SSIS to accomplish business needs common in hybrid data integration scenarios, see Doing more with SQL Server Integration Services Feature Pack for Azure blog.
For a complete introduction to Azure blob storage, refer to Azure Blob Basics and to Azure Blob Service REST API.
To perform the tasks described in this article, you must have an Azure subscription and an Azure Storage account set up. You need the Azure Storage account name and account key to upload or download data.
- To set up an Azure subscription, see Free one-month trial.
- For instructions on creating a storage account and for getting account and key information, see About Azure Storage accounts.
To use the SSIS connectors, you must download:
- SQL Server 2014 or 2016 Standard (or above): Install includes SQL Server Integration Services.
- Microsoft SQL Server 2014 or 2016 Integration Services Feature Pack for Azure: These connectors can be downloaded, respectively, from the SQL Server 2014 Integration Services and SQL Server 2016 Integration Services pages.
SSIS is installed with SQL Server, but is not included in the Express version. For information on what applications are included in various editions of SQL Server, see SQL Server Technical Documentation
For installing SSIS, see Install Integration Services (SSIS)
For information on how to get up-and-running using SISS to build simple extraction, transformation, and load (ETL) packages, see SSIS Tutorial: Creating a Simple ETL Package.
Download NYC Taxi dataset
The example described here use a publicly available dataset, either available through Azure Open Datasets or from the source TLC Trip Record Data. The dataset consists of about 173 million taxi rides in NYC in the year 2013. There are two types of data: trip details data and fare data.
Upload data to Azure blob storage
To move data using the SSIS feature pack from on-premises to Azure blob storage, we use an instance of the Azure Blob Upload Task, shown here:
The parameters that the task uses are described here:
|AzureStorageConnection||Specifies an existing Azure Storage Connection Manager or creates a new one that refers to an Azure Storage account that points to where the blob files are hosted.|
|BlobContainer||Specifies the name of the blob container that holds the uploaded files as blobs.|
|BlobDirectory||Specifies the blob directory where the uploaded file is stored as a block blob. The blob directory is a virtual hierarchical structure. If the blob already exists, it is replaced.|
|LocalDirectory||Specifies the local directory that contains the files to be uploaded.|
|FileName||Specifies a name filter to select files with the specified name pattern. For example, MySheet*.xls* includes files such as MySheet001.xls and MySheetABC.xlsx|
|TimeRangeFrom/TimeRangeTo||Specifies a time range filter. Files modified after TimeRangeFrom and before TimeRangeTo are included.|
The AzureStorageConnection credentials need to be correct and the BlobContainer must exist before the transfer is attempted.
Download data from Azure blob storage
To download data from Azure blob storage to on-premises storage with SSIS, use an instance of the Azure Blob Download Task.
More advanced SSIS-Azure scenarios
The SSIS feature pack allows for more complex flows to be handled by packaging tasks together. For example, the blob data could feed directly into an HDInsight cluster, whose output could be downloaded back to a blob and then to on-premises storage. SSIS can run Hive and Pig jobs on an HDInsight cluster using additional SSIS connectors:
- To run a Hive script on an Azure HDInsight cluster with SSIS, use Azure HDInsight Hive Task.
- To run a Pig script on an Azure HDInsight cluster with SSIS, use Azure HDInsight Pig Task.
This article is maintained by Microsoft. It was originally written by the following contributors.
- Mark Tabladillo | Senior Cloud Solution Architect
To see non-public LinkedIn profiles, sign in to LinkedIn.
- Introduction to Azure Blob Storage
- Copy and move blobs from one container or storage account to another
- Execute existing SSIS packages in Azure Data Factory or Azure Synapse Pipeline
- What is the Team Data Science Process (TDSP)?
Submit and view feedback for