Transfer data to and from Azure
There are several options for transferring data to and from Azure, depending on your needs.
Using physical hardware to transfer data to Azure is a good option when:
- Your network is slow or unreliable.
- Getting more network bandwidth is cost-prohibitive.
- Security or organizational policies don't allow outbound connections when dealing with sensitive data.
If your primary concern is how long it takes to transfer your data, you might want to run a test to verify whether network transfer is slower than physical transport.
There are two main options for physically transporting data to Azure:
The Azure Import/Export service
The Azure Import/Export service lets you securely transfer large amounts of data to Azure Blob Storage or Azure Files by shipping internal SATA HDDs or SDDs to an Azure datacenter. You can also use this service to transfer data from Azure Storage to hard disk drives and have the drives shipped to you for loading on-premises.
Azure Data Box
Azure Data Box is a Microsoft-provided appliance that works much like the Import/Export service. With Data Box, Microsoft ships you a proprietary, secure, and tamper-resistant transfer appliance and handles the end-to-end logistics, which you can track through the portal. One benefit of the Data Box service is ease of use. You don't need to purchase several hard drives, prepare them, and transfer files to each one. Data Box is supported by many industry-leading Azure partners to make it easier to seamlessly use offline transport to the cloud from their products.
Command-line tools and APIs
Consider these options when you want scripted and programmatic data transfer:
The Azure CLI is a cross-platform tool that allows you to manage Azure services and upload data to Storage.
AzCopy. Use AzCopy from a Windows or Linux command line to easily copy data to and from Blob Storage, Azure File Storage, and Azure Table Storage with optimal performance. AzCopy supports concurrency and parallelism, and the ability to resume copy operations when interrupted. You can also use AzCopy to copy data from AWS to Azure. For programmatic access, the Microsoft Azure Storage Data Movement Library is the core framework that powers AzCopy. It's provided as a .NET Core library.
With PowerShell, the Start-AzureStorageBlobCopy PowerShell cmdlet is an option for Windows administrators who are used to PowerShell.
AdlCopy enables you to copy data from Blob Storage into Azure Data Lake Storage. It can also be used to copy data between two Data Lake Storage accounts. However, it can't be used to copy data from Data Lake Storage to Blob Storage.
Distcp is used to copy data to and from an HDInsight cluster storage (WASB) into a Data Lake Storage account.
Sqoop is an Apache project and part of the Hadoop ecosystem. It comes preinstalled on all HDInsight clusters. It allows data transfer between an HDInsight cluster and relational databases such as SQL, Oracle, MySQL, and so on. Sqoop is a collection of related tools, including import and export tools. Sqoop works with HDInsight clusters by using either Blob Storage or Data Lake Storage attached storage.
PolyBase is a technology that accesses data outside a database through the T-SQL language. In SQL Server 2016, it allows you to run queries on external data in Hadoop or to import or export data from Blob Storage. In Azure Synapse Analytics, you can import or export data from Blob Storage and Data Lake Storage. Currently, PolyBase is the fastest method of importing data into Azure Synapse Analytics.
Use the Hadoop command line when you have data that resides on an HDInsight cluster head node. You can use the
hadoop -copyFromLocalcommand to copy that data to your cluster's attached storage, such as Blob Storage or Data Lake Storage. In order to use the Hadoop command, you must first connect to the head node. Once connected, you can upload a file to storage.
Consider the following options if you're only transferring a few files or data objects and don't need to automate the process.
Azure Storage Explorer is a cross-platform tool that lets you manage the contents of your Azure storage accounts. It allows you to upload, download, and manage blobs, files, queues, tables, and Azure Cosmos DB entities. Use it with Blob Storage to manage blobs and folders, and upload and download blobs between your local file system and Blob Storage, or between storage accounts.
Azure portal. Both Blob Storage and Data Lake Storage provide a web-based interface for exploring files and uploading new files. This option is a good one if you don't want to install tools or issue commands to quickly explore your files, or if you want to upload a handful of new ones.
Data sync and pipelines
Azure Data Factory is a managed service best suited for regularly transferring files between many Azure services, on-premises systems, or a combination of the two. By using Data Factory, you can create and schedule data-driven workflows called pipelines that ingest data from disparate data stores. Data Factory can process and transform the data by using compute services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning. You can create data-driven workflows for orchestrating and automating data movement and data transformation.
Pipelines and activities in Data Factory and Azure Synapse Analytics can be used to construct end-to-end data-driven workflows for your data movement and data processing scenarios. Additionally, the Azure Data Factory integration runtime is used to provide data integration capabilities across different network environments.
Azure Data Box Gateway transfers data to and from Azure, but it's a virtual appliance, not a hard drive. Virtual machines residing in your on-premises network write data to Data Box Gateway by using the NFS and SMB protocols. The device then transfers your data to Azure.
Key selection criteria
For data transfer scenarios, choose the appropriate system for your needs by answering these questions:
Do you need to transfer large amounts of data, where doing so over an internet connection would take too long, be unreliable, or too expensive? If yes, consider physical transfer.
Do you prefer to script your data transfer tasks, so they're reusable? If so, select one of the command-line options or Data Factory.
Do you need to transfer a large amount of data over a network connection? If so, select an option that's optimized for big data.
Do you need to transfer data to or from a relational database? If yes, choose an option that supports one or more relational databases. Some of these options also require a Hadoop cluster.
Do you need an automated data pipeline or workflow orchestration? If yes, consider Data Factory.
The following tables summarize the key differences in capabilities.
|Capability||The Import/Export service||Data Box|
|Form factor||Internal SATA HDDs or SDDs||Secure, tamper-proof, single hardware appliance|
|Microsoft manages shipping logistics||No||Yes|
|Integrates with partner products||No||Yes|
|Optimized for big data||Yes||Yes||Yes|
|Copy to relational database||No||Yes||No|
|Copy from relational database||No||Yes||No|
|Copy to Blob Storage||Yes||Yes||Yes|
|Copy from Blob Storage||Yes||Yes||No|
|Copy to Data Lake Storage||Yes||Yes||Yes|
|Copy from Data Lake Storage||Yes||Yes||No|
|Compatible platforms||Linux, OS X, Windows||Linux, Windows||Windows||Linux, OS X, Windows||SQL Server, Azure Synapse Analytics|
|Optimized for big data||No||Yes||No||Yes 1||Yes 2|
|Copy to relational database||No||No||No||No||Yes|
|Copy from relational database||No||No||No||No||Yes|
|Copy to Blob Storage||Yes||Yes||Yes||No||Yes|
|Copy from Blob Storage||Yes||Yes||Yes||Yes||Yes|
|Copy to Data Lake Storage||No||Yes||Yes||Yes||Yes|
|Copy from Data Lake Storage||No||No||Yes||Yes||Yes|
 AdlCopy is optimized for transferring big data when used with a Data Lake Analytics account.
 PolyBase performance can be increased by pushing computation to Hadoop and using PolyBase scale-out groups to enable parallel data transfer between SQL Server instances and Hadoop nodes.
Graphical interfaces, data sync, and data pipelines
|Capability||Azure Storage Explorer||Azure portal *||Data Factory||Data Box Gateway|
|Optimized for big data||No||No||Yes||Yes|
|Copy to relational database||No||No||Yes||No|
|Copy from relational database||No||No||Yes||No|
|Copy to Blob Storage||Yes||No||Yes||Yes|
|Copy from Blob Storage||Yes||No||Yes||No|
|Copy to Data Lake Storage||No||No||Yes||No|
|Copy from Data Lake Storage||No||No||Yes||No|
|Upload to Blob Storage||Yes||Yes||Yes||Yes|
|Upload to Data Lake Storage||Yes||Yes||Yes||Yes|
|Orchestrate data transfers||No||No||Yes||No|
|Custom data transformations||No||No||Yes||No|
|Pricing model||Free||Free||Pay per usage||Pay per unit|
* Azure portal in this case represents the web-based exploration tools for Blob Storage and Data Lake Storage.
This article is maintained by Microsoft. It was originally written by the following contributors.
- Zoiner Tejada | CEO and Architect
- What is Azure Import/Export service?
- What is Azure Data Box?
- What is the Azure CLI?
- Get started with AzCopy
- Get started with Storage Explorer
- What is Azure Data Factory?
- What is Azure Data Box Gateway?
Submit and view feedback for