Create a V2 data factory (Spark)
This template creates a version 2 (V2) data factory with a pipeline that copies data from one folder to another in Azure Blob storage.
Here are a few important points about the template:
- The prerequisites for this template are described in the Quickstart: Create a data factory by using Azure PowerShell article.
- Currently, data factories of version 2 can be created only in the East US and East US 2 regions.
When you deploy this Azure Resource Manager template, a data factory of version 2 is created with the following entities (a PowerShell sketch for deploying the template and verifying these entities follows the list):
- Azure Storage linked service
- Azure Blob datasets (input and output)
- Pipeline with a copy activity
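If you prefer to work from PowerShell rather than the portal, a minimal deployment-and-verification sketch is shown below. It assumes the same AzureRM cmdlets used elsewhere in this article, and that azuredeploy.json and azuredeploy.parameters.json are local copies of this template and its parameters file; the resource group name, data factory name, and location are placeholders you supply.

```powershell
# Log in and create a resource group for the deployment (name and location are placeholders;
# V2 data factories are currently limited to East US and East US 2)
Login-AzureRmAccount
New-AzureRmResourceGroup -Name "<name of your resource group>" -Location "EastUS"

# Deploy the template; azuredeploy.json and azuredeploy.parameters.json are assumed to be
# local copies of this quickstart template and its parameters file
New-AzureRmResourceGroupDeployment -ResourceGroupName "<name of your resource group>" `
    -TemplateFile ".\azuredeploy.json" `
    -TemplateParameterFile ".\azuredeploy.parameters.json"

# After deployment, list the entities the template created
Get-AzureRmDataFactoryV2LinkedService -ResourceGroupName "<name of your resource group>" -DataFactoryName "<name of your data factory>"
Get-AzureRmDataFactoryV2Dataset -ResourceGroupName "<name of your resource group>" -DataFactoryName "<name of your data factory>"
Get-AzureRmDataFactoryV2Pipeline -ResourceGroupName "<name of your resource group>" -DataFactoryName "<name of your data factory>"
```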
To get the name of the data factory (a PowerShell alternative is sketched after these steps):
- Click the Deployment succeeded message.
- Click Go to resource group.
- Search for ADFTutorialResourceGroup0927<unique string>
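As an alternative to browsing the portal, the data factory name can also be listed from PowerShell. This is only a sketch; the resource group name is whatever you deployed into.

```powershell
# List the data factories in the resource group created by the deployment
Get-AzureRmDataFactoryV2 -ResourceGroupName "<name of your resource group>" |
    Select-Object DataFactoryName, Location
```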
The following section provides steps for running and monitoring the pipeline. For more information, see Quickstart: Create a data factory by using Azure PowerShell.
Run and monitor the pipeline
After you deploy the template, to run and monitor the pipeline, follow these steps:
- Download runmonitor.ps1 to a folder on your machine.
- Launch Azure PowerShell.
- Run the following command to log in to Azure:

      Login-AzureRmAccount

- Switch to the folder where you copied the script file.
- Run the following command to run the script, specifying the names of your Azure resource group and the data factory:

      .\runmonitor.ps1 -resourceGroupName "<name of your resource group>" -DataFactoryName "<name of your data factory>"
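If you want to see roughly what a run-and-monitor script does before running runmonitor.ps1, the sketch below triggers the pipeline and polls its activity runs. It is an approximation, not the contents of runmonitor.ps1; the variable values are placeholders, and the pipeline name is discovered from the deployed factory rather than assumed.

```powershell
# Placeholders: replace with your resource group and data factory names
$resourceGroupName = "<name of your resource group>"
$dataFactoryName   = "<name of your data factory>"

# Discover the name of the pipeline deployed by the template
$pipelineName = (Get-AzureRmDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName `
                    -DataFactoryName $dataFactoryName | Select-Object -First 1).Name

# Trigger a pipeline run and capture the run ID
$runId = Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName `
            -DataFactoryName $dataFactoryName -PipelineName $pipelineName

# Poll the activity runs until the copy activity is no longer in progress
while ($true) {
    $runs = Get-AzureRmDataFactoryV2ActivityRun -ResourceGroupName $resourceGroupName `
                -DataFactoryName $dataFactoryName -PipelineRunId $runId `
                -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
    if ($runs -and $runs.Status -ne "InProgress") {
        $runs
        break
    }
    Start-Sleep -Seconds 30
}
```

Once the activity status is no longer InProgress, the returned activity run object includes the status and output details of the copy.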
Tags: Microsoft.DataFactory/factories, linkedservices, AzureStorage, SecureString, HDInsightOnDemand, LinkedServiceReference, pipelines, HDInsightSpark, Microsoft.Storage/storageAccounts