Редактиране

Споделяне чрез


Quickstart: Create an Azure Data Factory using ARM template

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

This quickstart describes how to use an Azure Resource Manager template (ARM template) to create an Azure data factory. The pipeline you create in this data factory copies data from one folder to another folder in an Azure blob storage. For a tutorial on how to transform data using Azure Data Factory, see Tutorial: Transform data using Spark.

An Azure Resource Manager template is a JavaScript Object Notation (JSON) file that defines the infrastructure and configuration for your project. The template uses declarative syntax. You describe your intended deployment without writing the sequence of programming commands to create the deployment.

Note

This article does not provide a detailed introduction of the Data Factory service. For an introduction to the Azure Data Factory service, see Introduction to Azure Data Factory.

If your environment meets the prerequisites and you're familiar with using ARM templates, select the Deploy to Azure button. The template will open in the Azure portal.

Button to deploy the Resource Manager template to Azure.

Prerequisites

Azure subscription

If you don't have an Azure subscription, create a free account before you begin.

Create a file

Open a text editor such as Notepad, and create a file named emp.txt with the following content:

John, Doe
Jane, Doe

Save the file in the C:\ADFv2QuickStartPSH folder. (If the folder doesn't already exist, create it.)

Review template

The template used in this quickstart is from Azure Quickstart Templates.

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "metadata": {
    "_generator": {
      "name": "bicep",
      "version": "0.26.54.24096",
      "templateHash": "17339534711754973978"
    }
  },
  "parameters": {
    "dataFactoryName": {
      "type": "string",
      "defaultValue": "[format('datafactory{0}', uniqueString(resourceGroup().id))]",
      "metadata": {
        "description": "Data Factory Name"
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location of the data factory."
      }
    },
    "storageAccountName": {
      "type": "string",
      "defaultValue": "[format('storage{0}', uniqueString(resourceGroup().id))]",
      "metadata": {
        "description": "Name of the Azure storage account that contains the input/output data."
      }
    },
    "blobContainerName": {
      "type": "string",
      "defaultValue": "[format('blob{0}', uniqueString(resourceGroup().id))]",
      "metadata": {
        "description": "Name of the blob container in the Azure Storage account."
      }
    }
  },
  "variables": {
    "dataFactoryLinkedServiceName": "ArmtemplateStorageLinkedService",
    "dataFactoryDataSetInName": "ArmtemplateTestDatasetIn",
    "dataFactoryDataSetOutName": "ArmtemplateTestDatasetOut",
    "pipelineName": "ArmtemplateSampleCopyPipeline"
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices",
      "apiVersion": "2023-01-01",
      "name": "[format('{0}/{1}', parameters('storageAccountName'), 'default')]",
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
      ]
    },
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2023-01-01",
      "name": "[parameters('storageAccountName')]",
      "location": "[parameters('location')]",
      "sku": {
        "name": "Standard_LRS"
      },
      "kind": "StorageV2",
      "properties": {
        "minimumTlsVersion": "TLS1_2",
        "supportsHttpsTrafficOnly": true,
        "allowBlobPublicAccess": false
      }
    },
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
      "apiVersion": "2023-01-01",
      "name": "[format('{0}/{1}/{2}', parameters('storageAccountName'), 'default', parameters('blobContainerName'))]",
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts/blobServices', parameters('storageAccountName'), 'default')]"
      ]
    },
    {
      "type": "Microsoft.DataFactory/factories",
      "apiVersion": "2018-06-01",
      "name": "[parameters('dataFactoryName')]",
      "location": "[parameters('location')]",
      "identity": {
        "type": "SystemAssigned"
      }
    },
    {
      "type": "Microsoft.DataFactory/factories/linkedservices",
      "apiVersion": "2018-06-01",
      "name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('dataFactoryLinkedServiceName'))]",
      "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
          "connectionString": "[format('DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}', parameters('storageAccountName'), listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2023-01-01').keys[0].value)]"
        }
      },
      "dependsOn": [
        "[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
      ]
    },
    {
      "type": "Microsoft.DataFactory/factories/datasets",
      "apiVersion": "2018-06-01",
      "name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('dataFactoryDataSetInName'))]",
      "properties": {
        "linkedServiceName": {
          "referenceName": "[variables('dataFactoryLinkedServiceName')]",
          "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "container": "[parameters('blobContainerName')]",
            "folderPath": "input",
            "fileName": "emp.txt"
          }
        }
      },
      "dependsOn": [
        "[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
        "[resourceId('Microsoft.DataFactory/factories/linkedservices', parameters('dataFactoryName'), variables('dataFactoryLinkedServiceName'))]"
      ]
    },
    {
      "type": "Microsoft.DataFactory/factories/datasets",
      "apiVersion": "2018-06-01",
      "name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('dataFactoryDataSetOutName'))]",
      "properties": {
        "linkedServiceName": {
          "referenceName": "[variables('dataFactoryLinkedServiceName')]",
          "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "container": "[parameters('blobContainerName')]",
            "folderPath": "output"
          }
        }
      },
      "dependsOn": [
        "[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
        "[resourceId('Microsoft.DataFactory/factories/linkedservices', parameters('dataFactoryName'), variables('dataFactoryLinkedServiceName'))]"
      ]
    },
    {
      "type": "Microsoft.DataFactory/factories/pipelines",
      "apiVersion": "2018-06-01",
      "name": "[format('{0}/{1}', parameters('dataFactoryName'), variables('pipelineName'))]",
      "properties": {
        "activities": [
          {
            "name": "MyCopyActivity",
            "type": "Copy",
            "typeProperties": {
              "source": {
                "type": "BinarySource",
                "storeSettings": {
                  "type": "AzureBlobStorageReadSettings",
                  "recursive": true
                }
              },
              "sink": {
                "type": "BinarySink",
                "storeSettings": {
                  "type": "AzureBlobStorageWriteSettings"
                }
              },
              "enableStaging": false
            },
            "inputs": [
              {
                "referenceName": "[variables('dataFactoryDataSetInName')]",
                "type": "DatasetReference"
              }
            ],
            "outputs": [
              {
                "referenceName": "[variables('dataFactoryDataSetOutName')]",
                "type": "DatasetReference"
              }
            ]
          }
        ]
      },
      "dependsOn": [
        "[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]",
        "[resourceId('Microsoft.DataFactory/factories/datasets', parameters('dataFactoryName'), variables('dataFactoryDataSetInName'))]",
        "[resourceId('Microsoft.DataFactory/factories/datasets', parameters('dataFactoryName'), variables('dataFactoryDataSetOutName'))]"
      ]
    }
  ],
  "outputs": {
    "name": {
      "type": "string",
      "value": "[variables('pipelineName')]"
    },
    "resourceId": {
      "type": "string",
      "value": "[resourceId('Microsoft.DataFactory/factories/pipelines', parameters('dataFactoryName'), variables('pipelineName'))]"
    },
    "resourceGroupName": {
      "type": "string",
      "value": "[resourceGroup().name]"
    },
    "location": {
      "type": "string",
      "value": "[parameters('location')]"
    }
  }
}

There are Azure resources defined in the template:

More Azure Data Factory template samples can be found in the quickstart template gallery.

Deploy the template

  1. Select the following image to sign in to Azure and open a template. The template creates an Azure Data Factory account, a storage account, and a blob container.

    Button to deploy the Resource Manager template to Azure.

  2. Select or enter the following values.

    Deploy ADF ARM template

    Unless it's specified, use the default values to create the Azure Data Factory resources:

    • Subscription: Select an Azure subscription.
    • Resource group: Select Create new, enter a unique name for the resource group, and then select OK.
    • Region: Select a location. For example, East US.
    • Data Factory Name: Use default value.
    • Location: Use default value.
    • Storage Account Name: Use default value.
    • Blob Container: Use default value.

Review deployed resources

  1. Select Go to resource group.

    Resource Group

  2. Verify your Azure Data Factory is created.

    1. Your Azure Data Factory name is in the format - datafactory<uniqueid>.

    Sample Data Factory

  3. Verify your storage account is created.

    1. The storage account name is in the format - storage<uniqueid>.

    Storage Account

  4. Select the storage account created and then select Containers.

    1. On the Containers page, select the blob container you created.
      1. The blob container name is in the format - blob<uniqueid>.

    Blob container

Upload a file

  1. On the Containers page, select Upload.

  2. In te right pane, select the Files box, and then browse to and select the emp.txt file that you created earlier.

  3. Expand the Advanced heading.

  4. In the Upload to folder box, enter input.

  5. Select the Upload button. You should see the emp.txt file and the status of the upload in the list.

  6. Select the Close icon (an X) to close the Upload blob page.

    Upload file to input folder

Keep the container page open, because you can use it to verify the output at the end of this quickstart.

Start Trigger

  1. Navigate to the Data factories page, and select the data factory you created.

  2. Select Open on the Open Azure Data Factory Studio tile.

    Author & Monitor

  3. Select the Author tab .

  4. Select the pipeline created - ArmtemplateSampleCopyPipeline.

    ARM template pipeline

  5. Select Add Trigger > Trigger Now.

    Trigger

  6. In the right pane under Pipeline run, select OK.

Monitor the pipeline

  1. Select the Monitor tab .

  2. You see the activity runs associated with the pipeline run. In this quickstart, the pipeline has only one activity of type: Copy. As such, you see a run for that activity.

    Successful run

Verify the output file

The pipeline automatically creates an output folder in the blob container. Then, it copies the emp.txt file from the input folder to the output folder.

  1. In the Azure portal, on the Containers page, select Refresh to see the output folder.

  2. Select output in the folder list.

  3. Confirm that the emp.txt is copied to the output folder.

    Output

Clean up resources

You can clean up the resources that you created in the Quickstart in two ways. You can delete the Azure resource group, which includes all the resources in the resource group. If you want to keep the other resources intact, delete only the data factory you created in this tutorial.

Deleting a resource group deletes all resources including data factories in it. Run the following command to delete the entire resource group:

Remove-AzResourceGroup -ResourceGroupName $resourcegroupname

If you want to delete just the data factory, and not the entire resource group, run the following command:

Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName

In this quickstart, you created an Azure Data Factory using an ARM template and validated the deployment. To learn more about Azure Data Factory and Azure Resource Manager, continue on to the articles below.