Quickstart: Create an Azure Data Factory using Bicep

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

This quickstart describes how to use Bicep to create an Azure data factory. The pipeline you create in this data factory copies data from one folder to another in Azure Blob Storage. For a tutorial on how to transform data using Azure Data Factory, see Tutorial: Transform data using Spark.

Bicep is a domain-specific language (DSL) that uses declarative syntax to deploy Azure resources. It provides concise syntax, reliable type safety, and support for code reuse. Bicep offers the best authoring experience for your infrastructure-as-code solutions in Azure.

Note

This article does not provide a detailed introduction to the Data Factory service. For an introduction, see Introduction to Azure Data Factory.

Prerequisites

Azure subscription

If you don't have an Azure subscription, create a free account before you begin.

Review the Bicep file

The Bicep file used in this quickstart is from Azure Quickstart Templates.
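
Two details worth noting before you deploy: uniqueString(resourceGroup().id) generates a deterministic 13-character hash of the resource group ID, so the default resource names stay stable across redeployments to the same resource group, and the linked service builds its connection string at deployment time by reading the storage account key with listKeys().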

@description('Data Factory Name')
param dataFactoryName string = 'datafactory${uniqueString(resourceGroup().id)}'

@description('Location of the data factory.')
param location string = resourceGroup().location

@description('Name of the Azure storage account that contains the input/output data.')
param storageAccountName string = 'storage${uniqueString(resourceGroup().id)}'

@description('Name of the blob container in the Azure Storage account.')
param blobContainerName string = 'blob${uniqueString(resourceGroup().id)}'

var dataFactoryLinkedServiceName = 'ArmtemplateStorageLinkedService'
var dataFactoryDataSetInName = 'ArmtemplateTestDatasetIn'
var dataFactoryDataSetOutName = 'ArmtemplateTestDatasetOut'
var pipelineName = 'ArmtemplateSampleCopyPipeline'

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'

  properties: {
    minimumTlsVersion: 'TLS1_2'
    supportsHttpsTrafficOnly: true
    allowBlobPublicAccess: false
  }

  resource defaultBlobService 'blobServices' = {
    name: 'default'
  }
}

resource blobContainer 'Microsoft.Storage/storageAccounts/blobServices/containers@2023-01-01' = {
  parent: storageAccount::defaultBlobService
  name: blobContainerName
}

resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' = {
  name: dataFactoryName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
}

resource dataFactoryLinkedService 'Microsoft.DataFactory/factories/linkedservices@2018-06-01' = {
  parent: dataFactory
  name: dataFactoryLinkedServiceName
  properties: {
    type: 'AzureBlobStorage'
    typeProperties: {
      connectionString: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};AccountKey=${storageAccount.listKeys().keys[0].value}'
    }
  }
}

resource dataFactoryDataSetIn 'Microsoft.DataFactory/factories/datasets@2018-06-01' = {
  parent: dataFactory
  name: dataFactoryDataSetInName
  properties: {
    linkedServiceName: {
      referenceName: dataFactoryLinkedService.name
      type: 'LinkedServiceReference'
    }
    type: 'Binary'
    typeProperties: {
      location: {
        type: 'AzureBlobStorageLocation'
        container: blobContainerName
        folderPath: 'input'
        fileName: 'emp.txt'
      }
    }
  }
}

resource dataFactoryDataSetOut 'Microsoft.DataFactory/factories/datasets@2018-06-01' = {
  parent: dataFactory
  name: dataFactoryDataSetOutName
  properties: {
    linkedServiceName: {
      referenceName: dataFactoryLinkedService.name
      type: 'LinkedServiceReference'
    }
    type: 'Binary'
    typeProperties: {
      location: {
        type: 'AzureBlobStorageLocation'
        container: blobContainerName
        folderPath: 'output'
      }
    }
  }
}

resource dataFactoryPipeline 'Microsoft.DataFactory/factories/pipelines@2018-06-01' = {
  parent: dataFactory
  name: pipelineName
  properties: {
    activities: [
      {
        name: 'MyCopyActivity'
        type: 'Copy'
        typeProperties: {
          source: {
            type: 'BinarySource'
            storeSettings: {
              type: 'AzureBlobStorageReadSettings'
              recursive: true
            }
          }
          sink: {
            type: 'BinarySink'
            storeSettings: {
              type: 'AzureBlobStorageWriteSettings'
            }
          }
          enableStaging: false
        }
        inputs: [
          {
            referenceName: dataFactoryDataSetIn.name
            type: 'DatasetReference'
          }
        ]
        outputs: [
          {
            referenceName: dataFactoryDataSetOut.name
            type: 'DatasetReference'
          }
        ]
      }
    ]
  }
}

output name string = dataFactoryPipeline.name
output resourceId string = dataFactoryPipeline.id
output resourceGroupName string = resourceGroup().name
output location string = location

There are several Azure resources defined in the Bicep file:

  - Microsoft.Storage/storageAccounts: the storage account that holds the input and output data.
  - Microsoft.Storage/storageAccounts/blobServices/containers: the blob container.
  - Microsoft.DataFactory/factories: the data factory.
  - Microsoft.DataFactory/factories/linkedservices: the linked service that connects the data factory to the storage account.
  - Microsoft.DataFactory/factories/datasets: the input and output datasets.
  - Microsoft.DataFactory/factories/pipelines: the pipeline with the copy activity.

Create a file

Open a text editor such as Notepad, and create a file named emp.txt with the following content:

John, Doe
Jane, Doe

Save the file locally. You'll use it later in the quickstart.
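
If you're working from a terminal instead, you can create the file with a one-liner (a sketch assuming a bash-compatible shell):

printf 'John, Doe\nJane, Doe\n' > emp.txt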

Deploy the Bicep file

  1. Save the Bicep file from Azure Quickstart Templates as main.bicep to your local computer.

  2. Deploy the Bicep file using either Azure CLI or Azure PowerShell.

    az group create --name exampleRG --location eastus
    az deployment group create --resource-group exampleRG --template-file main.bicep
    

    When the deployment finishes, you should see a message indicating the deployment succeeded.

Review deployed resources

Use the Azure CLI or Azure PowerShell to list the deployed resources in the resource group.

az resource list --resource-group exampleRG
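
For a more readable summary, append --output table:

az resource list --resource-group exampleRG --output table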

You can also use the Azure portal to review the deployed resources.

  1. Sign in to the Azure portal.
  2. Navigate to your resource group.
  3. You'll see your resources listed. Select each resource to see an overview.

Upload a file

Use the Azure portal to upload the emp.txt file.

  1. Navigate to your resource group and select the storage account that was created. Then, select the Containers tab on the left panel.

  2. On the Containers page, select the blob container that was created. The name is in the format blob<uniqueid>.

  3. Select Upload, and then select the Files box icon in the right pane. Navigate to and select the emp.txt file that you created earlier.

  4. Expand the Advanced heading.

  5. In the Upload to folder box, enter input.

  6. Select the Upload button. You should see the emp.txt file and the status of the upload in the list.

  7. Select the Close icon (an X) to close the Upload blob page.

Keep the container page open because you can use it to verify the output at the end of this quickstart.
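
As an alternative to the portal, you can script the upload with Azure CLI. This is a sketch: replace the <storageaccountname> and <containername> placeholders with the names your deployment created (visible in the portal or from az resource list), and note that --auth-mode login uses your Azure sign-in, which needs a data-plane role such as Storage Blob Data Contributor on the account:

az storage blob upload --account-name <storageaccountname> --container-name <containername> --name input/emp.txt --file emp.txt --auth-mode login

The blob name input/emp.txt places the file in the input folder; in flat blob storage, a folder is simply a name prefix.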

Start trigger

  1. Navigate to the resource group page, and select the data factory you created.

  2. Select Open on the Open Azure Data Factory Studio tile.

  3. Select the Author tab.

  4. Select the pipeline that was created: ArmtemplateSampleCopyPipeline.

  5. Select Add Trigger > Trigger Now.

  6. In the right pane under Pipeline run, select OK.
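
You can also trigger the pipeline from the command line with the Azure CLI datafactory extension (az extension add --name datafactory). A sketch, with <datafactoryname> standing in for the factory name from your deployment:

runId=$(az datafactory pipeline create-run --resource-group exampleRG --factory-name <datafactoryname> --name ArmtemplateSampleCopyPipeline --query runId --output tsv)

Capturing the returned runId lets you check the run's status later from the same shell.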

Monitor the pipeline

  1. Select the Monitor tab.

  2. You see the activity runs associated with the pipeline run. In this quickstart, the pipeline has only one activity of type Copy, so you should see a run for that activity.

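If you triggered the run from the CLI, you can check its status with the run ID captured earlier (same placeholder as before):

az datafactory pipeline-run show --resource-group exampleRG --factory-name <datafactoryname> --run-id $runId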

Verify the output file

The pipeline automatically creates an output folder in the blob container. It then copies the emp.txt file from the input folder to the output folder.

  1. On the Containers page in the Azure portal, select Refresh to see the output folder.

  2. Select output in the folder list.

  3. Confirm that emp.txt is copied to the output folder.

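From the CLI, you can confirm the copy with a blob listing (same placeholders and role requirement as in the upload step):

az storage blob list --account-name <storageaccountname> --container-name <containername> --prefix output/ --auth-mode login --output table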

Clean up resources

When no longer needed, use the Azure CLI or Azure PowerShell to delete the resource group and all of its resources.

az group delete --name exampleRG
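
The command prompts you to confirm the deletion; add --yes to skip the prompt.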

You can also use the Azure portal to delete the resource group.

  1. In the Azure portal, navigate to your resource group.
  2. Select Delete resource group.
  3. When prompted, enter the resource group name to confirm, and then select Delete.

In this quickstart, you created an Azure data factory using Bicep and validated the deployment. To learn more about Azure Data Factory and Bicep, continue to the articles below.