Azure Data Factory
Azure Synapse Analytics
Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores, process/transform the data by using compute services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning, and publish output data to data stores such as Azure Synapse Analytics for business intelligence (BI) applications to consume.
This quickstart describes how to use the REST API to create an Azure data factory. The pipeline in this data factory copies data from one location to another in Azure Blob storage.
If you don't have an Azure subscription, create a free account before you begin.
We recommend that you use the Azure Az PowerShell module to interact with Azure. To get started, see Install Azure PowerShell. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
For sovereign clouds, you must use the appropriate cloud-specific endpoints for ActiveDirectoryAuthority and ResourceManagerUrl (BaseUri). To list the endpoint URLs for each cloud environment, run Get-AzEnvironment | Format-List in PowerShell.
Launch PowerShell. Keep Azure PowerShell open until the end of this quickstart. If you close and reopen, you need to run the commands again.
Run the following command, and enter the user name and password that you use to sign in to the Azure portal:
Connect-AzAccount
Run the following command to view all the subscriptions for this account:
Get-AzSubscription
Run the following command to select the subscription that you want to work with. Replace SubscriptionId with the ID of your Azure subscription:
Select-AzSubscription -SubscriptionId "<SubscriptionId>"
After replacing the placeholders with your own values, run the following commands to set global variables that are used in later steps.
$tenantID = "<your tenant ID>"
$appId = "<your application ID>"
$clientSecrets = "<your clientSecrets for the application>"
$subscriptionId = "<your subscription ID to create the factory>"
$resourceGroupName = "<your resource group to create the factory>"
$factoryName = "<specify the name of data factory to create. It must be globally unique.>"
$apiVersion = "2018-06-01"
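Later steps combine these variables into Azure Resource Manager paths of the form /subscriptions/.../factories/<factoryName>/<resource>?api-version=.... As a minimal sketch, the pattern can be wrapped in a small function. Get-AdfResourcePath and its parameter names are hypothetical and not part of the quickstart or the Az module.

```powershell
# Hypothetical helper (not part of the quickstart): builds the resource path
# for a Data Factory child resource from the values defined above.
function Get-AdfResourcePath {
    param(
        [Parameter(Mandatory)] [string] $SubscriptionId,
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $FactoryName,
        [Parameter(Mandatory)] [string] $RelativePath,   # for example "datasets/InputDataset"
        [string] $ApiVersion = "2018-06-01"
    )
    # ${RelativePath} is braced so the following "?" is not read as part of the variable name.
    "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroupName" +
        "/providers/Microsoft.DataFactory/factories/$FactoryName" +
        "/${RelativePath}?api-version=$ApiVersion"
}

# Example with placeholder values
Get-AdfResourcePath -SubscriptionId "sub-id" -ResourceGroupName "my-rg" `
    -FactoryName "my-factory" -RelativePath "datasets/InputDataset"
```

The result can be passed to the -Path parameter of Invoke-AzRestMethod in the same way as the hand-written $path strings in the steps that follow.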
Run the following commands to authenticate with Microsoft Entra ID:
$credentials = Get-Credential -UserName $appId
Connect-AzAccount -ServicePrincipal -Credential $credentials -Tenant $tenantID
You will be prompted for the password. Enter the value of the $clientSecrets variable.
If you need to get the access token
Run the following commands to create a data factory:
$body = @"
{
    "location": "East US",
    "properties": {},
    "identity": {
        "type": "SystemAssigned"
    }
}
"@
$response = Invoke-AzRestMethod -SubscriptionId ${subscriptionId} -ResourceGroupName ${resourceGroupName} -ResourceProviderName Microsoft.DataFactory -ResourceType "factories" -Name ${factoryName} -ApiVersion ${apiVersion} -Method PUT -Payload ${body}
Note the following points:
The name of the Azure Data Factory must be globally unique. If you receive the following error, change the name and try again.
Data factory name "ADFv2QuickStartDataFactory" is not available.
For a list of Azure regions in which Data Factory is currently available, select the regions that interest you on the following page, and then expand Analytics to locate Data Factory: Products available by region. The data stores (Azure Storage, Azure SQL Database, etc.) and computes (HDInsight, etc.) used by data factory can be in other regions.
Here is the sample response content:
"principalId":"<service principal ID>",
"tenantId":"<tenant ID>"
"location":"East US",
You create linked services in a data factory to link your data stores and compute services to the data factory. In this quickstart, you only need to create one Azure Storage linked service, named "AzureStorageLinkedService" in the sample, which serves as both the copy source and the sink store.
Run the following commands to create a linked service named AzureStorageLinkedService:
Replace <accountName> and <accountKey> with the name and key of your Azure storage account before executing the commands.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/linkedservices/AzureStorageLinkedService?api-version=${apiVersion}"
$body = @"
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
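The payload for this request is the JSON definition of the linked service. The following is a minimal sketch, assuming a linked service of type AzureBlobStorage that authenticates with a storage connection string. The property layout is an assumption based on that linked service type, and building the JSON from a hashtable is an alternative to a here-string rather than the quickstart's own style.

```powershell
# Assumed placeholders: replace with the name and key of your storage account.
$accountName = "<accountName>"
$accountKey  = "<accountKey>"

$linkedService = [ordered]@{
    name       = "AzureStorageLinkedService"
    properties = [ordered]@{
        annotations    = @()
        type           = "AzureBlobStorage"
        typeProperties = [ordered]@{
            connectionString = "DefaultEndpointsProtocol=https;AccountName=$accountName;AccountKey=$accountKey"
        }
    }
}

# -Depth is set explicitly so that nested objects are never cut off
# at ConvertTo-Json's default depth of 2.
$body = $linkedService | ConvertTo-Json -Depth 10
$body
```

The resulting $body string can be passed to -Payload as in the command above.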
Here is the sample output:
You define a dataset that represents the data to copy from a source to a sink. In this example, you create two datasets: InputDataset and OutputDataset. They refer to the Azure Storage linked service that you created in the previous section. The input dataset represents the source data in the input folder. In the input dataset definition, you specify the blob container (adftutorial), the folder (input), and the file (emp.txt) that contain the source data. The output dataset represents the data that's copied to the destination. In the output dataset definition, you specify the blob container (adftutorial), the folder (output), and the file to which the data is copied.
Create InputDataset
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/datasets/InputDataset?api-version=${apiVersion}"
$body = @"
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
Here is the sample output:
"location":"@{type=AzureBlobStorageLocation; fileName=emp.txt; folderPath=input; container=adftutorial}"
Create OutputDataset
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/datasets/OutputDataset?api-version=${apiVersion}"
$body = @"
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
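Both dataset definitions share one shape: a reference to the linked service plus a blob location (container, folder, and optionally a file). The sketch below assumes Binary datasets with an AzureBlobStorageLocation, consistent with the sample outputs and the BinarySource/BinarySink copy activity used in this quickstart. New-BinaryBlobDatasetBody is a hypothetical helper, not an Az cmdlet.

```powershell
# Hypothetical helper: returns the JSON body for a Binary dataset in Blob storage.
function New-BinaryBlobDatasetBody {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [string] $Container,
        [Parameter(Mandatory)] [string] $FolderPath,
        [string] $FileName,   # optional; left out for the output dataset
        [string] $LinkedServiceName = "AzureStorageLinkedService"
    )
    $location = [ordered]@{ type = "AzureBlobStorageLocation" }
    if ($FileName) { $location["fileName"] = $FileName }
    $location["folderPath"] = $FolderPath
    $location["container"]  = $Container

    $dataset = [ordered]@{
        name       = $Name
        properties = [ordered]@{
            linkedServiceName = [ordered]@{
                referenceName = $LinkedServiceName
                type          = "LinkedServiceReference"
            }
            annotations    = @()
            type           = "Binary"
            typeProperties = [ordered]@{ location = $location }
        }
    }
    # The location object sits deeper than the default depth of 2, so raise -Depth
    # to keep it from being flattened into a "@{...}" string.
    $dataset | ConvertTo-Json -Depth 10
}

$inputBody  = New-BinaryBlobDatasetBody -Name "InputDataset"  -Container "adftutorial" -FolderPath "input" -FileName "emp.txt"
$outputBody = New-BinaryBlobDatasetBody -Name "OutputDataset" -Container "adftutorial" -FolderPath "output"
```

The same depth limit explains why the location value in the sample outputs appears as a single "@{...}" string when the response is re-serialized with ConvertTo-Json at its default depth.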
Here is the sample output:
"location":"@{type=AzureBlobStorageLocation; folderPath=output; container=adftutorial}"
In this example, the pipeline contains one Copy activity. The Copy activity uses the "InputDataset" and the "OutputDataset" created in the previous step as its input and output.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelines/Adfv2QuickStartPipeline?api-version=${apiVersion}"
$body = @"
{
    "name": "Adfv2QuickStartPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "InputDataset",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "OutputDataset",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": []
    }
}
"@
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
Here is the sample output:
"@{name=CopyFromBlobToBlob; type=Copy; dependsOn=System.Object[]; policy=; userProperties=System.Object[]; typeProperties=; inputs=System.Object[]; outputs=System.Object[]}"
In this step, you trigger a pipeline run. The pipeline run ID returned in the response body is used later by the monitoring API calls.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelines/Adfv2QuickStartPipeline/createRun?api-version=${apiVersion}"
$response = Invoke-AzRestMethod -Path ${path} -Method POST
Here is the sample output:
You can also get the runId by using the following command:
($response.content | ConvertFrom-Json).runId
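If the request fails, the response content holds an error document instead of a runId, and the expression above silently yields nothing. A minimal sketch of a guard follows, assuming the response object exposes StatusCode and Content properties as the object returned by Invoke-AzRestMethod does. Get-RunIdFromResponse is a hypothetical helper, and the response below is a stand-in so the example runs without calling Azure.

```powershell
# Hypothetical helper: extracts runId from a createRun response and fails early on errors.
function Get-RunIdFromResponse {
    param([Parameter(Mandatory)] $Response)
    if ($Response.StatusCode -lt 200 -or $Response.StatusCode -ge 300) {
        throw "createRun failed with status $($Response.StatusCode): $($Response.Content)"
    }
    ($Response.Content | ConvertFrom-Json).runId
}

# Stand-in for the object returned by Invoke-AzRestMethod (illustrative values only).
$fakeResponse = [pscustomobject]@{
    StatusCode = 200
    Content    = '{"runId":"aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e"}'
}
$runId = Get-RunIdFromResponse -Response $fakeResponse
$runId
```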
You can create a pipeline with parameters. In the following example, you create an input dataset and an output dataset that take the input and output file names as parameters passed from the pipeline.
Define a parameter called strInputFileName, and use it as the file name for the dataset.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/datasets/ParamInputDataset?api-version=${apiVersion}"
$body = @"
{
    "name": "ParamInputDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "strInputFileName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().strInputFileName",
                    "type": "Expression"
                },
                "folderPath": "input",
                "container": "adftutorial"
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
"@
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
Here is the sample output:
{
    "id": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<factoryName>/datasets/ParamInputDataset",
    "name": "ParamInputDataset",
    "type": "Microsoft.DataFactory/factories/datasets",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "strInputFileName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().strInputFileName",
                    "type": "Expression"
                },
                "folderPath": "input",
                "container": "adftutorial"
            }
        }
    },
    "etag": "00000000-0000-0000-0000-000000000000"
}
Define a parameter called strOutPutFileName, and use it as the file name for the dataset.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/datasets/ParamOutputDataset?api-version=${apiVersion}"
$body = @"
{
    "name": "ParamOutputDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "strOutPutFileName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().strOutPutFileName",
                    "type": "Expression"
                },
                "folderPath": "output",
                "container": "adftutorial"
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
"@
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
Here is the sample output:
{
    "id": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<factoryName>/datasets/ParamOutputDataset",
    "name": "ParamOutputDataset",
    "type": "Microsoft.DataFactory/factories/datasets",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "strOutPutFileName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().strOutPutFileName",
                    "type": "Expression"
                },
                "folderPath": "output",
                "container": "adftutorial"
            }
        }
    },
    "etag": "00000000-0000-0000-0000-000000000000"
}
Define a pipeline with two pipeline-level parameters: strParamInputFileName and strParamOutputFileName. Then link these two parameters to the strInputFileName and strOutPutFileName parameters of the datasets.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelines/Adfv2QuickStartParamPipeline?api-version=${apiVersion}"
$body = @"
{
    "name": "Adfv2QuickStartParamPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "ParamInputDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "strInputFileName": {
                                "value": "@pipeline().parameters.strParamInputFileName",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "ParamOutputDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "strOutPutFileName": {
                                "value": "@pipeline().parameters.strParamOutputFileName",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "strParamInputFileName": {
                "type": "String"
            },
            "strParamOutputFileName": {
                "type": "String"
            }
        }
    }
}
"@
$response = Invoke-AzRestMethod -Path ${path} -Method PUT -Payload $body
Here is the sample output:
{
    "id": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<factoryName>/pipelines/Adfv2QuickStartParamPipeline",
    "name": "Adfv2QuickStartParamPipeline",
    "type": "Microsoft.DataFactory/factories/pipelines",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "ParamInputDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "strInputFileName": {
                                "value": "@pipeline().parameters.strParamInputFileName",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "ParamOutputDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "strOutPutFileName": {
                                "value": "@pipeline().parameters.strParamOutputFileName",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "strParamInputFileName": {
                "type": "String"
            },
            "strParamOutputFileName": {
                "type": "String"
            }
        }
    },
    "etag": "5e01918d-0000-0100-0000-60d569a90000"
}
You can now specify values for the parameters when you create the pipeline run.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelines/Adfv2QuickStartParamPipeline/createRun?api-version=${apiVersion}"
$body = @"
{
    "strParamInputFileName": "emp2.txt",
    "strParamOutputFileName": "aloha.txt"
}
"@
$response = Invoke-AzRestMethod -Path ${path} -Method POST -Payload $body
$runId = ($response.content | ConvertFrom-Json).runId
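The createRun body is a flat JSON object whose keys are the pipeline-level parameter names. As a minimal sketch, the body can be built from a hashtable and checked against the declared names before the request is sent, which catches a misspelled parameter name locally. New-RunParameterBody is a hypothetical helper, and the check is an illustration rather than something the service requires.

```powershell
# Hypothetical helper: builds the createRun body and rejects missing or unknown parameters.
function New-RunParameterBody {
    param(
        [Parameter(Mandatory)] [string[]]  $Declared,   # names declared on the pipeline
        [Parameter(Mandatory)] [hashtable] $Values      # values for this run
    )
    $missing = @($Declared | Where-Object { -not $Values.ContainsKey($_) })
    $unknown = @($Values.Keys | Where-Object { $Declared -notcontains $_ })
    if ($missing.Count -or $unknown.Count) {
        throw "Missing: $($missing -join ', '); unknown: $($unknown -join ', ')"
    }
    $Values | ConvertTo-Json
}

$body = New-RunParameterBody `
    -Declared @("strParamInputFileName", "strParamOutputFileName") `
    -Values   @{ strParamInputFileName = "emp2.txt"; strParamOutputFileName = "aloha.txt" }
$body
```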
Here is the sample output:
Run the following script to continuously check the pipeline run status until it finishes copying the data.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelineruns/${runId}?api-version=${apiVersion}"
while ($True) {
    $response = Invoke-AzRestMethod -Path ${path} -Method GET
    $response = $response.content | ConvertFrom-Json

    Write-Host "Pipeline run status: " $response.Status -foregroundcolor "Yellow"

    if ( ($response.Status -eq "InProgress") -or ($response.Status -eq "Queued") -or ($response.Status -eq "In Progress") ) {
        # The run has not finished yet; wait before polling again.
        Start-Sleep -Seconds 10
    }
    else {
        # The run reached a final state; print the details and stop polling.
        $response | ConvertTo-Json
        break
    }
}
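The loop above polls until the run leaves the Queued or InProgress state, with no upper bound on the wait. A minimal sketch of the same idea with a timeout follows. Wait-ForRun and its parameters are hypothetical, the pending list mirrors the statuses checked in the script above, and the status source is passed in as a script block so the example can run without calling Azure.

```powershell
# Hypothetical helper: polls until the run leaves a pending state or the timeout expires.
function Wait-ForRun {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetStatus,   # returns the current status string
        [int] $TimeoutSeconds  = 600,
        [int] $IntervalSeconds = 10
    )
    $pending  = @("Queued", "InProgress")
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        $status = & $GetStatus
        if ($pending -notcontains $status) { return $status }
        if ((Get-Date) -ge $deadline) { throw "Timed out while the run was still '$status'." }
        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Offline demonstration with a canned sequence of statuses.
$statuses = [System.Collections.Queue]::new(@("Queued", "InProgress", "Succeeded"))
$final = Wait-ForRun -GetStatus { $statuses.Dequeue() } -IntervalSeconds 0
$final
```

With the quickstart's variables, the status getter could wrap the GET request, for example { ((Invoke-AzRestMethod -Path $path -Method GET).Content | ConvertFrom-Json).status }.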
Here is the sample output:
{
    "id": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<factoryName>/pipelineruns/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e",
    "runId": "aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e",
    "debugRunId": null,
    "runGroupId": "aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e",
    "pipelineName": "Adfv2QuickStartParamPipeline",
    "parameters": {
        "strParamInputFileName": "emp2.txt",
        "strParamOutputFileName": "aloha.txt"
    },
    "invokedBy": {
        "id": "9c0275ed99994c18932317a325276544",
        "name": "Manual",
        "invokedByType": "Manual"
    },
    "runStart": "2021-06-25T05:34:06.8424413Z",
    "runEnd": "2021-06-25T05:34:13.2936585Z",
    "durationInMs": 6451,
    "status": "Succeeded",
    "message": "",
    "lastUpdated": "2021-06-25T05:34:13.2936585Z",
    "annotations": [],
    "runDimension": {},
    "isLatest": true
}
Run the following script to retrieve copy activity run details, for example, size of the data read/written.
$path = "/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelineruns/${runId}/queryActivityruns?api-version=${apiVersion}"
while ($True) {
    $response = Invoke-AzRestMethod -Path ${path} -Method POST
    $responseContent = $response.content | ConvertFrom-Json
    $responseContentValue = $responseContent.value

    Write-Host "Activity run status: " $responseContentValue.Status -foregroundcolor "Yellow"

    if ( ($responseContentValue.Status -eq "InProgress") -or ($responseContentValue.Status -eq "Queued") -or ($responseContentValue.Status -eq "In Progress") ) {
        # The activity has not finished yet; wait before polling again.
        Start-Sleep -Seconds 10
    }
    else {
        # The activity reached a final state; print the details and stop polling.
        $responseContentValue | ConvertTo-Json
        break
    }
}
Here is the sample output:
"activityRunEnd": "2021-06-25T05:34:11.9536764Z",
"activityName": "CopyFromBlobToBlob",
"activityRunStart": "2021-06-25T05:34:07.5161151Z",
"activityType": "Copy",
"durationInMs": 4437,
"retryAttempt": null,
"error": {
"errorCode": "",
"message": "",
"failureType": "",
"target": "CopyFromBlobToBlob",
"details": ""
"activityRunId": "bbbb1b1b-cc2c-dd3d-ee4e-ffffff5f5f5f",
"iterationHash": "",
"input": {
"source": {
"type": "BinarySource",
"storeSettings": "@{type=AzureBlobStorageReadSettings; recursive=True}"
"sink": {
"type": "BinarySink",
"storeSettings": "@{type=AzureBlobStorageWriteSettings}"
"enableStaging": false
"linkedServiceName": "",
"output": {
"dataRead": 134,
"dataWritten": 134,
"filesRead": 1,
"filesWritten": 1,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"copyDuration": 3,
"throughput": 0.044,
"errors": [],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (East US)",
"usedDataIntegrationUnits": 4,
"billingReference": {
"activityType": "DataMovement",
"billableDuration": ""
"usedParallelCopies": 1,
"executionDetails": [
"@{source=; sink=; status=Succeeded; start=06/25/2021 05:34:07; duration=3; usedDataIntegrationUnits=4; usedParallelCopies=1; profile=; detailedDurations=}"
"dataConsistencyVerification": {
"VerificationResult": "NotVerified"
"durationInQueue": {
"integrationRuntimeQueue": 0
"userProperties": {},
"pipelineName": "Adfv2QuickStartParamPipeline",
"pipelineRunId": "aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e",
"status": "Succeeded",
"recoveryStatus": "None",
"integrationRuntimeNames": [
"executionDetails": {
"integrationRuntime": [
"@{name=DefaultIntegrationRuntime; type=Managed; location=East US; nodes=}"
"id": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<factoryName>/pipelineruns/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/activityruns/bbbb1b1b-cc2c-dd3d-ee4e-ffffff5f5f5f"
Use Azure Storage Explorer to verify that the file was copied from the input path to the output path that you specified when you created the pipeline run.
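The activity run output shown earlier can also be checked from PowerShell. The following minimal sketch parses a trimmed copy of that sample and compares the dataRead and dataWritten counts. It assumes an uncompressed binary copy, where the two counts are expected to match. The property names come from the sample output, and the summary object is illustrative.

```powershell
# Trimmed copy of the sample activity run (only the fields used here).
$activityRun = @'
{
  "activityName": "CopyFromBlobToBlob",
  "status": "Succeeded",
  "output": { "dataRead": 134, "dataWritten": 134, "filesRead": 1, "filesWritten": 1, "copyDuration": 3 }
}
'@ | ConvertFrom-Json

# Illustrative summary; SizesMatch assumes read and written sizes should be equal.
$summary = [pscustomobject]@{
    Activity     = $activityRun.activityName
    Status       = $activityRun.status
    DataRead     = $activityRun.output.dataRead
    DataWritten  = $activityRun.output.dataWritten
    FilesWritten = $activityRun.output.filesWritten
    SizesMatch   = ($activityRun.output.dataRead -eq $activityRun.output.dataWritten)
}
$summary
```

In the quickstart itself, $responseContentValue from the activity run query could be used in place of the parsed sample.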
You can clean up the resources that you created in the Quickstart in two ways. You can delete the Azure resource group, which includes all the resources in the resource group. If you want to keep the other resources intact, delete only the data factory you created in this tutorial.
Run the following command to delete the entire resource group:
Remove-AzResourceGroup -ResourceGroupName $resourceGroupName
Run the following command to delete only the data factory:
Remove-AzDataFactoryV2 -Name "<NameOfYourDataFactory>" -ResourceGroupName "<NameOfResourceGroup>"
The pipeline in this sample copies data from one location to another location in an Azure blob storage. Go through the tutorials to learn about using Data Factory in more scenarios.