Copy data and transform with dynamic parameters hourly

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

In this scenario, you want to copy data from AWS S3 to Azure Blob storage and transform it with Azure Databricks (with dynamic parameters in the script) on an hourly schedule for 8 hours each day over 30 days.

The prices used in the example below are hypothetical and are not intended to reflect actual pricing. Read/write and monitoring costs are not shown because they are typically negligible and do not significantly affect overall costs. Activity runs are also rounded up to the nearest 1000 in pricing calculator estimates.

Refer to the Azure Pricing Calculator for more specific scenarios and to estimate your future costs for the service.

Configuration

To accomplish the scenario, you need to create a pipeline with the following items:

  • One copy activity with an input dataset for the data to be copied from AWS S3 and an output dataset for the data on Azure storage.
  • One Lookup activity for passing parameters dynamically to the transformation script.
  • One Azure Databricks activity for the data transformation.
  • One schedule trigger to execute the pipeline every hour for 8 hours per day. A pipeline can be triggered immediately or run on a schedule. Each trigger run is billed as one activity run, in addition to the activity runs in the pipeline itself.
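
As a rough illustration of how the items above fit together, the following Python sketch builds the pipeline and trigger as plain dictionaries. All names (pipeline, activities, datasets, notebook path, the `runDate` parameter, and the 09:00 to 16:00 UTC window) are hypothetical. The field names approximate the general shape of Data Factory's JSON definitions and should be checked against the pipeline and trigger JSON reference before use. Source, sink, and linked service settings are omitted.

```python
# Minimal sketch of the scenario's pipeline and schedule trigger.
# All names are hypothetical; source/sink and linked service settings are omitted.
pipeline = {
    "name": "CopyAndTransformHourly",
    "properties": {
        "activities": [
            {
                "name": "CopyFromS3",
                "type": "Copy",
                "inputs": [{"referenceName": "S3InputDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "BlobOutputDataset", "type": "DatasetReference"}],
            },
            {
                "name": "LookupParams",
                "type": "Lookup",
                "dependsOn": [{"activity": "CopyFromS3", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {"firstRowOnly": True},
            },
            {
                "name": "TransformWithDatabricks",
                "type": "DatabricksNotebook",
                "dependsOn": [{"activity": "LookupParams", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "notebookPath": "/Shared/transform",
                    # Dynamic parameter taken from the Lookup activity's output.
                    "baseParameters": {
                        "runDate": {
                            "value": "@activity('LookupParams').output.firstRow.runDate",
                            "type": "Expression",
                        }
                    },
                },
            },
        ]
    },
}

trigger = {
    "name": "HourlyEightHoursPerDay",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "timeZone": "UTC",
                # Fire on the hour, 8 times per day (assumed window: 09:00 to 16:00).
                "schedule": {"hours": list(range(9, 17)), "minutes": [0]},
            }
        },
        "pipelines": [
            {"pipelineReference": {"referenceName": pipeline["name"], "type": "PipelineReference"}}
        ],
    },
}

activities = pipeline["properties"]["activities"]
hours = trigger["properties"]["typeProperties"]["recurrence"]["schedule"]["hours"]
runs_per_execution = len(activities) + 1  # the trigger run counts as one activity run
print(len(hours), "executions per day,", runs_per_execution, "activity runs per execution")
```

The three activities plus the trigger run give the 4 activity runs per execution used in the cost estimate.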

Diagram shows a pipeline with a schedule trigger. In the pipeline, copy activity flows to an input dataset, an output dataset, and lookup activity that flows to a Databricks activity, which runs on Azure Databricks. The input dataset flows to an AWS S3 linked service. The output dataset flows to an Azure Storage linked service.

Cost estimation

Operations | Types and Units
Run Pipeline | 4 activity runs per execution (1 for the trigger run, 3 for activity runs) × 8 executions per day × 30 days = 960 activity runs, rounded up to 1000 since the calculator only allows increments of 1000.
Copy Data (assumption: DIU hours per execution = 10 min) | 10 min / 60 min * 4 Azure Integration Runtime (default DIU setting = 4). For more information on data integration units and optimizing copy performance, see this article.
Execute Lookup activity (assumption: pipeline activity hours per execution = 1 min) | 1 min / 60 min Pipeline Activity execution
Execute Databricks activity (assumption: external execution hours per execution = 10 min) | 10 min / 60 min External Pipeline Activity execution
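
The per-execution figures above can be scaled to the full 30-day period with a short calculation. The sketch below uses only the scenario's stated assumptions (8 executions per day, 30 days, 4 DIUs, and the per-execution durations); the variable names are illustrative.

```python
import math
from fractions import Fraction

EXECUTIONS_PER_DAY = 8
DAYS = 30
ACTIVITY_RUNS_PER_EXECUTION = 4  # 1 trigger run + 3 activity runs
DIU = 4                          # default DIU setting

executions = EXECUTIONS_PER_DAY * DAYS                         # 240

# Activity runs, rounded up to the calculator's increment of 1000.
activity_runs = executions * ACTIVITY_RUNS_PER_EXECUTION       # 960
billed_activity_runs = math.ceil(activity_runs / 1000) * 1000  # 1000

# Hours per execution are minutes / 60; Fraction keeps the arithmetic exact.
diu_hours = executions * Fraction(10, 60) * DIU                # 160
pipeline_activity_hours = executions * Fraction(1, 60)         # 4
external_activity_hours = executions * Fraction(10, 60)        # 40

print("Activity runs (billed):", billed_activity_runs)
print("DIU-hours:", diu_hours)
print("Pipeline activity hours:", pipeline_activity_hours)
print("External activity hours:", external_activity_hours)
```

These are the monthly quantities that would be entered into the pricing calculator for this scenario.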

Pricing example: Pricing calculator example

Total scenario pricing for 30 days: $41.03
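
This article does not list the unit prices behind the total. As a sketch only, the following calculation uses assumed unit prices (hypothetical values chosen for illustration, not actual pricing) together with the scenario's 30-day quantities: 1000 billed activity runs, 160 DIU-hours, 4 pipeline activity hours, and 40 external activity hours. With these assumed prices, the line items sum to the total shown above.

```python
from decimal import Decimal

# ASSUMED unit prices in USD. These are illustrative values, not stated in
# this article; use the Azure Pricing Calculator for real figures.
PRICE_PER_1000_ACTIVITY_RUNS = Decimal("1.00")
PRICE_PER_DIU_HOUR = Decimal("0.25")
PRICE_PER_PIPELINE_ACTIVITY_HOUR = Decimal("0.005")
PRICE_PER_EXTERNAL_ACTIVITY_HOUR = Decimal("0.00025")

# 30-day quantities for the scenario (240 executions in total).
line_items = {
    "Activity runs": Decimal(1000) / 1000 * PRICE_PER_1000_ACTIVITY_RUNS,
    "Copy data (DIU-hours)": Decimal(160) * PRICE_PER_DIU_HOUR,
    "Lookup (pipeline activity hours)": Decimal(4) * PRICE_PER_PIPELINE_ACTIVITY_HOUR,
    "Databricks (external activity hours)": Decimal(40) * PRICE_PER_EXTERNAL_ACTIVITY_HOUR,
}

total = sum(line_items.values()).quantize(Decimal("0.01"))

for name, cost in line_items.items():
    print(f"{name}: ${cost}")
print(f"Total: ${total}")
```

Under these assumptions, the copy activity's DIU-hours account for nearly all of the cost, while the Lookup and Databricks orchestration charges are only a few cents.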

Screenshot of the pricing calculator configured for a copy data and transform with dynamic parameters scenario.