Activity overview
This article helps you understand activities in Microsoft Fabric and use them to construct end-to-end data-driven workflows for your data movement and data processing scenarios.
Overview
A Microsoft Fabric workspace can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a dataflow to analyze the log data. The pipeline allows you to manage the activities as a set instead of individually: you deploy and schedule the pipeline rather than each activity on its own.
The activities in a pipeline define actions to perform on your data. For example, you can use a copy activity to copy data from SQL Server to Azure Blob Storage. Then, use a Dataflow activity or a Notebook activity to process and transform the data from Blob Storage into an Azure Synapse Analytics pool, on top of which business intelligence reporting solutions are built.
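To make the structure concrete, the following sketch shows a pipeline with a copy activity chained to a notebook activity, written as a Python dict that mirrors the ADF-style JSON behind a pipeline definition. The activity type names and properties here are illustrative assumptions, not the exact Fabric schema.

```python
# Minimal sketch of a pipeline definition, written as a Python dict that
# mirrors the ADF-style JSON a pipeline is based on. Type names and
# properties are illustrative assumptions, not the exact Fabric schema.
pipeline = {
    "name": "IngestAndAnalyzeLogs",
    "properties": {
        "activities": [
            {
                "name": "IngestLogData",
                "type": "Copy",           # data movement activity
                "typeProperties": {},     # source/sink settings omitted
            },
            {
                "name": "AnalyzeLogData",
                "type": "Notebook",       # data transformation activity
                "typeProperties": {},
                # Chaining: run only after the copy succeeds. This is what
                # lets you deploy and schedule the set as a single unit.
                "dependsOn": [
                    {"activity": "IngestLogData",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}
```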
Microsoft Fabric has three types of activities: data movement activities, data transformation activities, and control activities.
Data movement activities
Copy activity in Microsoft Fabric copies data from a source data store to a sink data store. Fabric supports the data stores listed in the Connector overview article. Data from any source can be written to any sink.
For more information, see How to copy data using the copy activity.
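As a rough illustration of the source-to-sink shape, the sketch below outlines a copy activity that reads from SQL Server and writes to Azure Blob Storage. The connector type names are placeholders rather than the exact Fabric schema.

```python
# Illustrative shape of a Copy activity: data is read from a source data
# store and written to a sink data store. The connector type names below
# are placeholders; actual names depend on the connector you choose.
copy_activity = {
    "name": "CopySqlServerToBlob",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "SqlServerSource"},     # where data is read from
        "sink": {"type": "AzureBlobStorageSink"},  # where data is written
    },
}
```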
Data transformation activities
Microsoft Fabric supports the following transformation activities, which can be added either individually or chained with other activities.
For more information, see the data transformation activities article.
Data transformation activity | Compute environment |
---|---|
Copy data | Compute managed by Microsoft Fabric |
Dataflow Gen2 | Compute managed by Microsoft Fabric |
Delete data | Compute managed by Microsoft Fabric |
Fabric Notebook | Apache Spark clusters managed by Microsoft Fabric |
HDInsight activity | Apache Spark clusters managed by Azure HDInsight |
Spark Job Definition | Apache Spark clusters managed by Microsoft Fabric |
Stored Procedure | Azure SQL, Azure Synapse Analytics, or SQL Server |
SQL script | Azure SQL, Azure Synapse Analytics, or SQL Server |
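For example, here's a hedged sketch of a Stored Procedure activity definition. The property names follow the Azure Data Factory schema and are assumptions for Fabric; the procedure name and parameter are hypothetical.

```python
# Hedged sketch of a Stored Procedure activity invoking a procedure on the
# target compute (for example, Azure SQL). Property names follow the ADF
# schema and may differ slightly in Fabric; the procedure is hypothetical.
stored_proc_activity = {
    "name": "CleanStagingTable",
    "type": "SqlServerStoredProcedure",
    "typeProperties": {
        "storedProcedureName": "dbo.usp_clean_staging",  # hypothetical
        "storedProcedureParameters": {
            "RetentionDays": {"value": "30", "type": "Int32"},
        },
    },
}
```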
Control flow activities
The following control flow activities are supported:
Control activity | Description |
---|---|
Append variable | Adds a value to an existing array variable. |
Azure Batch activity | Runs an Azure Batch script. |
Azure Databricks activity | Runs an Azure Databricks job (Notebook, Jar, Python). |
Azure Machine Learning activity | Runs an Azure Machine Learning job. |
Deactivate activity | Deactivates an activity in the pipeline so that it's skipped when the pipeline runs. |
Fail | Causes pipeline execution to fail with a customized error message and error code. |
Filter | Applies a filter expression to an input array. |
ForEach | Defines a repeating control flow in your pipeline. The activity iterates over a collection and executes the specified activities in a loop, similar to the ForEach looping structure in programming languages. |
Functions activity | Executes an Azure Function. |
Get metadata | The Get metadata activity can be used to retrieve the metadata of any data in a Microsoft Fabric pipeline. |
If condition | The If condition can be used to branch based on a condition that evaluates to true or false. It provides the same functionality that an if statement provides in programming languages: it runs one set of activities when the condition evaluates to true and another set when it evaluates to false. |
Invoke pipeline | The Invoke pipeline activity allows a pipeline to invoke another pipeline. |
KQL activity | Executes a KQL script against a Kusto instance. |
Lookup activity | The Lookup activity can be used to read or look up a record, table name, or value from any external source. This output can be referenced by succeeding activities. |
Set Variable | Sets the value of an existing variable. |
Switch activity | Implements a switch expression that runs a different set of subsequent activities for each potential result of the expression. |
Teams activity | Posts a message in a Teams channel or group chat. |
Until activity | Implements a Do-Until loop similar to the Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true. You can specify a timeout value for the Until activity. |
Wait activity | Waits for the specified time before the pipeline continues executing subsequent activities. |
Web activity | The Web activity can be used to call a custom REST endpoint from a pipeline. |
Webhook activity | Calls an endpoint and passes a callback URL. The pipeline run waits for the callback to be invoked before proceeding to the next activity. |
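To illustrate how a container activity wraps inner activities, here's a hedged sketch of a ForEach activity iterating over a pipeline parameter. The expression uses the pipeline expression language; the container schema is assumed from Azure Data Factory and may differ slightly in Fabric.

```python
# Hedged sketch of a ForEach container iterating over a pipeline parameter.
# "@pipeline().parameters.fileNames" uses the pipeline expression language;
# the container schema is assumed from the ADF JSON format.
foreach_activity = {
    "name": "ProcessEachFile",
    "type": "ForEach",
    "typeProperties": {
        # The array to iterate over, supplied as an expression.
        "items": {"value": "@pipeline().parameters.fileNames",
                  "type": "Expression"},
        # Activities executed once per item in the array.
        "activities": [
            {"name": "CopyOneFile", "type": "Copy", "typeProperties": {}},
        ],
    },
}
```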
Adding activities to a pipeline with the Microsoft Fabric UI
Use these steps to add and configure activities in a Microsoft Fabric pipeline:
1. Create a new pipeline in your workspace.
2. On the Activities tab for the pipeline, browse the activities displayed, scrolling to the right if necessary to see them all. Select an activity to add it to the pipeline editor.
3. When you add an activity and select it on the pipeline editor canvas, its General settings appear in the properties pane below the canvas.
4. Each activity also contains custom properties specific to its configuration on other tabs in the properties pane.
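Beyond the UI, a pipeline can also be run on demand programmatically. The following sketch uses the Fabric REST API's job scheduler endpoint; the URL shape and jobType value are assumptions to verify against the current Fabric REST API reference.

```python
import requests

# Hedged sketch: trigger an on-demand run of a pipeline through the Fabric
# REST API job scheduler. The endpoint shape and "Pipeline" jobType are
# assumptions to verify against the current Fabric REST API reference.
WORKSPACE_ID = "<workspace-id>"      # placeholder
PIPELINE_ID = "<pipeline-item-id>"   # placeholder
TOKEN = "<access-token>"             # placeholder; acquired via Microsoft Entra ID

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()  # a 202 Accepted response indicates the run was queued
```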
General settings
When you add a new activity to a pipeline and select it, its properties panes appear in the area at the bottom of the screen. These include General, Settings, and sometimes other panes as well.
The general settings always include Name and Description fields for every activity. Some activities also include the following settings:
Setting | Description |
---|---|
Timeout | The maximum amount of time an activity can run. The default is 12 hours, and the maximum allowed is seven days. The timeout format is D.HH:MM:SS. |
Retry | Maximum number of retry attempts. |
(Advanced properties) Retry interval (sec) | The number of seconds between each retry attempt. |
(Advanced properties) Secure output | When checked, output from the activity isn't captured in logging. |
(Advanced properties) Secure input | When checked, input from the activity isn't captured in logging. |
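As a rough mapping, these settings correspond to an activity's policy block in the underlying definition. The property names below follow the Azure Data Factory schema and are assumed, not confirmed, for Fabric.

```python
# How the general settings above might map onto an activity's policy block,
# using ADF-style property names (assumed; verify against the Fabric schema).
activity_policy = {
    "timeout": "0.12:00:00",       # D.HH:MM:SS -- 0 days, 12 hours (the default)
    "retry": 3,                    # maximum number of retry attempts
    "retryIntervalInSeconds": 30,  # seconds between retry attempts
    "secureInput": True,           # when True, input isn't captured in logging
    "secureOutput": True,          # when True, output isn't captured in logging
}
```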
Note
There's a default soft limit of a maximum of 80 activities per pipeline, which includes inner activities for containers.
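Because the limit counts nested activities, one way to audit a large pipeline definition is to count activities recursively, as in this sketch (the container property names are assumed from the Azure Data Factory schema):

```python
# Sketch: count activities toward the soft limit, including activities
# nested inside containers (ForEach, If condition, Until, Switch). The
# container property names are assumed from the ADF schema.
def count_activities(activities):
    total = 0
    for activity in activities:
        total += 1
        props = activity.get("typeProperties", {})
        for key in ("activities",          # ForEach, Until
                    "ifTrueActivities",    # If condition, true branch
                    "ifFalseActivities",   # If condition, false branch
                    "defaultActivities"):  # Switch default branch
            total += count_activities(props.get(key, []))
        for case in props.get("cases", []):  # Switch cases
            total += count_activities(case.get("activities", []))
    return total

nested = [{"name": "Loop", "type": "ForEach",
           "typeProperties": {"activities": [{"name": "Inner", "type": "Wait"}]}}]
print(count_activities(nested))  # 2: the container plus its inner activity
```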