Edit

Share via


Create a script activity in Microsoft Fabric Data Factory

In this article, you learn how to create a script activity in Microsoft Fabric Data Factory to run custom PostgreSQL queries. A script activity allows you to run various types of PostgreSQL commands directly in your pipelines. These commands include:

  • Data Manipulation Language (DML) statements: INSERT, UPDATE, DELETE, and SELECT.
  • Data Definition Language (DDL) statements: CREATE, ALTER, and DROP.

Prerequisites

Create a script activity

  1. In Microsoft Fabric, select your workspace, switch to Data factory, and then select the New item button.

  2. On the New item pane, search for pipeline and select the Data pipeline tile.

  3. In the New pipeline dialog, enter a name and then select the Create button to create a data pipeline.

  4. On the Activities menu, select the Script icon.

    Screenshot that shows the icon for selecting a script activity.

  5. With the script activity selected on the data pipeline canvas, on the General tab, enter a name for the activity.

    Screenshot that shows where to enter a name for a script activity on the General tab.

  6. On the Settings tab, select your Azure Database for PostgreSQL connection, or create a new one by using the More option. Learn more about connecting to your data by using the modern get-data experience for data pipelines.

    Screenshot that shows an example setting for a connection.

  7. Select either the Query or NonQuery option, depending on your script.

    Screenshot that highlights the query and nonquery options for a script.

    The script activity supports both query and nonquery statements.

    Query statements execute PostgreSQL statements (often SELECT statements) that return results. A query statement returns records of data.

    Screenshot that shows a sample of a query script.

    Here's a sample payload with a query statement:

    {
       "name": "Sample of select statement",
       "type": "Script",
       "dependsOn": [],
       "policy": {
          "timeout": "0.12:00:00",
          "retry": 0,
          "retryIntervalInSeconds": 30,
          "secureOutput": false,
          "secureInput": false
       },
       "typeProperties": {
          "scripts": [
                {
                   "type": "Query",
                   "text": {
                      "value": "SELECT *  FROM sample_table WHERE sample_int =100",
                      "type": "Expression"
                   }
                }
          ],
          "scriptBlockExecutionTimeout": "02:00:00"
       },
       "externalReferences": {
          "connection": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
       }
    }
    

Create multiple scripts inside one script activity

On the Settings tab, you can configure multiple queries in one script activity. To add a new script input, select the plus (+) button in the Script area.

Screenshot that shows an example of the button and box for creating a new script input.

You can select the + button multiple times, depending on how many script inputs you want to create. For example, to add two new script inputs, select the + button two times.

Screenshot that shows how to add a second box for script input.

If you want to delete a query input box, select the Delete icon for that box.

Screenshot that shows the Delete icon for script input boxes.

Here's a sample payload with two separate queries:

{
    "name": "Sample of multiple select statements",
    "type": "Script",
    "dependsOn": [],
    "policy": {
        "timeout": "0.12:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false,
        "secureInput": false
    },
    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": {
                    "value": "SELECT * FROM sample_table WHERE sample_int = 100;",
                    "type": "Expression"
                }
            },
            {
                "type": "Query",
                "text": {
                    "value": "SELECT * FROM sample_table WHERE sample_int > 250;",
                    "type": "Expression"
                }
            }
        ],
        "scriptBlockExecutionTimeout": "02:00:00"
    },
    "externalReferences": {
        "connection": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
}

Use script parameters

Important

Multiple-query statements that use output parameters aren't supported. You need to split any output queries into separate script blocks within a script activity.

A script activity supports two types of script parameters:

  • Named parameters are based on the name of the parameters and are specified as @<name> in the query.
  • Positional parameters are based on the index of the parameters and are specified in the query (in order) as $<position number> with a starting index of 1.

For named parameters as output parameters, use the @ prefix. Set the value as null with the Treat as null box checked on the UI, and leave the payload blank or null. The value in the text should be null.

Screenshot that shows an output parameter example with checkboxes selected for treating the values as null.

The name set within the procedure for output is the name used within the resultSets data output. The name set in the UI output row is used for the name of outputParameters.

Here's a sample result from the UI execution:

{
   "resultSetCount": 1,
   "recordsAffected": 0,
   "resultSets": [
      {
         "rowCount": 1,
         "rows": [
            {
               "output1": 10,
               "output2": "\"Hello World\""
            }
         ]
      }
   ],
   "outputParameters": {
      "output10": 10,
      "output20": "\"Hello World\""
   }
}

Here's a payload sample for the output parameter:

{
    "scripts": [
        {
            "type": "NonQuery",
            "text": "CREATE OR REPLACE PROCEDURE swap_proc (input1 IN TEXT, input2 IN BIGINT, output1 OUT BIGINT, output2 OUT TEXT) LANGUAGE plpgsql AS $$ DECLARE BEGIN output2 := input1; output1 := input2; END $$ "
        },
        {
            "parameters": [
                {
                    "name": "input1",
                    "type": "String",
                    "value": "Hello world",
                    "direction": "Input"
                },
                {
                    "name": "input2",
                    "type": "Int32",
                    "value": "1234",
                    "direction": "Input"
                },
                {
                    "name": "output1",
                    "type": "Int32",
                    "value": "",
                    "direction": "Output"
                },
                {
                    "name": "output2",
                    "type": "String",
                    "value": "",
                    "direction": "Output",
                    "size": 100
                }
            ],
            "type": "Query",
            "text": "CALL swap_proc(@input1, @input2, null, null)"
        }
    ],
    "scriptBlockExecutionTimeout": "02:00:00"
}

Positional parameters

Important

Multiple-query statements that use positional parameters aren't supported. Ensure that any queries that have positional parameters are in separate script blocks within a script activity.

To use positional parameters, use a placeholder of $<positional number> in your query. On the UI, under Script parameters, the Name box must be left blank. In the payload, the name field must be specified as null.

The following example shows a valid positional parameter.

Screenshot that shows an example of a valid positional parameter.

{
    "name": "Sample for valid positional parameter",
    "type": "Script",
    "dependsOn": [],
    "policy": {
        "timeout": "0.12:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false,
        "secureInput": false
    },
   "typeProperties": {
        "scripts": [
            {
                "parameters": [
                    {
                        "type": "String",
                        "value": "John",
                        "direction": "Input"
                    },
                    {
                        "type": "Int32",
                        "value": "52",
                        "direction": "Input"
                    }
                ],
                "type": "Query",
                "text": {
                    "value": "SELECT * FROM customers WHERE first_name = $1 and age = $2;",
                    "type": "Expression"
                }
            }
        ],
        "scriptBlockExecutionTimeout": "02:00:00"
    },
    "externalReferences": {
        "connection": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
}

The following example shows an invalid positional parameter:

{
    "name": "Sample for invalid positional parameter",
    "type": "Script",
    "dependsOn": [],
    "policy": {
        "timeout": "0.12:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false,
        "secureInput": false
    },
    "typeProperties": {
        "scripts": [
            {
                "parameters": [
                    {
                        "type": "String",
                        "value": "John",
                        "direction": "Input"
                    },
                    {
                        "type": "Int32",
                        "value": "52",
                        "direction": "Input"
                    }
                ],
                "type": "Query",
                "text": {
                    "value": "SELECT * FROM customers WHERE first_name = $1; SELECT * FROM customers WHERE age = $2;",
                    "type": "Expression"
                }
            }
        ],
        "scriptBlockExecutionTimeout": "02:00:00"
    },
    "externalReferences": {
        "connection": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
}

Configure advanced settings

Execution timeout for script blocks

You can configure a timeout in minutes for each script block that you run. If any script block within your script activity exceeds the specified timeout, the entire activity fails.

Screenshot that shows an advanced setting in a script activity to set execution timeout for a script block.

    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": {
                    "value": "SELECT pg_sleep(75);",
                    "type": "Expression"
                }
            }
        ],
        "scriptBlockExecutionTimeout": "00:01:00"
    },
    "externalReferences": {
        "connection": "9b351899-a92f-4e00-bc48-200a2c287f4c"
    }

Logging

You can log PostgreSQL notices to an external Azure Blob Storage account or to internal storage.

External storage

To set up external logging:

  1. On the Settings tab, expand the Advanced section.

  2. Select the Enable logging checkbox and the External storage option.

  3. Add a Blob Storage account by creating a new linked service for your Blob Storage account.

  4. You can optionally provide a folder path. If you leave the Folder path box blank, the logs go to the scriptactivity-logs folder.

Screenshot that shows an external logging example.

"typeProperties": {
   "scripts": [
      {
         "type": "Query",
         "text": "DO $$ BEGIN RAISE Notice 'Hello'; RAISE Notice 'World!'; END $$;"
      }
   ],
   "scriptBlockExecutionTimeout": "02:00:00",
   "logSettings": {
      "logDestination": "ExternalStore",
      "logLocationSettings": {
         "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
         },
         "path": "<Azure Blob Storage folder path>"
      }
   }
}

Activity output

To set up logging in the activity output:

  1. On the Settings tab, expand the Advanced section.

  2. Select the Enable logging checkbox and the Activity output option.

Screenshot that shows selections for setting up activity output logging.

"typeProperties": {
   "scripts": [
      {
         "type": "Query",
         "text": "DO $$ BEGIN RAISE Notice 'Hello'; RAISE Notice 'World!'; END $$;"
      }
   ],
   "scriptBlockExecutionTimeout": "02:00:00",
   "logSettings": {
      "logDestination": "ActivityOutput"
   }
}