Hi DeGuy,
Please be aware that an Azure Synapse notebook can pass only a single string value back to the calling pipeline through the mssparkutils.notebook.exit() function. There is no direct way to pass an entire dataframe, but you can work around it.
One common method is to save your dataframe to a file (such as Parquet, CSV, or JSON) in a known location in Azure Data Lake Storage or Blob Storage, and then read that file in your Azure Data Factory (ADF) pipeline. Here is an example of how you can save a dataframe as CSV:
df.coalesce(1).write.option('header', 'true').csv('abfss://your-data-lake@your-storage-account.dfs.core.windows.net/folder/output')
In this example, df is your dataframe, csv is Spark's built-in CSV writer (the older 'com.databricks.spark.csv' package identifier is only needed on pre-2.0 Spark), and the argument to csv() is the output path. Note that Spark writes a directory of part files at that path rather than a single named file; coalesce(1) ensures the directory contains just one part file.
In your ADF pipeline, you can read the CSV file using a Copy Activity or another appropriate activity.
The key to this approach is that the notebook and the pipeline have a shared understanding of where the data is being written and read.
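As a minimal sketch of that handshake (the storage account, container, and folder names below are placeholders), the notebook can write the data and then return the output path through the exit value, so the location only has to be defined in one place:

# Hypothetical example: write the dataframe, then hand the path back to the pipeline.
output_path = 'abfss://your-data-lake@your-storage-account.dfs.core.windows.net/staging/run-001'
df.coalesce(1).write.mode('overwrite').option('header', 'true').csv(output_path)
mssparkutils.notebook.exit(output_path)

Your pipeline can then read the exit value from the notebook activity's output and feed it to the Copy Activity as a dynamic path.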
If you need to pass only a few fields or some aggregate information about your dataframe, you can create a dictionary, serialize it into a JSON string, and pass that string using mssparkutils.notebook.exit().
import json

# Let's assume avg1, max2, and min3 are summary values computed from your dataframe.
df_summary = {"column1_avg": avg1, "column2_max": max2, "column3_min": min3}
summary_str = json.dumps(df_summary)
mssparkutils.notebook.exit(summary_str)
Then, in your ADF pipeline, you can use the @json() expression function to parse the JSON string back into an object. The exit value surfaces inside the notebook activity's output, so you reference it from there.
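For example, assuming your notebook activity is named RunNotebook (the property path below is what the Synapse notebook activity exposes, but verify it against your activity's actual output JSON):

@json(activity('RunNotebook').output.status.Output.result.exitValue)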
Remember, the approach to use depends on your specific needs and the size and complexity of your data.
I hope this helps!