Getting different results when I run a Notebook in Data Factory vs manually.

Question

Hi,

I have a pipeline with seven notebooks, each of which runs a different SQL script and generates a CSV file. Two notebooks work correctly, but the other five create CSV files that contain only headers (no rows). However, when I run those notebooks manually, they generate CSV files with both headers and rows. I tried running the notebooks in different pipelines, but the problem persists. I'm not sure what causes this, and I couldn't find a solution. Lastly, I'm using Spark 3.

Answer

Summary of issues:

The results of a Databricks notebook differed depending on whether it was run directly from Databricks or called from Data Factory.

This notebook used temporary tables. Temporary tables are stored on the cluster, so they do not survive cluster shutdown.

Data Factory was configured to create a new cluster to run the notebook, so the temporary tables were not retained between notebook runs.

After changing the Data Factory settings to use an existing cluster, the results of running the notebook directly from Databricks matched the results of calling it via Data Factory. This is because the cluster was retained, so the temporary tables persisted.
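To check which mode a pipeline is using, it may help to look at the JSON definition of the Azure Databricks linked service in Data Factory. The following is a minimal Python sketch, not an official tool. It assumes the linked service definition has been exported as JSON and loaded into a dict, and that the `typeProperties` section uses `existingClusterId` for an existing interactive cluster and `newClusterVersion` for a new job cluster. The function name, the linked service name, and the placeholder values are hypothetical.

```python
def databricks_cluster_mode(linked_service: dict) -> str:
    """Report which cluster mode a Databricks linked service definition uses.

    Assumption: the dict mirrors the linked service JSON, with the cluster
    settings under properties.typeProperties.
    """
    props = linked_service.get("properties", {}).get("typeProperties", {})
    if props.get("existingClusterId"):
        # The notebook runs on a cluster that stays up between runs.
        return "existing"
    if props.get("newClusterVersion"):
        # A fresh job cluster is created for each notebook run.
        return "new"
    # Anything else (for example an instance pool) is not classified here.
    return "unknown"


# Hypothetical linked service definition with placeholder values.
example = {
    "name": "AzureDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "existingClusterId": "<cluster-id>",
        },
    },
}

print(databricks_cluster_mode(example))
```

If this reports "new", each notebook activity starts on a fresh cluster, which matches the situation described above where the temporary tables were not retained.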


PRADEEPCHEEKATLA-MSFT
