Getting different results when I run a Notebook in Data Factory vs manually.

Ufuktepe, Eren 1 Reputation point
2021-02-17T22:26:41.68+00:00

Hi,

I have a pipeline with seven Notebooks, each of which executes a different SQL script and generates a CSV file. Two Notebooks work correctly, but the other five create CSV files with headers only (no rows). However, when I run those Notebooks manually, they generate CSV files with both headers and rows. I tried running the Notebooks in different pipelines but still have the same problem. I'm not sure what causes this, and I couldn't find a solution. Lastly, I'm using Spark 3.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MartinJaffer-MSFT 26,046 Reputation points
    2021-03-10T18:00:11.9+00:00

    Summary of issues:

    A Databricks notebook produced different results when run directly in Databricks than when it was called from Data Factory.

    This notebook used temporary tables. Because temporary tables are stored on the cluster, they do not survive cluster shutdown.
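
One way to surface this kind of problem early is to have a notebook check that the tables it depends on are present before it runs its query. The following is a minimal sketch, not the original poster's code: the helper `missing_tables` and the table names are hypothetical, and a plain list stands in for the cluster's table listing so the sketch runs without Spark.

```python
def missing_tables(required, available):
    """Return the required table names that are absent from `available`.

    Names are compared case-insensitively and the order of `required` is kept.
    """
    available_lower = {name.lower() for name in available}
    return [name for name in required if name.lower() not in available_lower]


# In a Databricks notebook, `available` could be collected with something like
#   [t.name for t in spark.catalog.listTables()]
# Here a plain list stands in for that call (assumption for illustration).
available = ["orders_tmp"]
required = ["orders_tmp", "customers_tmp"]  # hypothetical table names

missing = missing_tables(required, available)
if missing:
    # Failing loudly here is easier to diagnose than an unexpected CSV later.
    print("Missing tables:", missing)
```

In a real notebook, raising an exception instead of printing would make the Data Factory activity fail visibly.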

    Data Factory was configured to create a new cluster for each notebook run. As a result, the temporary tables were not retained between notebook runs.

    After the Data Factory settings were changed to use an existing cluster, the results of running the notebook directly in Databricks and of calling it from Data Factory matched. The temporary tables persisted because the cluster was retained.
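
To check which mode a pipeline is using, the Azure Databricks linked service definition can be inspected. The sketch below assumes the linked service JSON has been exported and loaded as a Python dict, and that it uses the `existingClusterId` and `newCluster...` property names as I understand them from the Data Factory documentation; the helper `cluster_mode` and all values are hypothetical placeholders, so verify the property names against your own definition.

```python
def cluster_mode(linked_service):
    """Classify a linked service dict as using an 'existing' or 'new' cluster.

    Returns 'unknown' if neither kind of property is present.
    """
    type_props = linked_service.get("properties", {}).get("typeProperties", {})
    if type_props.get("existingClusterId"):
        return "existing"
    if any(key.startswith("newCluster") for key in type_props):
        return "new"
    return "unknown"


# Placeholder definitions for illustration only (values are not real).
new_cluster_service = {
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "newClusterVersion": "<runtime-version>",
            "newClusterNodeType": "<node-type>",
            "newClusterNumOfWorker": "1",
        },
    }
}
existing_cluster_service = {
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {"existingClusterId": "<cluster-id>"},
    }
}

print(cluster_mode(new_cluster_service))
print(cluster_mode(existing_cluster_service))
```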
