Inconsistency in Null Values While copying data from delta lake to AzSQL

Rakesh Reddy 0 Reputation points
2024-01-20T03:09:35.68+00:00

We build a ADF pipeline to copy data from delta lake to Azure SQL db. Below are Sink configuration in ADF copy activity

Write Behaviour: Insert

Bulk Insert Table lock: No

Autocreate Table: None

enable staging: Yes

when the data loaded to sql table, empty values in columns are populated as blanks. If the same data copied to azure sql in different environment, empty values are populated as Null.

In both the environments, table schema is exactly same. This is causing issue when the columns that are part of PKs are empty. Please suggest what might be the issue and how to overcome

Azure SQL Database
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,369 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,344 questions
{count} votes

2 answers

Sort by: Most helpful
  1. ShaikMaheer-MSFT 38,516 Reputation points Microsoft Employee
    2024-02-01T06:56:48.68+00:00

    Hi Rakesh Reddy, Thank you for sharing resolution details. This helps all community as well. Since you cannot accept your own answer, resharing answer here. Kindly consider marking it as Accepted answer.

    We observed that the issue is because of databricks runtime version. We have used 12.2 runtime version in both environments and tried, then the Nulls are populated correctly in both environments.

    Please consider hitting Accept Answer button. Accepted answers help community as well.

    1 person found this answer helpful.

  2. Pinaki Ghatak 5,575 Reputation points Microsoft Employee
    2024-01-20T12:07:57.1633333+00:00

    Hello Rakesh Reddy

    The issue you’re experiencing might be due to how Azure Data Factory (ADF) handles empty values during the data transfer process. In some cases, ADF might interpret empty values as blanks rather than NULL. This behavior can vary based on the specific configurations of your ADF pipeline and the settings of your Azure SQL database. Here are a couple of suggestions that might help you overcome this issue:

    1. Derived Column Transformation: You can add a Derived Column step in your Data Flow to replace empty values with NULL. You can add a column pattern and use iifNull($$,toString(null())) to detect empty value in each column and replace it with NULL.
    2. Expression in Mapping Dataflow: Another approach is to use an expression in your ADF mapping dataflow to convert blank or empty strings to NULL. The expression iif(column1=='',toString(null()),column1) can be used to check if a column is empty, and if so, convert it to NULL.

    Remember to test these changes in a controlled environment before applying them to your production pipeline to ensure they work as expected. Let us know if this helps, by tagging this as answered.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.