Azure Data Factory - Default of Complex Types null/undefined instead of {}

Omar Abdel Bari 61 Reputation points Microsoft Employee
2022-09-12T22:42:21.917+00:00

I have multiple cases where types are [correctly] projected as complex types (in C# or Java these would be objects). The source records store these values as null. In our data regionalization project, I noticed that ADF writes those same fields as {} instead of null (or omitting the field entirely) in the output (sink). I would guess this is the expected behaviour for complex types, but it would break our application code when it reads the updated record in those scenarios. I'm looking for a simple, time-efficient workaround to either not write the field to the Cosmos DB document at all, or write it as null when it was null in the source.

We are using CosmosDB as the source and the sink.

Example source document (when fields are populated)

"CompanyInformation": {
"CompanyName": "123"
},

Example source document (when field is null)

"CompanyInformation": null,

Example sink document (when field in source was null and CompanyInformation is complex type)

"CompanyInformation": {}

Alternatives considered

  • Pre-trigger on create for the Cosmos collection: the problem is that Cosmos requires the pre-trigger to be specified as a request option, and I couldn't find a way to do this in the sink UI properties.
  • Stored procedure (has to be per container), invoked from Azure Cloud Shell: I'm trying this out, but we may be restricted by our prod security policy.
Azure Data Factory

1 answer

  1. Omar Abdel Bari 61 Reputation points Microsoft Employee
    2022-09-22T18:23:02.827+00:00

    You won't have that level of control in ADF, unfortunately. If this is an issue for you as it was for me, you can try [Azure] Databricks instead.
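
For anyone moving the copy into custom code (a Databricks notebook or similar), the fix-up itself is small. The following is only a minimal sketch, assuming each document is available as a plain Python dict before it is written to the sink; `restore_nulls` and its parameters are hypothetical names, not part of ADF, Databricks, or the Cosmos DB SDK. Note the assumption that an empty object never occurs legitimately in the source, since the function cannot tell a real {} from one produced by a null.

```python
from typing import Any, Dict, Iterable


def restore_nulls(
    doc: Dict[str, Any],
    complex_fields: Iterable[str],
    drop: bool = False,
) -> Dict[str, Any]:
    """Return a shallow copy of doc with empty objects in complex_fields undone.

    Assumption: an empty dict in one of these fields always came from a null
    in the source. With drop=False the field is set to None (serialized as
    JSON null); with drop=True the field is removed from the document.
    """
    result = dict(doc)  # shallow copy so the caller's document is untouched
    for field in complex_fields:
        if field in result and result[field] == {}:
            if drop:
                del result[field]
            else:
                result[field] = None
    return result


# Hypothetical sink-style document, mirroring the example in the question.
sink_doc = {"id": "1", "CompanyInformation": {}}
as_null = restore_nulls(sink_doc, ["CompanyInformation"])
omitted = restore_nulls(sink_doc, ["CompanyInformation"], drop=True)
```

Here `as_null` carries "CompanyInformation": null once serialized, and `omitted` has no such field at all; populated objects such as {"CompanyName": "123"} pass through unchanged.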

