Spark-COSMOS Serialization Issue with NULL Values

Sandeep 20 Reputation points
2025-07-16T16:42:34.0266667+00:00

The goal is to prevent loading JSON elements into COSMOS when they contain NULL values. The Spark-COSMOS configuration documentation mentions the property spark.cosmos.serialization.inclusionMode set to NonNull, which should prevent the creation of JSON properties for explicit null values. However, NULL values are still appearing in the COSMOS container despite using this property.

I have attached the code we are using to this ticket.

The following environment details were used to execute the code:

  • Databricks DBR version: 14.3 LTS
  • Attached Jar: azure-cosmos-spark_3-5_2-12-4.33.1.jar in Databricks cluster.

The example ID ABCDE100000000AG is expected to show that "address_1":null should not populate in the target COSMOS container, but it is still visible.

Is there a potential mistake in the implementation, or are additional configurations needed in the Spark DataFrame write operation to achieve the desired result? The code being used is attached for reference.

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.


1 answer

  1. Smaran Thoomu 35,375 Reputation points Microsoft External Staff Moderator
    2025-07-16T17:58:18.51+00:00

    Hi Sandeep,
    It sounds like NULL values are still being serialized into your Cosmos DB container when writing from Spark. You have set spark.cosmos.serialization.inclusionMode to NonNull, which should omit properties with explicit null values, yet they still appear in the container.

    Here's what you might want to check or try:

    1. Verify Configuration: Make sure the property spark.cosmos.serialization.inclusionMode is spelled exactly as documented, is set to NonNull, and is part of the configuration that the write operation actually uses.
    2. Check for Other Configurations: Ensure that any other settings related to serialization in your Spark write operation do not conflict with the inclusionMode setting.
    3. Test Serialization Options: The CosmosSerializationOptions.IgnoreNullValues setting controls whether the serializer ignores null properties, but it belongs to the Azure Cosmos DB .NET SDK rather than the Spark connector. It is relevant only if some part of your pipeline writes through that SDK; if so, ensure it is enabled there.
    4. Review Write Operation: Ensure that the way you are writing the data from the DataFrame to Cosmos DB is properly set up. If you are using a custom serializer or specific write settings, make sure there are no overrides affecting the null handling.
    5. Update Your Spark and Jar Version: Occasionally, bugs or issues in specific versions of libraries can lead to unexpected behavior. Ensure that your Databricks DBR version and the JAR version you are using are compatible and up to date.
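
As a rough illustration of points 1 and 4, the sketch below collects the connector options in one place and passes them to the DataFrame writer. It assumes the inclusionMode setting is supplied together with the other write options; the helper names build_cosmos_write_config and write_to_cosmos are hypothetical, and the endpoint, key, database, and container values are placeholders rather than details from the question.

```python
from typing import Dict


def build_cosmos_write_config(
    endpoint: str, key: str, database: str, container: str
) -> Dict[str, str]:
    """Collect the connector options for a write (hypothetical helper)."""
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
        # Property named in the question; NonNull is the value under test.
        "spark.cosmos.serialization.inclusionMode": "NonNull",
    }


def write_to_cosmos(df, config: Dict[str, str]) -> None:
    """Write a Spark DataFrame with the options above (needs a live cluster)."""
    df.write.format("cosmos.oltp").options(**config).mode("append").save()


# Placeholder values; substitute your own account details.
config = build_cosmos_write_config(
    "https://<account>.documents.azure.com:443/",
    "<key>",
    "<database>",
    "<container>",
)
```

Printing the dictionary just before the write is a quick way to confirm that the property name and value reach the writer unchanged.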

    If none of these work, it might be helpful to look at the attached code you're using to check for logical errors in how data is being prepared for write, or if there's something else that could interfere with the serialization process.
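
To check what is actually stored, one option is to read the document back and list the properties that still hold an explicit null. The sketch below is a plain-Python illustration of that check and of the shape the document is expected to have once nulls are omitted; it is not how the connector itself serializes. The function names are hypothetical, and the sample document contains only the id and the address_1 property mentioned in the question.

```python
from typing import Any, List


def find_null_paths(value: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of all properties whose value is an explicit null."""
    paths: List[str] = []
    if isinstance(value, dict):
        for name, child in value.items():
            path = f"{prefix}.{name}" if prefix else name
            if child is None:
                paths.append(path)
            else:
                paths.extend(find_null_paths(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            paths.extend(find_null_paths(child, f"{prefix}[{index}]"))
    return paths


def strip_nulls(value: Any) -> Any:
    """Return a copy with null-valued properties removed (list items are kept)."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


# Sample document shaped like the one described in the question.
stored = {"id": "ABCDE100000000AG", "address_1": None}
remaining_nulls = find_null_paths(stored)
expected = strip_nulls(stored)
```

If remaining_nulls is non-empty for documents written after the setting was applied, the setting is probably not reaching the write path.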

    To assist you better, here are some follow-up questions:

    • Can you share the specific code snippet where you're setting the spark configuration and performing the write operation?
    • Have you already tried any troubleshooting steps, and if so, what were the outcomes?
    • Are there other serialization configurations apart from inclusionMode that you've set up?
    • Is there any logging output or error messages you received during this process that might give more context?

    Hope this helps! Looking forward to your response.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
