While fetching data from a Cosmos DB container and persisting the JSON file in ADLS Gen2 through a Synapse pipeline, some objects in my JSON file appear as blank strings, causing data loss. This happens only in PROD, not in UAT or DEV.

Vity Pandita 0 Reputation points
2024-04-05T13:17:47.2733333+00:00

Hi,

The issue I am facing is with an Azure Synapse Analytics pipeline. I have created a data flow that pulls data from a Cosmos DB container in JSON format and stores the JSON file in ADLS Gen2. When I check the JSON file in ADLS, some objects appear as blank strings, but when I query Cosmos DB those same objects have values. Where and why is my data getting lost? There is no issue in UAT or DEV; only the PROD environment is affected.

Already checked the link: https://github.com/Azure/azure-sdk-for-java/issues/29181

Azure Data Lake Storage
Azure Synapse Analytics
Azure Cosmos DB

2 answers

  1. Vinodh247-1375 11,211 Reputation points
    2024-04-05T13:35:33.52+00:00

    Data type conversions and null-handling steps can sometimes alter the data. I hope you have already checked those.

    On the Cosmos DB side, check whether the objects with missing values are retrieved correctly by your query. In the pipeline, are any custom expressions, aggregations, or conditional logic applied to the data? Check any null handling or default values assigned during transformation. Also check for data truncation that might cause loss of information.
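One way to narrow down which fields are affected is to compare a sample of source documents against the exported JSON. The following is a minimal sketch, assuming both sides have already been loaded into Python dictionaries (for example with `json.load`) and that documents can be matched on an `id` field. The function names `find_blanked_fields` and `compare_exports` are hypothetical helpers for illustration, not part of any Azure SDK.

```python
def find_blanked_fields(source_doc, exported_doc, path=""):
    """Return field paths where the export holds "" but the source holds a value."""
    blanked = []
    for key, src_val in source_doc.items():
        field_path = f"{path}.{key}" if path else key
        out_val = exported_doc.get(key)
        if out_val == "" and src_val not in ("", None):
            # The source has a value, but the export has an empty string.
            blanked.append(field_path)
        elif isinstance(src_val, dict) and isinstance(out_val, dict):
            # Both sides are objects: recurse to check nested fields.
            blanked.extend(find_blanked_fields(src_val, out_val, field_path))
    return blanked


def compare_exports(source_docs, exported_docs, id_field="id"):
    """Map each document id to the list of fields that were blanked in the export."""
    exported_by_id = {doc[id_field]: doc for doc in exported_docs}
    report = {}
    for doc in source_docs:
        exported = exported_by_id.get(doc[id_field])
        if exported is None:
            continue  # Document missing from the export; not handled in this sketch.
        blanked = find_blanked_fields(doc, exported)
        if blanked:
            report[doc[id_field]] = blanked
    return report


# Made-up sample data, shaped like the symptom described in the question.
source = [
    {"id": "1", "address": {"city": "Pune", "zip": "411001"}},
    {"id": "2", "profile": {"name": "x", "prefs": {"lang": "en"}}},
]
exported = [
    {"id": "1", "address": ""},
    {"id": "2", "profile": {"name": "x", "prefs": ""}},
]
print(compare_exports(source, exported))
```

Running the same comparison on samples from UAT and PROD would show whether the blanked fields are always the same paths, which helps separate a data-shape problem from a pipeline configuration problem.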


  2. Smaran Thoomu 9,685 Reputation points Microsoft Vendor
    2024-04-10T08:38:28.9133333+00:00

    Hi @Vity Pandita

    I understand that you have already checked the points mentioned above and had no luck. Based on your response, the issue still occurs in the PROD environment but not in UAT.

    In this case, I would suggest checking the following:

    1. Check which version of the pipeline and data flow is deployed in the PROD environment. Ensure that it is the same as in the UAT environment.
    2. Check the network connectivity between the Synapse pipeline and the Cosmos DB container in the PROD environment. Ensure that no network issues or firewall rules are blocking the connection.
    3. Check the permissions of the Synapse pipeline on the ADLS Gen2 storage account in the PROD environment. Ensure that the pipeline has the permissions it needs to write to the storage account.
    4. Check the data flow settings in the PROD environment. Ensure that the data flow is configured correctly and that the objects are being serialized correctly.
    5. Check the data type of the objects that appear as blank strings in the JSON file. Ensure that it is compatible with the data type of the corresponding field in the Cosmos DB container.
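For the data type check in point 5, a quick profile of the source documents can show whether a field holds different JSON types across documents. This is a minimal sketch under assumptions: the documents are already loaded as Python dictionaries, and `json_type`, `type_profile`, and `mixed_type_fields` are hypothetical helper names. Whether mixed types actually explain the blank strings in this case is not established by the thread; the sketch only makes such fields visible.

```python
from collections import defaultdict


def json_type(value):
    """Return the JSON type name for a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def type_profile(docs):
    """Map each field path to the set of JSON types seen across all documents."""
    profile = defaultdict(set)

    def walk(value, path):
        if path:
            profile[path].add(json_type(value))
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)

    for doc in docs:
        walk(doc, "")
    return profile


def mixed_type_fields(docs):
    """Return only the field paths that hold more than one JSON type."""
    return {
        path: sorted(types)
        for path, types in type_profile(docs).items()
        if len(types) > 1
    }


# Made-up sample: "address" is an object, a string, and null in different documents.
docs = [
    {"id": "1", "address": {"city": "Pune"}},
    {"id": "2", "address": "unknown"},
    {"id": "3", "address": None},
]
print(mixed_type_fields(docs))
```

Comparing the output for PROD data with the output for UAT data would show whether PROD contains document shapes that the lower environments never exercised.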

    If the issue persists, you can try contacting Microsoft Support for further assistance.

    I hope this helps! Let me know if you have any further questions.