Issues to load data from json file with an array of objects from relational db into cosmosDB using ADF

Felipe Regis E Silva 46 Reputation points
2020-11-17T18:46:54.937+00:00

I'm trying to load data from a relational model into Azure SQL into CosmosDB to be consumed by an application. First, I built a JSON model with some nested properties and some hierarchies and I'm able to copy them from a limited number of rows into Cosmos. I get an error when trying to upload 10 or more documents into my collection. Then I started to make some tests to troubleshoot. Let me provide some details below you you can see:

This is the pipeline I built to copy data to Cosmos:

40728-pipeline.png

First I created a complex query to copy data from relational model to JSON file with nested properties and it broke. Then I checked the error 2200:

40586-error-2200.jpg

It's saying sth about one of the properties I created into the model. For testing reasons I commented this column and I still got a lot more with messagens like "message: Unexpected character encountered while parsing value: n.". Then I googled it and people had issues with this and no clear solution without hard coding. I mean, I'm not able to copy data from a JSON valid file to Cosmos because sth is breaking while JSON objects are being parsed. I was not able to find all those unexpected characters and I'm not sure if the best way to resolve it would be that I had to look for and clean up data before load them into cosmos, because they are strings at most.

When troubleshooting, I created a query to select columns from tables from sources and I built a simpler model with JSON with only a few nested properties also using "FOR JSON PATH, INCLUDE_NULL_VALUES" and then I copied data to my data lake fine. JSON file was checked and it is valid:

select  
 newid() as "_id",  
 [name] as "name.name",  
 case when charindex(' ',[name]) > 0 then  
 left([name],charindex(' ',[name])-1)  
 else  
 [name]  
 end as "name.firstName",  
 case when charindex(' ',[name]) > 0 then  
 reverse(substring(reverse([name]), 1, charindex(' ', reverse([name]))-1))  
 else  
 [name]  
 end as "name.lastName",  
 null as "name.username"  
from vw_Employees for json path, include_null_values  

Next, I copied data from JSON file into my ADLS as a source to CosmosDB as a sink using ADF. For the first TOP 10 rows it works fine and then when I try to load a greater amount of rows (10+) I get error code 2200 and the following message: "Message=Error occurred when deserializing source JSON file. Check if the data is valid JSON object format. Unexpected character encountered while parsing value: n".

I've used jsonlint to validate JSON files when it is built and it's valid.

I saved it into ADLS and it looks fine. When looking deeper and trying to find this unexpected character encountered I wasn't able to find it within the JSON file containing 10 objects or more. As I mentioned, the first 10 rows were copied to Cosmos within documents as designed originally in my data model.

Do you have any ideas or can you help me to figure out how to fix this?

I'm also considering further options to copy data from a relational database to document model into Cosmos meanwhile this way is not working because I need to load more than 250 thousand documents into cosmos for the application.

Thanks in advance.

Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,638 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,696 questions
{count} votes

2 answers

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,086 Reputation points
    2020-11-30T19:41:39.287+00:00

    Hello @Felipe Regis E Silva . Sorry for the delay. I consulted with my colleagues , and we have something to try. Since we know that the issue is triggered by some value, and we can tell which row it first occurs in, we can further debug it by copying only one column at a time.

    That is, do a copy loading only the first column, and if that succeeds, try the second. If it fails, then we know the value is in the first column, on row X. In this fashion the problematic value will be revealed.

    Another option is to try the fault tolerance option in Copy Activity. (see picture)
    43804-image.png

    1 person found this answer helpful.
    0 comments No comments

  2. Felipe Regis E Silva 46 Reputation points
    2020-12-01T13:17:06.4+00:00

    Thanks again @MartinJaffer-MSFT !


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.