Read a JSON file generated by a RavenDB export that has duplicate columns
Hi Team,
I want to load the JSON file generated by a RavenDB export.
This is a rather complex file with a lot of arrays and strings in it.
The only issue is that it has 2 columns (keys) that are duplicated.
I mean, ideally this JSON should not be considered valid, because those 2 keys appear in the file multiple times.
The sample structure is as below:
Docs[]
Attachments
Docs[]
Attachments
Indexes[]
Transformers[]
Docs[]
As you can see, the Docs column is repeated multiple times, and Docs is the important column: it is an array of documents.
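For reference, here is a minimal Python sketch (my own assumption about how the file could be preprocessed, not something RavenDB or the data flow provides) that parses the export while merging the repeated keys. Python's `json` module passes every key/value pair of an object, duplicates included, to `object_pairs_hook`. The hypothetical `merge_duplicate_keys` helper concatenates lists found under a repeated key, so all the `Docs` arrays end up in one list. The sample string is made up and only mimics the top-level shape above.

```python
import json


def merge_duplicate_keys(pairs):
    """object_pairs_hook (hypothetical helper): merge keys repeated in one JSON object.

    If a key repeats and both values are lists, the lists are concatenated.
    For any other type the later value wins, which is what json.loads does by default.
    """
    merged = {}
    for key, value in pairs:
        if key in merged and isinstance(merged[key], list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


# Made-up sample with the same top-level layout as the export (Docs and Attachments repeated).
sample = (
    '{"Docs": [{"Id": 1}], "Attachments": [], "Docs": [{"Id": 2}], '
    '"Attachments": [], "Indexes": [], "Transformers": [], "Docs": [{"Id": 3}]}'
)
data = json.loads(sample, object_pairs_hook=merge_duplicate_keys)
print([doc["Id"] for doc in data["Docs"]])  # [1, 2, 3]
```

Without the hook, `json.loads` silently keeps only the last `Docs` array, so the earlier documents would be lost.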
In the source of my data flow, I get a duplicate column error:
{"message":"Job failed due to reason: at Source 'Json': org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: Attachments, Docs;.
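To confirm which keys the error is about, a small Python check can count the keys inside each JSON object and report the ones that occur more than once. The `find_duplicate_keys` helper and the sample string below are hypothetical, for illustration only.

```python
import json
from collections import Counter


def find_duplicate_keys(text):
    """Return the set of keys that repeat inside any single JSON object in `text`."""
    duplicates = set()

    def hook(pairs):
        # Called once per JSON object with all its (key, value) pairs, duplicates included.
        counts = Counter(key for key, _ in pairs)
        duplicates.update(key for key, n in counts.items() if n > 1)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


sample = (
    '{"Docs": [], "Attachments": [], "Docs": [], "Attachments": [], '
    '"Indexes": [], "Transformers": [], "Docs": []}'
)
print(sorted(find_duplicate_keys(sample)))  # ['Attachments', 'Docs']
```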
I am also trying to read this file as a delimited file to see whether I can remove the duplicates that way.
Do you have any suggestion for how I can process it?
Or is there any other way I can load it?
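One possible route, sketched in Python under the assumption that the export is plain (uncompressed) JSON small enough to fit in memory: merge the repeated `Docs` arrays and write each document to its own line (JSON Lines). The file names and the `export_docs_to_jsonl` / `merge_duplicate_keys` helpers are hypothetical. A file in this shape no longer has duplicate top-level keys, so the data flow source should be able to read it with its document-per-line setting, though I have not verified that against this particular export. For very large exports a streaming parser would be needed instead.

```python
import json
import tempfile
from pathlib import Path


def merge_duplicate_keys(pairs):
    """Concatenate lists under repeated keys; otherwise the later value wins."""
    merged = {}
    for key, value in pairs:
        if key in merged and isinstance(merged[key], list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


def export_docs_to_jsonl(src_path, dst_path):
    """Write every document from all 'Docs' arrays to dst_path, one JSON object per line."""
    with open(src_path, encoding="utf-8") as f:
        data = json.load(f, object_pairs_hook=merge_duplicate_keys)
    docs = data.get("Docs", [])
    with open(dst_path, "w", encoding="utf-8") as out:
        for doc in docs:
            out.write(json.dumps(doc, ensure_ascii=False) + "\n")
    return len(docs)


# Demo on a tiny, made-up file with the same top-level shape as the export.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "export.json"
    dst = Path(tmp) / "docs.jsonl"
    src.write_text('{"Docs": [{"Id": 1}], "Attachments": [], "Docs": [{"Id": 2}]}', encoding="utf-8")
    count = export_docs_to_jsonl(src, dst)
    lines = dst.read_text(encoding="utf-8").splitlines()

print(count, lines)
```

Writing one document per line also keeps each row independent, which avoids having to infer a schema for the whole export as a single object.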