"DF-Executor-OutOfMemoryError" in Azure Synapse

Shailendra Kad
2021-11-09T10:51:36.997+00:00

I have a JSON export from RavenDB that is not valid JSON because it contains duplicate top-level keys.
So my first step is to clean the JSON and, where there are duplicates, write a separate JSON file for each section.
I was able to do this for a sample file and it ran successfully.
Then I tried a 12 MB file and it also worked.
But when I tried the full DB backup file, which is 10 GB in size, it gives an error.
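
In case it helps, the cleanup step is essentially a streaming split on the duplicate key, along the lines of the sketch below (paths and the "Docs" marker are illustrative, not my exact script; each part may still need to be wrapped in braces to be valid JSON on its own):

```python
# Rough sketch: split an invalid JSON dump at each occurrence of a
# duplicated top-level key, streaming in fixed-size chunks so a 10 GB
# file never has to fit in memory. Names here are illustrative.
SRC = "ravendb_backup.json"      # hypothetical input path
MARKER = b'"Docs"'               # the key that appears three times
CHUNK_SIZE = 64 * 1024 * 1024    # 64 MB read buffer

def split_on_marker(src_path: str, marker: bytes, chunk_size: int) -> int:
    part = 0
    out = None
    tail = b""                                 # may hold a partial marker
    with open(src_path, "rb") as src:
        while chunk := src.read(chunk_size):
            buf = tail + chunk
            pos = 0
            while (hit := buf.find(marker, pos)) != -1:
                if out:
                    out.write(buf[pos:hit])    # finish the previous part
                    out.close()
                # bytes before the first occurrence are discarded
                part += 1
                out = open(f"docs_part_{part}.json", "wb")
                out.write(marker)
                pos = hit + len(marker)
            # keep len(marker)-1 bytes in case a marker straddles chunks
            keep = max(pos, len(buf) - (len(marker) - 1))
            if out:
                out.write(buf[pos:keep])
            tail = buf[keep:]
    if out:
        out.write(tail)
        out.close()
    return part

print(f"wrote {split_on_marker(SRC, MARKER, CHUNK_SIZE)} part file(s)")
```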

This 10 GB file generates 3 separate JSON files because it contains the Docs key 3 times.
The first file is 9.6 GB; the other two are small, about 120 MB and 10 KB.
When I try to load the first file into the Synapse DWH, I get the error below.

Job failed due to reason: Cluster ran into out of memory issue during execution. Also, Please note that the dataflow has one or more custom partitioning schemes. The transformation(s) using custom partition schemes: Json,Select1,FlattenDocsCS,Flatten2,Filter1,ChangeDataTypesDateColumns,CstomsShipment. 1. Please retry using an integration runtime with bigger core count and/or memory optimized compute type. 2. Please retry using different partitioning schemes and/or number of partitions.

I published the pipeline so that I am not running in debug mode on a small cluster.
I changed the cluster size to 32 cores and tried every available partitioning scheme in the Optimize tab.
But I am still getting the error.
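
For reference, these are roughly the settings I am changing, as they would appear in the published pipeline JSON for the Execute Data Flow activity, with the 32-core count I set and the memory-optimized compute type the error message suggests (activity and data flow names here are illustrative):

```json
{
  "name": "Execute CleanRavenDbJson",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "CleanRavenDbJson",
      "type": "DataFlowReference"
    },
    "compute": {
      "coreCount": 32,
      "computeType": "MemoryOptimized"
    }
  }
}
```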
Kindly help.

Azure Synapse Analytics
Azure Data Factory
