
Azure Databricks Spark job not performing as expected

Debasish22
2022-04-14T15:47:52.677+00:00

Hi All,

My requirement is to process approximately 1 TB of data stored in an Azure storage container. The container holds millions of JSON files, which are multi-part in nature.

For this I was using HDInsight, which processed the data in approximately 45 minutes with the following cluster:

Worker nodes: 1-4 (autoscale), 16 cores / 112 GB
Head nodes: 2, 4 cores / 28 GB

We planned to migrate to an Azure Databricks Spark cluster.

Cluster configuration used:

Worker nodes: 4-10 (autoscale), 8 cores / 56 GB, memory optimized
Head nodes: 4 cores / 28 GB

However, the job ran for more than 2.5 hours and still did not complete. I can see that it fully utilizes 4 worker nodes but never scales up to use the remaining worker nodes to speed up processing.
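A quick back-of-the-envelope comparison of the worker capacity described above may help frame the problem. The sketch below, in Python, only multiplies the node counts and per-node sizes given in the post. It assumes that head/driver nodes do not run Spark tasks and deliberately ignores differences in per-core speed, storage throughput, and Spark settings; the `WorkerPool` class is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    """Hypothetical helper: an autoscaling worker pool as described in the post."""
    name: str
    min_workers: int
    max_workers: int
    cores_per_worker: int
    mem_gb_per_worker: int

    def capacity(self, workers: int) -> tuple[int, int]:
        """Return (total cores, total memory in GB) for a given worker count."""
        return workers * self.cores_per_worker, workers * self.mem_gb_per_worker


# Worker nodes only; head nodes are excluded on the assumption they run no tasks.
hdinsight = WorkerPool("HDInsight", 1, 4, 16, 112)
databricks = WorkerPool("Databricks", 4, 10, 8, 56)

# HDInsight at its autoscale maximum.
hdi_cores, hdi_mem = hdinsight.capacity(hdinsight.max_workers)
# Databricks as observed (stuck at 4 workers) and at its autoscale maximum.
dbx_obs_cores, dbx_obs_mem = databricks.capacity(4)
dbx_max_cores, dbx_max_mem = databricks.capacity(databricks.max_workers)

print(f"HDInsight max:       {hdi_cores} cores, {hdi_mem} GB")
print(f"Databricks observed: {dbx_obs_cores} cores, {dbx_obs_mem} GB")
print(f"Databricks max:      {dbx_max_cores} cores, {dbx_max_mem} GB")
```

Under these assumptions, the Databricks cluster running at 4 workers has half the cores (32 vs. 64) and half the memory (224 GB vs. 448 GB) of the HDInsight cluster at its autoscale maximum, whereas all 10 Databricks workers would provide 80 cores and 560 GB.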

Can anyone help me figure out whether I am doing something wrong here?

Azure Databricks