Job failed due to reason: at Source... ...Job aborted due to stage failure

Steven Howe 111 Reputation points
2021-04-15T20:01:24.453+00:00

I have a pipeline in Synapse that calls a data flow. The data flow has started failing with the error below.

{"message":"Job failed due to reason: at Source 'RawTransaction': org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 43.0 failed 4 times, most recent failure: Lost task 8.3 in stage 43.0 (TID 3283, 58924e9c16f8411a93ee73d20b870adc0004d790991, executor 1): org.apache.spark.SparkException: Exception thrown in awaitResult: \n\tat org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)\n\tat org.apache.spark.util.ThreadUtils$.parmap(ThreadUtils.scala:290)\n\tat org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readParquetFootersInParallel(ParquetFileFormat.scala:538)\n\tat org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$9.apply(ParquetFileFormat.scala:611)\n\tat org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$9.apply(ParquetFileFormat.scala:603)\n\tat org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)\n\tat org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.app. 
Details:at Source 'RawTransaction': org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 43.0 failed 4 times, most recent failure: Lost task 8.3 in stage 43.0 (TID 3283, 58924e9c16f8411a93ee73d20b870adc0004d790991, executor 1): org.apache.spark.SparkException: Exception thrown in awaitResult: \n\tat org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)\n\tat org.apache.spark.util.ThreadUtils$.parmap(ThreadUtils.scala:290)\n\tat org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readParquetFootersInParallel(ParquetFileFormat.scala:538)\n\tat org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$9.apply(ParquetFileFormat.scala:611)\n\tat org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$9.apply(ParquetFileFormat.scala:603)\n\tat org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)\n\tat org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)\n\tat org.","failureType":"UserError","target":"Prepare_forsight_All","errorCode":"DFExecutorUserError"}

I have tried switching from memory-optimized to general-purpose compute and increasing the number of cores on the integration runtime (IR), but neither resolved the issue.

Tags: Azure Data Lake Storage · Azure Synapse Analytics · Azure Data Factory

Accepted answer
  1. Steven Howe 111 Reputation points
    2021-04-15T20:56:31.117+00:00

    I was able to resolve this issue. Two of the many hundreds of files in the source were corrupted; the rest were fine. After removing those two corrupted files, the data flow processed and completed successfully again as normal.
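    Since the stack trace points at `readParquetFootersInParallel`, a quick way to find files like these is to check each Parquet file's magic bytes: a valid Parquet file both begins and ends with the 4-byte marker `PAR1`, and a truncated or corrupted upload usually fails that check. The sketch below is a hypothetical helper (not from this thread) that flags such files in a local copy of the source folder; it catches truncation but not every form of corruption, so treat it as a first pass.

    ```python
    # Minimal sketch: flag Parquet files whose header/footer magic bytes are
    # missing. Assumes the files have been downloaded locally for inspection.
    import os

    MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes

    def is_valid_parquet(path):
        """Return True if the file carries the Parquet magic at both ends."""
        size = os.path.getsize(path)
        if size < 8:  # too small to hold both header and footer magic
            return False
        with open(path, "rb") as f:
            if f.read(4) != MAGIC:
                return False
            f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes
            return f.read(4) == MAGIC

    def find_corrupt_files(directory):
        """List .parquet files under `directory` failing the magic-byte check."""
        bad = []
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if name.endswith(".parquet"):
                    path = os.path.join(root, name)
                    if not is_valid_parquet(path):
                        bad.append(path)
        return bad
    ```

    Any paths returned by `find_corrupt_files` are candidates to remove (or re-upload) from the source before rerunning the data flow.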

    2 people found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. praveen sharma 1 Reputation point
    2022-02-16T15:36:36.977+00:00

    This worked for me as well; thank you so much for sharing.

