Spark Exception: java.io.EOFException

Johan á Rogvi-Hansen 106 Reputation points
2022-11-28T07:53:18.93+00:00

Hi.

I'm developing in a notebook that reads from several data lake folders, all in .parquet format. I frequently get the error below, usually after running an action such as df.printSchema() or df.count() (the data is large, but heavily filtered before these checks). This time it happened while saving the final dataframe as an external table:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 39 in stage 87.0 failed 4 times, most recent failure: Lost task 39.3 in stage 87.0 (TID 1273) (vm-b5843446 executor 1): java.io.EOFException

I thought it could be memory pressure, but neither scaling the pool nor switching to a larger (also scaled) pool helps. The only thing that seems to work is switching to a different pool, until the issue happens again. This is quite frustrating during development; could you point me in a direction that resolves it? A rough sketch of the notebook flow is included below for reference.
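
The flow is roughly like this (a simplified sketch; the paths, column names, and table names are placeholders, not the real ones):

    from pyspark.sql import functions as F

    # spark is the session provided by the Synapse notebook.
    # Read the raw parquet folders from the data lake (placeholder paths).
    df_a = spark.read.parquet("abfss://container@account.dfs.core.windows.net/raw/folder_a")
    df_b = spark.read.parquet("abfss://container@account.dfs.core.windows.net/raw/folder_b")

    # Join and filter heavily before any of the 'checks'.
    df = df_a.join(df_b, "id").filter(F.col("load_date") >= "2022-01-01")

    # An action like this (or df.printSchema()) is usually where the EOFException shows up.
    df.count()

    # The step that failed this time: saving the final dataframe as an external table.
    df.write.mode("overwrite").saveAsTable("my_db.my_table")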

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2022-12-02T00:39:20.893+00:00

    Hello @Johan á Rogvi-Hansen,

    I have discussed the issue with my internal team. According to them, Delta appears to be updating the table metadata in the metastore, which causes concurrency issues.

    Can you please disable the flag below in your code, rerun your notebook, and see if it helps?

    spark.conf.set("spark.delta.catalog.update.enabled", "false")  
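
    If it is easier to apply at session start, the same setting should also be settable through the %%configure magic at the top of the notebook, before any other cells run (a sketch of that alternative; note that -f restarts the Spark session):

    %%configure -f
    {
        "conf": {
            "spark.delta.catalog.update.enabled": "false"
        }
    }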
    
    1 person found this answer helpful.

3 additional answers

  1. Johan á Rogvi-Hansen 106 Reputation points
    2022-12-01T16:27:59.567+00:00

    ---------------------------------------------------------------------------

    Py4JJavaError Traceback (most recent call last)
    <ipython-input-12-310f6468> in <module>

    /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py in saveAsTable(self, name, format, mode, partitionBy, **options)
    804 if format is not None:
    805 self.format(format)
    --> 806 self._jwrite.saveAsTable(name)
    807
    808 def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None,

    ~/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
    1319
    1320 answer = self.gateway_client.send_command(command)
    -> 1321 return_value = get_return_value(
    1322 answer, self.gateway_client, self.target_id, self.name)
    1323

    /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
    109 def deco(*a, **kw):
    110 try:
    --> 111 return f(*a, **kw)
    112 except py4j.protocol.Py4JJavaError as e:
    113 converted = convert_exception(e.java_exception)

    ~/cluster-env/env/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
    --> 326 raise Py4JJavaError(
    327 "An error occurred while calling {0}{1}{2}.\n".
    328 format(target_id, ".", name), value)

    Py4JJavaError: An error occurred while calling o2076.saveAsTable.
    : org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:254)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
    at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:537)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:228)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:182)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:103)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:103)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:87)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:125)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:872)
    at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:713)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:687)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:589)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 42 in stage 17.0 failed 4 times, most recent failure: Lost task 42.3 in stage 17.0 (TID 809) (vm-57936934 executor 1): java.io.EOFException
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:88)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:547)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:497)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.$anonfun$getFooter$2(ParquetMetadataCacheReader.scala:98)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReaderSource.runWithTimerAndRecord(ParquetMetadataCacheReaderSource.scala:64)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:91)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:332)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1(ParquetFileFormat.scala:331)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:334)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:146)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:208)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:123)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:543)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2464)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2413)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2412)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2412)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1168)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1168)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1168)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2652)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2594)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2583)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    Caused by: java.io.EOFException
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:88)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:547)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:497)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.$anonfun$getFooter$2(ParquetMetadataCacheReader.scala:98)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReaderSource.runWithTimerAndRecord(ParquetMetadataCacheReaderSource.scala:64)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:91)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:332)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1(ParquetFileFormat.scala:331)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:334)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:146)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:208)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:123)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:543)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)


  2. Johan á Rogvi-Hansen 106 Reputation points
    2022-12-01T16:31:19.26+00:00

    /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py in saveAsTable(self, name, format, mode, partitionBy, **options)
    804 if format is not None:
    805 self.format(format)
    --> 806 self._jwrite.saveAsTable(name)
    807
    808 def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None,

    ~/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
    1319
    1320 answer = self.gateway_client.send_command(command)
    -> 1321 return_value = get_return_value(
    1322 answer, self.gateway_client, self.target_id, self.name)
    1323

    /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
    109 def deco(*a, **kw):
    110 try:
    --> 111 return f(*a, **kw)
    112 except py4j.protocol.Py4JJavaError as e:
    113 converted = convert_exception(e.java_exception)

    ~/cluster-env/env/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
    --> 326 raise Py4JJavaError(
    327 "An error occurred while calling {0}{1}{2}.\n".
    328 format(target_id, ".", name), value)

    Py4JJavaError: An error occurred while calling o2076.saveAsTable.
    : org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:254)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
    at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:537)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:228)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:182)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:103)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:103)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:87)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:125)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:872)
    at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:713)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:687)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:589)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 42 in stage 17.0 failed 4 times, most recent failure: Lost task 42.3 in stage 17.0 (TID 809) (vm-57936934 executor 1): java.io.EOFException
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:88)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:547)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:497)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.$anonfun$getFooter$2(ParquetMetadataCacheReader.scala:98)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReaderSource.runWithTimerAndRecord(ParquetMetadataCacheReaderSource.scala:64)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:91)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:332)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1(ParquetFileFormat.scala:331)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:334)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:146)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:208)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:123)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:543)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2464)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2413)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2412)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2412)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1168)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1168)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1168)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2652)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2594)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2583)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    Caused by: java.io.EOFException
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:88)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:547)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:497)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.$anonfun$getFooter$2(ParquetMetadataCacheReader.scala:98)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReaderSource.runWithTimerAndRecord(ParquetMetadataCacheReaderSource.scala:64)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:91)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:332)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1(ParquetFileFormat.scala:331)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:334)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:146)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:208)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:123)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:543)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)


  3. Johan á Rogvi-Hansen 106 Reputation points
    2022-12-12T07:21:25.263+00:00

    ---------------------------------------------------------------------------

    Py4JJavaError Traceback (most recent call last)
    <ipython-input-12-31fc6112> in <module>

    /usr/hdp/current/spark3-client/jars/delta-core_2.12-1.2.1.6.jar/delta/tables.py in execute(self)
    928 See :py:class:`~delta.tables.DeltaMergeBuilder` for complete usage details.
    929 """
    --> 930 self._jbuilder.execute()
    931
    932 def __getMatchedBuilder(

    ~/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
    1319
    1320 answer = self.gateway_client.send_command(command)
    -> 1321 return_value = get_return_value(
    1322 answer, self.gateway_client, self.target_id, self.name)
    1323

    /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
    109 def deco(*a, **kw):
    110 try:
    --> 111 return f(*a, **kw)
    112 except py4j.protocol.Py4JJavaError as e:
    113 converted = convert_exception(e.java_exception)

    ~/cluster-env/env/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
    --> 326 raise Py4JJavaError(
    327 "An error occurred while calling {0}{1}{2}.\n".
    328 format(target_id, ".", name), value)

    Py4JJavaError: An error occurred while calling o1976.execute.
    : org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:254)
    at org.apache.spark.sql.delta.files.TransactionalWrite.$anonfun$writeFiles$3(TransactionalWrite.scala:349)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:295)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:236)
    at org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:100)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:225)
    at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:224)
    at org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:100)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$writeInsertsOnlyWhenNoMatchedClauses$1(MergeIntoCommand.scala:491)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordMergeOperation(MergeIntoCommand.scala:735)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.writeInsertsOnlyWhenNoMatchedClauses(MergeIntoCommand.scala:459)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$run$2(MergeIntoCommand.scala:311)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$run$2$adapted(MergeIntoCommand.scala:295)
    at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:226)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$run$1(MergeIntoCommand.scala:295)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:121)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:119)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordFrameProfile(MergeIntoCommand.scala:210)
    at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperation$5(DeltaLogging.scala:115)
    at com.microsoft.spark.telemetry.delta.SynapseLoggingShim.recordOperation(SynapseLoggingShim.scala:95)
    at com.microsoft.spark.telemetry.delta.SynapseLoggingShim.recordOperation$(SynapseLoggingShim.scala:81)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordOperation(MergeIntoCommand.scala:210)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:114)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:99)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordDeltaOperation(MergeIntoCommand.scala:210)
    at org.apache.spark.sql.delta.commands.MergeIntoCommand.run(MergeIntoCommand.scala:293)
    at io.delta.tables.DeltaMergeBuilder.$anonfun$execute$1(DeltaMergeBuilder.scala:231)
    at org.apache.spark.sql.delta.util.AnalysisHelper.improveUnsupportedOpError(AnalysisHelper.scala:105)
    at org.apache.spark.sql.delta.util.AnalysisHelper.improveUnsupportedOpError$(AnalysisHelper.scala:91)
    at io.delta.tables.DeltaMergeBuilder.improveUnsupportedOpError(DeltaMergeBuilder.scala:123)
    at io.delta.tables.DeltaMergeBuilder.execute(DeltaMergeBuilder.scala:207)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 31 in stage 25.0 failed 4 times, most recent failure: Lost task 31.3 in stage 25.0 (TID 764) (vm-ebd84383 executor 2): java.io.EOFException
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:88)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:547)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:497)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.$anonfun$getFooter$2(ParquetMetadataCacheReader.scala:98)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReaderSource.runWithTimerAndRecord(ParquetMetadataCacheReaderSource.scala:64)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:91)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:332)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1(ParquetFileFormat.scala:331)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:334)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:146)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:208)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:123)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:543)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2464)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2413)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2412)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2412)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1168)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1168)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1168)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2652)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2594)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2583)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    Caused by: java.io.EOFException
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:88)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:547)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:497)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.$anonfun$getFooter$2(ParquetMetadataCacheReader.scala:98)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReaderSource.runWithTimerAndRecord(ParquetMetadataCacheReaderSource.scala:64)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:91)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetMetadataCacheReader$.getFooter(ParquetMetadataCacheReader.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:332)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1(ParquetFileFormat.scala:331)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:334)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:146)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:208)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:123)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:543)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

