Any updates regarding this issue? We are experiencing exactly the same error.
com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@****.blob.core.windows.net/
Mayuri Kadam 81 Reputation points Microsoft Employee
Hi,
I am getting the following error message:
com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@****.blob.core.windows.net/cook/processYear=2021/processMonth=01/processDay=08/processHour=03/part-00003-tid-1903224826064875913-0ded1380-19a2-4ed2-9d4d-f19724b5bf5d-29101-1.c000.avro.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:286)
Caused by: java.io.IOException
at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:737)
Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is x2rC4SZaPjA==, retrieved 6kwtbjN2v/w==.
at com.microsoft.azure.storage.blob.CloudBlob$9.postProcessResponse(CloudBlob.java:1409)
Any idea how to resolve this? Thanks.
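A note on what the exception at the bottom of that trace means: the Azure Storage client recomputes an MD5 over each range it downloads and compares it with the Content-MD5 the service sent, so "Blob hash mismatch (integrity check failed)" says the received bytes did not hash to the expected value. That typically points at a blob that changed between the request and the validation (for example, one still being written), or more rarely at transport corruption. As a diagnostic only, not a fix, the WASB driver's client-side MD5 validation can be relaxed; the config key below is a hadoop-azure setting and whether your Databricks runtime honors it is an assumption worth verifying:

// Diagnostic sketch only (assumed hadoop-azure setting; verify on your runtime).
// If reads succeed with validation off, the blob's bytes really do disagree with
// its stored Content-MD5, i.e. the file is changing or corrupt mid-read, and the
// real fix is upstream of this setting.
spark.sparkContext.hadoopConfiguration.set("fs.azure.check.block.md5", "false")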
4 answers
-
Mayuri Kadam 81 Reputation points Microsoft Employee
2021-02-08T17:29:57.387+00:00
Hi @PRADEEPCHEEKATLA-MSFT, the following is the code that reads from the Azure blob container:
spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem") spark.conf.set("fs.azure.sas."+blobStorageAccContainerName+"."+blobStorageAccName+".blob.core.windows.net", blobStorageBlobSASToken) containerPath = "wasbs://" + blobStorageAccContainerName + "@" + blobStorageAccName + ".blob.core.windows.net/" spark.read .format(format) .load(containerPath + dirName)
-
Mayuri Kadam 81 Reputation points Microsoft Employee
2021-02-08T17:28:33.213+00:00
Hi @PRADEEPCHEEKATLA-MSFT, the following is the code that uploads files to the Azure blob container:
spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem") spark.conf.set("fs.azure.sas."+blobStorageAccContainerName+"."+blobStorageAccName+".blob.core.windows.net", blobStorageBlobSASToken) containerPath = "wasbs://" + blobStorageAccContainerName + "@" + blobStorageAccName + ".blob.core.windows.net/" var storageCheckpointDirectory = checkpointDirectory if (storageCheckpointDirectory.isEmpty) { storageCheckpointDirectory = Paths.get(new java.io.File(".").getCanonicalPath).toString } storageCheckpointDirectory = storageCheckpointDirectory + blobStorageAccName + "/" + blobStorageAccContainerName + "/" + dirName val queryName = "uploadDataToBlob:" + dirName spark.sparkContext.setLocalProperty("spark.scheduler.pool", dirName) var df = data.writeStream .option("checkpointLocation", storageCheckpointDirectory) .queryName(queryName) .format(format) if (partitionCols.nonEmpty) df = df.partitionBy(partitionCols: _*) df.option("path", blob.getcontainerPath + dirName) .start()
-
Mayuri Kadam 81 Reputation points Microsoft Employee
2021-02-02T19:35:10.22+00:00
Hi Pradeep, please find the stack trace below:
com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@*******.blob.core.windows.net/cook/processYear=2021/processMonth=01/processDay=09/processHour=00/part-00003-tid-4640843606947508963-a580-40bd-ad0d-e7c92f1e5b1f-29229-1.c000.avro.
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:286)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:264)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:205)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:354)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:205)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage58.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage65.agg_doAggregateWithKeys_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage65.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
    at org.apache.spark.scheduler.Task.run(Task.scala:112)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:737)
    at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:264)
    at com.microsoft.azure.storage.blob.BlobInputStream.readInternal(BlobInputStream.java:448)
    at com.microsoft.azure.storage.blob.BlobInputStream.read(BlobInputStream.java:420)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(NativeAzureFileSystem.java:876)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at com.databricks.spark.metrics.FSInputStreamWithMetrics$$anonfun$read$3.apply$mcI$sp(FileSystemWithMetrics.scala:206)
    at com.databricks.spark.metrics.FSInputStreamWithMetrics$$anonfun$read$3.apply(FileSystemWithMetrics.scala:206)
    at com.databricks.spark.metrics.FSInputStreamWithMetrics$$anonfun$read$3.apply(FileSystemWithMetrics.scala:206)
    at com.databricks.spark.metrics.ExtendedTaskIOMetrics$class.withTimeMetric(FileSystemWithMetrics.scala:151)
    at com.databricks.spark.metrics.ExtendedTaskIOMetrics$class.com$databricks$spark$metrics$ExtendedTaskIOMetrics$$withTimeAndBytesMetric(FileSystemWithMetrics.scala:171)
    at com.databricks.spark.metrics.ExtendedTaskIOMetrics$$anonfun$withTimeAndBytesReadMetric$1.apply$mcI$sp(FileSystemWithMetrics.scala:185)
    at com.databricks.spark.metrics.ExtendedTaskIOMetrics$$anonfun$withTimeAndBytesReadMetric$1.apply(FileSystemWithMetrics.scala:185)
    at com.databricks.spark.metrics.ExtendedTaskIOMetrics$$anonfun$withTimeAndBytesReadMetric$1.apply(FileSystemWithMetrics.scala:185)
    at com.databricks.spark.metrics.SamplerWithPeriod.sample(FileSystemWithMetrics.scala:78)
    at com.databricks.spark.metrics.ExtendedTaskIOMetrics$class.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:185)
    at com.databricks.spark.metrics.FSInputStreamWithMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:192)
    at com.databricks.spark.metrics.FSInputStreamWithMetrics.read(FileSystemWithMetrics.scala:205)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.avro.mapred.FsInput.read(FsInput.java:54)
    at org.apache.spark.sql.avro.AvroFileFormat$.openAvroReader(AvroFileFormat.scala:275)
    at org.apache.spark.sql.avro.AvroFileFormat$$anonfun$buildReader$1.apply(AvroFileFormat.scala:202)
    at org.apache.spark.sql.avro.AvroFileFormat$$anonfun$buildReader$1.apply(AvroFileFormat.scala:183)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:147)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:134)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:235)
    ... 23 more
Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is xmypzfnpTdq8eFLxZ49DhQ==, retrieved CY7+V9/JEfVroD5omBB2Uw==.
    at com.microsoft.azure.storage.blob.CloudBlob$9.postProcessResponse(CloudBlob.java:1409)
    at com.microsoft.azure.storage.blob.CloudBlob$9.postProcessResponse(CloudBlob.java:1310)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:149)
    at com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(CloudBlob.java:1493)
    at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:255)
    ... 53 more
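Given that the mismatch is raised inside CloudBlob.downloadRangeInternal in the same azure-storage SDK that Databricks ships, it can help to check the suspect blob out of band with that SDK and compare its stored Content-MD5 against an MD5 computed over a full download. A sketch, assuming an azure-storage (v8-style) Java client on the classpath; the connection-string environment variable and container name are hypothetical placeholders, and the blob path is the one from the trace above:

import java.io.ByteArrayOutputStream
import java.security.MessageDigest
import java.util.Base64
import com.microsoft.azure.storage.CloudStorageAccount

// Hypothetical placeholders: substitute your own connection string and
// container name; the blob path is taken from the error message.
val account = CloudStorageAccount.parse(sys.env("AZURE_STORAGE_CONNECTION_STRING"))
val suspect = account.createCloudBlobClient()
  .getContainerReference("mycontainer")
  .getBlockBlobReference("cook/processYear=2021/processMonth=01/processDay=09/processHour=00/" +
    "part-00003-tid-4640843606947508963-a580-40bd-ad0d-e7c92f1e5b1f-29229-1.c000.avro")

suspect.downloadAttributes()                         // fetch stored properties
val storedMd5 = suspect.getProperties.getContentMD5  // value the SDK validates against

val buf = new ByteArrayOutputStream()
suspect.download(buf)                                // single full download
val actualMd5 = Base64.getEncoder.encodeToString(
  MessageDigest.getInstance("MD5").digest(buf.toByteArray))

println(s"stored=$storedMd5 actual=$actualMd5 match=${storedMd5 == actualMd5}")

If the two values agree on a full download but ranged reads through Spark keep failing, that points back at concurrent modification during the read rather than a corrupt blob at rest.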