Spark DataFrame write to Azure Blob Storage fails: One of the request inputs is not valid

Abdul Hafiz A.G A ID(RITM0203509) 11 Reputation points
2022-08-25T04:24:33.64+00:00

I am able to read data from Azure Blob Storage, but writing back to Azure Storage throws the error below. I am running this program on my local machine. Can someone help me with this, please?

Program

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val config = new SparkConf()

val spark = SparkSession.builder()
  .appName("AzureConnector")
  .config(config)
  .master("local[*]")
  .getOrCreate()

try {
  // Register the WASB driver and the storage account key.
  spark.sparkContext.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.myaccount.blob.core.windows.net", "mykey")

  // The read succeeds, but the write below fails during task commit.
  val csvDf = spark.read.csv("wasbs://workspaces@sundareshwaran .blob.core.windows.net/test/test.csv")
  csvDf.show()
  csvDf.coalesce(1).write.format("csv").mode("overwrite").save("wasbs://workspaces@sundareshwaran .blob.core.windows.net/test/output")
} catch {
  case e: Exception =>
    e.printStackTrace()
}

Error:

org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2482)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:424)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:1997)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:435)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:260)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:162)
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:177)
    at com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(CloudBlob.java:764)
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
    ... 19 more

Azure HDInsight
Azure Databricks

2 answers

  1. Vinodh247-1375 11,046 Reputation points
    2022-08-25T05:32:30.39+00:00

    Hi AbdulHafiz,

    Thanks for reaching out to Microsoft Q&A.

    If the credentials were at fault you would not be able to read or write at all, but since you say you are able to read, can you double-check that the location you are trying to write to is correct?

    wasbs://workspaces@sundareshwaran .blob.core.windows.net/test/output
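
    If it helps to narrow this down, here is a minimal sketch (the account and container names below are placeholders, not your real ones) that resolves the target location through the Hadoop FileSystem API before calling save, so a malformed URI, for example a stray space in the account host, fails immediately with a clearer message than the task-commit rename does:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Placeholder target; substitute your real container and account.
    val target = "wasbs://workspaces@myaccount.blob.core.windows.net/test/output"

    // FileSystem.get validates the URI and account configuration up front.
    val fs = FileSystem.get(new URI(target), spark.sparkContext.hadoopConfiguration)

    // Listing the parent folder confirms the container and account key resolve
    // before the DataFrame write is attempted.
    fs.listStatus(new Path("wasbs://workspaces@myaccount.blob.core.windows.net/test/"))
      .foreach(status => println(status.getPath))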

    Please Upvote and Accept as answer if the reply was helpful.


  2. Junjie Cao 1 Reputation point Microsoft Employee
    2022-09-14T00:02:23.46+00:00

    @Abdul Hafiz A.G A ID(RITM0203509) Did you try using 'abfss' instead of 'wasbs'?
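
    For reference, a rough sketch of what that would look like, assuming the storage account is ADLS Gen2 (hierarchical namespace enabled), the hadoop-azure ABFS driver (Hadoop 3.2+) is on the classpath, and "myaccount"/"mykey" are placeholders for your own account name and key:

    // Account-key auth for the ABFS driver; note the dfs endpoint, not blob.
    spark.sparkContext.hadoopConfiguration.set(
      "fs.azure.account.key.myaccount.dfs.core.windows.net", "mykey")

    val csvDf = spark.read.csv("abfss://workspaces@myaccount.dfs.core.windows.net/test/test.csv")
    csvDf.coalesce(1).write.format("csv").mode("overwrite")
      .save("abfss://workspaces@myaccount.dfs.core.windows.net/test/output")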
