Spark DataFrame write to Azure Blob Storage fails: One of the request inputs is not valid

Abdul Hafiz A.G A ID(RITM0203509) 11 Reputation points
2022-08-25T04:24:33.64+00:00

I am able to read data from Azure Blob Storage, but writing back to Azure Storage throws the error below. I am running this program on my local machine. Can someone help me with this, please?

Program

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val config = new SparkConf()

val spark = SparkSession.builder()
  .appName("AzureConnector")
  .config(config)
  .master("local[*]")
  .getOrCreate()

try {
  // Register the WASB driver and the storage account key.
  spark.sparkContext.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.myaccount.blob.core.windows.net", "mykey")

  // The read succeeds, but the write below fails during task commit.
  val csvDf = spark.read.csv("wasbs://workspaces@sundareshwaran .blob.core.windows.net/test/test.csv")
  csvDf.show()
  csvDf.coalesce(1).write.format("csv").mode("overwrite").save("wasbs://workspaces@sundareshwaran .blob.core.windows.net/test/output")
} catch {
  case e: Exception =>
    e.printStackTrace()
}

Error:

org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2482)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:424)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:1997)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:435)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:260)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:162)
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:177)
    at com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(CloudBlob.java:764)
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
    ... 19 more

Azure HDInsight
Azure Databricks

2 answers

  1. Vinodh247-1375 11,046 Reputation points
    2022-08-25T05:32:30.39+00:00

    Hi AbdulHafiz,

    Thanks for reaching out to Microsoft Q&A.

    If the credentials were at fault you would not be able to read or write at all, but since you say you are able to read, can you double-check that the location you are trying to write to is correct?

    wasbs://workspaces@sundareshwaran .blob.core.windows.net/test/output
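
    If it helps to narrow this down, here is a minimal sketch (the account and container names below are placeholders, not your real ones) that resolves the target location through the Hadoop FileSystem API before calling save, so a malformed URI, for example a stray space in the account host, fails immediately with a clearer message than the task-commit rename does:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Placeholder target; substitute your real container and account.
    val target = "wasbs://workspaces@myaccount.blob.core.windows.net/test/output"

    // FileSystem.get validates the URI and account configuration up front.
    val fs = FileSystem.get(new URI(target), spark.sparkContext.hadoopConfiguration)

    // Listing the parent folder confirms the container and account key resolve
    // before the DataFrame write is attempted.
    fs.listStatus(new Path("wasbs://workspaces@myaccount.blob.core.windows.net/test/"))
      .foreach(status => println(status.getPath))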

    Please Upvote and Accept as answer if the reply was helpful.


  2. Junjie Cao 1 Reputation point Microsoft Employee
    2022-09-14T00:02:23.46+00:00

    @Abdul Hafiz A.G A ID(RITM0203509) Did you try using 'abfss' instead of 'wasbs'?
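
    For reference, a rough sketch of what that would look like, assuming the storage account is ADLS Gen2 (hierarchical namespace enabled), the hadoop-azure ABFS driver (Hadoop 3.2+) is on the classpath, and "myaccount"/"mykey" are placeholders for your own account name and key:

    // Account-key auth for the ABFS driver; note the dfs endpoint, not blob.
    spark.sparkContext.hadoopConfiguration.set(
      "fs.azure.account.key.myaccount.dfs.core.windows.net", "mykey")

    val csvDf = spark.read.csv("abfss://workspaces@myaccount.dfs.core.windows.net/test/test.csv")
    csvDf.coalesce(1).write.format("csv").mode("overwrite")
      .save("abfss://workspaces@myaccount.dfs.core.windows.net/test/output")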
