Can't write to blob storage from AzureML Spark Cluster
Anonymous
Using Azure ML Spark compute (serverless or attached), I cannot write to Gen2 Data Lake blob storage.
The code below produces the error: 'Caused by: org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This operation is not permitted on a non-empty directory.'
This greatly reduces the usefulness of the Spark integration. Any help would be appreciated.
df = ...  # a pyspark.pandas or pyspark.sql DataFrame
df.to_parquet('azureml://[blah]')  # or df.write.parquet('azureml://[blah]')
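For context, one alternative I am considering is bypassing the azureml:// shortcut URI and targeting the ADLS Gen2 endpoint directly with an abfss:// path (the form the ABFS driver expects). This is only a sketch; the account, container, and output path below are placeholders, and it assumes the cluster identity has write access to the container:

```python
# Sketch of a direct ADLS Gen2 write path, as an alternative to azureml:// URIs.
# "mycontainer", "mystorageaccount", and "output/table.parquet" are placeholders.
def abfss_path(container: str, account: str, relative: str) -> str:
    # Build a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative}"

path = abfss_path("mycontainer", "mystorageaccount", "output/table.parquet")

# With a live SparkSession and DataFrame `df`, the write would then be:
# df.write.mode("overwrite").parquet(path)
```

Whether the ABFS driver avoids the non-empty-directory error in this environment is exactly what I am unsure about.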
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.