Can't write to blob storage from AzureML Spark Cluster
Anonymous
Using Azure ML Spark compute (serverless or attached), I cannot write to Gen2 Data Lake blob storage.
The code below produces the error: 'Caused by: org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This operation is not permitted on a non-empty directory.'
This greatly reduces the usefulness of the Spark integration. Any help would be appreciated.
df = ...  # a pyspark.pandas or pyspark.sql DataFrame
df.to_parquet('azureml://[blah]')  # or df.write.parquet('azureml://[blah]')
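For context, one alternative I am considering is bypassing the azureml:// shortcut URI and targeting the ADLS Gen2 endpoint directly with an abfss:// path (the form the ABFS driver expects). This is only a sketch; the account, container, and output path below are placeholders, and it assumes the cluster identity has write access to the container:

```python
# Sketch of a direct ADLS Gen2 write path, as an alternative to azureml:// URIs.
# "mycontainer", "mystorageaccount", and "output/table.parquet" are placeholders.
def abfss_path(container: str, account: str, relative: str) -> str:
    # Build a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative}"

path = abfss_path("mycontainer", "mystorageaccount", "output/table.parquet")

# With a live SparkSession and DataFrame `df`, the write would then be:
# df.write.mode("overwrite").parquet(path)
```

Whether the ABFS driver avoids the non-empty-directory error in this environment is exactly what I am unsure about.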
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.