Mapping data flow error: PathNotFound The specified path does not exist
Recently, the scheduled pipelines running in our Azure Synapse Analytics workspace started to fail with the following error message:
"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sinkXXXX': Operation failed: "The specified path does not exist."
The failure occurs on a mapping data flow sink, but not always the same one: if I rerun the pipeline, it might fail on the same sink or on a different sink.
The path in question is not the data lake folder that the sink actually writes to, but a _temporary folder inside that folder. For example:
404, GET, https://<storageaccount>.dfs.core.windows.net/<container>?upn=false&resource=filesystem&maxResults=5000&directory=Bronze/Something/Something/Something/_temporary/0/_temporary/attempt_202307060632398990770822798711859_0958_m_000000_958&timeout=90&recursive=false
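To make the failing request easier to read, here is a small Python sketch that splits the URL above into its parts. The function name `describe_list_request` and the constant `FAILED_REQUEST_URL` are made up for illustration; only the standard library is used, and the placeholders `<storageaccount>` and `<container>` are kept exactly as they appear in the log.

```python
from urllib.parse import parse_qs, urlsplit

# The request from the error log, with the placeholders left as logged.
FAILED_REQUEST_URL = (
    "https://<storageaccount>.dfs.core.windows.net/<container>"
    "?upn=false&resource=filesystem&maxResults=5000"
    "&directory=Bronze/Something/Something/Something/_temporary/0/_temporary/"
    "attempt_202307060632398990770822798711859_0958_m_000000_958"
    "&timeout=90&recursive=false"
)


def describe_list_request(url):
    """Break a failed storage request URL into readable parts (illustrative helper)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    directory = query["directory"][0]
    return {
        "host": parts.netloc,
        "container": parts.path.lstrip("/"),
        # Full path that returned 404: the task attempt folder under _temporary.
        "directory": directory,
        # Everything before the first "/_temporary" is the folder the sink writes to.
        "sink_folder": directory.split("/_temporary", 1)[0],
        "recursive": query["recursive"][0],
    }


info = describe_list_request(FAILED_REQUEST_URL)
for key, value in info.items():
    print(f"{key}: {value}")
```

Read this way, the 404 is for a non-recursive listing of the attempt folder under `_temporary`, not for the sink folder itself.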
From my understanding, when Spark (which runs underneath the mapping data flow) writes a DataFrame to Parquet, it creates the _temporary folder, writes the files into it, moves them to the main folder once the job is complete, and then deletes the _temporary folder.
But the sink is failing because it cannot find the _temporary folder that it has just created.
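The sequence I am describing can be sketched roughly as follows. This is a simplified local-filesystem imitation in Python, not Spark's or Hadoop's actual committer code, which has more steps. The function `write_with_temporary_commit` and its `before_commit` hook are hypothetical names used only to show how a missing attempt folder would surface as a "path not found" error at commit time.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable, Optional


def write_with_temporary_commit(
    output_dir: Path,
    attempt_id: str,
    files: dict[str, bytes],
    before_commit: Optional[Callable[[], None]] = None,
) -> list[str]:
    """Write files via a _temporary attempt folder, then move them into place."""
    # Same folder layout as in the failing request:
    # <sink folder>/_temporary/0/_temporary/<attempt id>
    attempt_dir = output_dir / "_temporary" / "0" / "_temporary" / attempt_id
    attempt_dir.mkdir(parents=True, exist_ok=True)

    # 1. Write the output files into the attempt folder.
    for name, data in files.items():
        (attempt_dir / name).write_bytes(data)

    # Hypothetical hook: lets a caller simulate something else removing
    # the folder between the write and the commit.
    if before_commit is not None:
        before_commit()

    # 2. Commit: list the attempt folder and move each file to the sink folder.
    #    If the folder is gone, the listing raises FileNotFoundError, the local
    #    analogue of the 404 "The specified path does not exist".
    committed = []
    for path in sorted(attempt_dir.iterdir()):
        shutil.move(str(path), str(output_dir / path.name))
        committed.append(path.name)

    # 3. Clean up the whole _temporary tree.
    shutil.rmtree(output_dir / "_temporary")
    return committed
```

Calling it with `before_commit=lambda: shutil.rmtree(output_dir / "_temporary")` reproduces the symptom locally: the write succeeds, but the listing in step 2 fails because the folder no longer exists. Whether something similar is happening in the storage account (for example another job or cleanup process touching the same `_temporary` folder) is only a guess on my part.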
Everything was working and the pipelines ran without problems until the 26th of June 2023, when multiple pipelines started failing with this error.
It looks as if the storage account cannot keep up with Azure Synapse Analytics.
Any help or suggestions would be appreciated.