
Mapping data flow error: PathNotFound The specified path does not exist

William Connell
2023-07-06T13:42:51.4933333+00:00

Just recently, the scheduled pipelines running in our Azure Synapse Analytics workspace started to fail with the following error message:

"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sinkXXXX': Operation failed: "The specified path does not exist."

It fails at a mapping data flow sink, but when I rerun the pipeline it may fail on the same sink or on a different one.

The path in question is not the data lake folder that the sink actually writes to, but a _temporary folder inside it. For example:

404, GET, https://<storageaccount>.dfs.core.windows.net/<container>?upn=false&resource=filesystem&maxResults=5000&directory=Bronze/Something/Something/Something/_temporary/0/_temporary/attempt_202307060632398990770822798711859_0958_m_000000_958&timeout=90&recursive=false

From my understanding, when the Spark engine underneath the mapping data flow writes a DataFrame to Parquet, it creates a _temporary folder, writes the files into it, moves them into the main folder once the job completes, and then deletes the _temporary folder.
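The staging-then-move sequence described above can be sketched in plain Python on a local disk. This is a simplified illustration only, assuming a single task and borrowing the `_temporary/0/_temporary/attempt_...` layout from the 404 URL; it is not Spark's or Hadoop's actual committer code, and the function name and file name are hypothetical.

```python
import os
import shutil
import tempfile


def write_with_temporary_commit(output_dir: str, attempt_id: str, rows: list[str]) -> list[str]:
    """Hypothetical two-phase write: stage under _temporary, then move into output_dir.

    Loosely modelled on the _temporary/0/_temporary/attempt_... layout seen in the
    failing request; NOT the real Spark/Hadoop committer implementation.
    """
    temp_root = os.path.join(output_dir, "_temporary")
    attempt_dir = os.path.join(temp_root, "0", "_temporary", attempt_id)
    os.makedirs(attempt_dir, exist_ok=True)

    # 1. The task writes its output into its own private attempt folder.
    #    (Placeholder file name; the contents here are plain text, not Parquet.)
    part_file = os.path.join(attempt_dir, "part-00000.parquet")
    with open(part_file, "w") as f:
        f.write("\n".join(rows))

    # 2. Commit: list the attempt folder and move each file into the final folder.
    #    This listing is the step that fails if the attempt folder has disappeared.
    for name in os.listdir(attempt_dir):
        shutil.move(os.path.join(attempt_dir, name), os.path.join(output_dir, name))

    # 3. Clean up the whole _temporary tree once the job is complete.
    shutil.rmtree(temp_root)
    return sorted(os.listdir(output_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        print(write_with_temporary_commit(out, "attempt_0001_m_000000_0", ["a", "b"]))
```

The listing in step 2 corresponds roughly to the `GET ... directory=.../_temporary/0/_temporary/attempt_...` request in the 404 above.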

But the sink is failing because it cannot find the _temporary folder that it has just created.
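One hypothesis worth checking, which the post does not confirm, is that something else removes the _temporary folder while this sink is still using it, for example a second data flow or overlapping pipeline run writing into the same folder. The pure-Python sketch below simulates that race on a local disk; the writer names and helper functions are hypothetical, and it is not Spark's code.

```python
import os
import shutil
import tempfile


def stage(output_dir: str, attempt_id: str) -> str:
    """Hypothetical task phase: create an attempt folder and write one file into it."""
    attempt_dir = os.path.join(output_dir, "_temporary", "0", "_temporary", attempt_id)
    os.makedirs(attempt_dir, exist_ok=True)
    with open(os.path.join(attempt_dir, f"part-{attempt_id}.parquet"), "w") as f:
        f.write("data")
    return attempt_dir


def commit(output_dir: str, attempt_dir: str) -> None:
    """Hypothetical commit: move staged files out, then delete the shared _temporary tree."""
    for name in os.listdir(attempt_dir):  # raises FileNotFoundError if attempt_dir is gone
        shutil.move(os.path.join(attempt_dir, name), os.path.join(output_dir, name))
    shutil.rmtree(os.path.join(output_dir, "_temporary"), ignore_errors=True)


def simulate_overlapping_writers(output_dir: str) -> str:
    """Assumed scenario: writers A and B target the same folder and overlap in time."""
    a_dir = stage(output_dir, "attempt_A")
    b_dir = stage(output_dir, "attempt_B")
    commit(output_dir, a_dir)  # A finishes first and removes _temporary, including B's files
    try:
        commit(output_dir, b_dir)
    except FileNotFoundError:
        return "B failed: the specified path does not exist"
    return "B committed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        print(simulate_overlapping_writers(out))
```

If this kind of overlap were the cause, which sink fails would depend on timing, which would fit the failure moving between sinks on reruns; that link is speculative.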

Everything had been working and the pipelines were running fine until 26 June 2023, when multiple pipelines started failing with this error.

It is as if the storage account cannot keep up with Azure Synapse Analytics.

Any help or suggestions would be appreciated.

Azure Synapse Analytics

