Synapse error: COPY statement input file schema discovery failed: Cannot bulk load

Devender 61 Reputation points
2023-05-11T12:33:15.7166667+00:00

I am facing an issue when loading data from a Synapse Spark notebook into a Synapse dedicated SQL pool. The load fails with the error shown below.

The code I am running to load the data into the dedicated SQL pool:

df_1.write.mode("append").synapsesql("TEST_DATABASE.dbo.Test_table", Constants.INTERNAL)

When we run this code manually in the Spark notebook it works fine, but when it runs dynamically through a pipeline it throws the error below. What could cause this only when running through the pipeline? We have been facing this issue for the past week; until then, all the pipelines were functioning properly.
Py4JJavaError: An error occurred while calling o4399.synapsesqlforpython.
: com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: COPY statement input file schema discovery failed: Cannot bulk load. The file "https://Dummyadls.dfs.core.windows.net/DummySynapse/synapse/workspaces/WorkingSynapse/sparkpools/AdventuteTableSpark/sparkpoolinstances/2864980c-2195-47d6-bc42-d8b4ff8e3e11/livysessions/2023/05/11/1910/tempdata/
TEST_DATABASE/dbo/Test_table/internal/Append/1683796649698/application_1683796164761_0003/part-00000-2a984778-ed60-4fb4-aeed-ab2efb9e1399-c000.snappy.parquet" does not exist or you don't have file access rights.
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.sqlanalytics(SqlAnalyticsConnectorClass.scala:347)
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesql(SqlAnalyticsConnectorClass.scala:191)
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesqlforpython(SqlAnalyticsConnectorClass.scala:203)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: COPY statement input file schema discovery failed: Cannot bulk load. The file 
Azure Synapse Analytics

1 answer

  1. Vinodh247-1375 12,426 Reputation points
    2023-05-11T13:35:28.1966667+00:00

    Hi, thanks for reaching out to Microsoft Q&A. Based on the error, either you need the "Storage Blob Data Contributor" role to access the file in ADLS from the Synapse workspace, or the file is missing from the given path. Please check whether the given path is correct. Please accept and upvote the answer if you find it correct or if it helped fix your issue.
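
To check the path and the access rights mentioned above, it helps to first pull the staging file location out of the error text. The following is only a minimal sketch, not part of the connector: the helper name `parse_staging_path` and the shortened sample message are hypothetical, and it assumes the error keeps the `The file "..." does not exist` wording shown in the question.

```python
import re
from urllib.parse import urlparse


def parse_staging_path(error_text):
    """Return storage account, container and file path from a
    'Cannot bulk load' message, or None if no file URL is found.

    Hypothetical helper for illustration only.
    """
    # Stack traces often wrap the URL across lines, so rejoin them first.
    flattened = "".join(line.strip() for line in error_text.splitlines())
    match = re.search(r'The file "([^"]+)" does not exist', flattened)
    if match is None:
        return None
    url = urlparse(match.group(1))
    # Host looks like <account>.dfs.core.windows.net
    account = url.netloc.split(".")[0]
    # First path segment is the container, the rest is the file path.
    container, _, file_path = url.path.lstrip("/").partition("/")
    return {
        "account": account,
        "container": container,
        "path": file_path,
        # abfss form of the same location, as used from Spark
        "abfss": f"abfss://{container}@{url.netloc}/{file_path}",
    }


# Shortened, made-up sample in the same shape as the error in the question.
sample = (
    'Cannot bulk load. The file "https://Dummyadls.dfs.core.windows.net/'
    "DummySynapse/synapse/tempdata/\n"
    'TEST_DATABASE/dbo/Test_table/part-00000.snappy.parquet" '
    "does not exist or you don't have file access rights."
)
info = parse_staging_path(sample)
print(info["account"], info["container"])
print(info["abfss"])
```

With the account and container known, you can check in the storage account's access control settings whether the identity the pipeline runs under has the role on that container. In a Synapse notebook, the `abfss` value could also be passed to something like `mssparkutils.fs.ls` on the parent folder to see whether the staged file is still there. Since the notebook works when run manually, comparing the identity used interactively with the one the pipeline uses is a reasonable first step.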