Synapse ERROR: : COPY statement input file schema discovery failed: Cannot bulk load

Question

Devender 66

I am facing an issue when loading data from a Synapse Spark notebook into a Synapse dedicated SQL pool.

The code I am running to load data into the dedicated SQL pool:

df_1.write.mode("append").synapsesql("TEST_DATABASE.dbo.Test_table", Constants.INTERNAL)

When we run this code manually in the Spark notebook, it runs fine, but when it runs through a pipeline it throws the error below. I want to know what could cause this when running through a pipeline. We have been facing this issue for the past week; until last week, all the pipelines were functioning properly.

Py4JJavaError: An error occurred while calling o4399.synapsesqlforpython.
: com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: COPY statement input file schema discovery failed: Cannot bulk load. The file "https://Dummyadls.dfs.core.windows.net/DummySynapse/synapse/workspaces/WorkingSynapse/sparkpools/AdventuteTableSpark/sparkpoolinstances/2864980c-2195-47d6-bc42-d8b4ff8e3e11/livysessions/2023/05/11/1910/tempdata/
TEST_DATABASE/dbo/Test_table/internal/Append/1683796649698/application_1683796164761_0003/part-00000-2a984778-ed60-4fb4-aeed-ab2efb9e1399-c000.snappy.parquet" does not exist or you don't have file access rights.
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.sqlanalytics(SqlAnalyticsConnectorClass.scala:347)
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesql(SqlAnalyticsConnectorClass.scala:191)
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesqlforpython(SqlAnalyticsConnectorClass.scala:203)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: COPY statement input file schema discovery failed: Cannot bulk load. The file

Clément Foursans 15 Reputation points

2023-05-25T12:58:13.11+00:00

Hello @Devender,

We had been facing the exact same issue for a few days. I opened a support case, and here is the support team's answer:

Root cause – A recent release of the Synapse Spark to SQL Dedicated Pool Connector introduced a regression where writes to a Synapse SQL dedicated pool fail when notebooks are run in pipelines.

com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: COPY statement input file schema discovery failed: Cannot bulk load. The file 'https://xxxxxxxxx.dfs.core.windows.net/staging/SQLAnalyticsConnectorStaging/xxxxxx/xxxxxxx/xxxxxx/internal/Append/989292928928/application_xxxxxxxxx_0004/part-00000-toto-toto-toto-c000.snappy.parquet' does not exist or you don't have file access rights.
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.sqlanalytics(SqlAnalyticsConnectorClass.scala:347)
	at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesql(SqlAnalyticsConnectorClass.scala:191)
	at writeToSynapseTable(console:71)
	... 73 elided
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: COPY statement input file schema discovery failed: Cannot bulk load. The file

Storage logs for the corresponding failure may show "IPAuthorization" errors. This error is expected even if the TEMP_FOLDER parameter is removed.

Mitigation – Set the following Spark config in the notebook or at the pool level:
    spark.conf.set("spark.synapse.runAsMsi", "true")

Hope this helps.
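As a minimal sketch of applying that mitigation inside a notebook, the setting can be wrapped in a small helper that also reads the value back, so a pipeline run shows what was actually applied. The helper name `enable_run_as_msi` is hypothetical; `spark` is assumed to be the session object that Synapse notebooks predefine, and only the config key and value come from the support answer.

```python
def enable_run_as_msi(spark):
    """Apply the support-suggested mitigation to a Spark session.

    `spark` is assumed to be an active SparkSession (Synapse notebooks
    predefine one under this name). The helper name is hypothetical.
    """
    # Config key and value are taken verbatim from the support answer.
    spark.conf.set("spark.synapse.runAsMsi", "true")
    # Read the value back so the notebook output records what was applied.
    return spark.conf.get("spark.synapse.runAsMsi")
```

In a notebook this would presumably be called as `enable_run_as_msi(spark)` in a cell that runs before the `df_1.write...synapsesql(...)` call, so the setting is in place when the connector stages its files.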

Answer 1

Vinodh247 34,666 MVP Volunteer Moderator

Hi, thanks for reaching out to Microsoft Q&A. Judging from the error, either you need the "Storage Blob Data Contributor" role to access the file in ADLS from the Synapse workspace, or the file is missing from the given path. Please check that the given path is correct. Please accept and upvote the answer if you find it correct or if it helped fix your issue.

Devender 66 Reputation points

2023-05-11T14:40:37.4433333+00:00

Hi @Vinodh247, how can we check the temporary folders in ADLS? Can you please help me with this?
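One possible way to locate the temporary folder is to split the file URL from the error message into storage account, container, and path. The sketch below does that with the standard library only; the function name `staging_location` and the returned keys are hypothetical, and it assumes the URL has the `https://<account>.dfs.core.windows.net/<container>/<path>` shape seen in the error above.

```python
import posixpath
from urllib.parse import urlparse


def staging_location(file_url):
    """Split an ADLS Gen2 https URL into account, container, and path.

    Hypothetical helper; assumes the URL shape
    https://<account>.dfs.core.windows.net/<container>/<path>.
    """
    parsed = urlparse(file_url)
    host = parsed.netloc                      # e.g. Dummyadls.dfs.core.windows.net
    account = host.split(".", 1)[0]           # first label is the storage account
    # First path segment is the container, the rest is the file path.
    container, _, file_path = parsed.path.lstrip("/").partition("/")
    folder = posixpath.dirname(file_path)     # folder holding the staged file
    return {
        "account": account,
        "container": container,
        "file_path": file_path,
        "folder_abfss": f"abfss://{container}@{host}/{folder}",
    }


if __name__ == "__main__":
    # Shortened, made-up URL in the same shape as the one in the error.
    url = ("https://Dummyadls.dfs.core.windows.net/DummySynapse/"
           "synapse/tempdata/part-00000.snappy.parquet")
    for key, value in staging_location(url).items():
        print(f"{key}: {value}")
```

The `folder_abfss` value is the folder to look for in the storage account's container browser; in a Synapse notebook it could likely also be listed with `mssparkutils.fs.ls(...)`, provided the identity running the notebook has access to the storage account.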
Devender 66 Reputation points

2023-05-15T12:35:24.09+00:00

I have checked the path. The path is correct, but I am still not able to figure out the issue.
Vinodh247 34,666 Reputation points MVP Volunteer Moderator

2023-05-17T10:43:03.5966667+00:00

Can you confirm that you have granted your Synapse workspace the 'Storage Blob Data Contributor' role? If you are able to run the notebook separately but the pipeline fails, this should fix it. Try it and let me know.

Please accept and upvote the answer if you find it correct or if it helped fix your issue.
Vinodh247 34,666 Reputation points MVP Volunteer Moderator

2023-05-27T11:57:16.7633333+00:00

Just checking in to see if the above answer helped. Please consider clicking Accept Answer and Up-Vote, as accepted answers help the community as well.
