Hi, thanks for reaching out to Microsoft Q&A. From the error, either your Synapse workspace needs the "Storage Blob Data Contributor" role to access the file in ADLS, or the file is missing from the given path. Please check whether the path given is correct. Please accept and upvote the answer if it helped fix your issue.
Synapse ERROR: COPY statement input file schema discovery failed: Cannot bulk load
I am facing an issue when loading data from a Synapse Spark notebook into a Synapse dedicated SQL pool; we are getting the following error.
The code I am running to load data into the dedicated SQL pool:

df_1.write.mode("append").synapsesql("TEST_DATABASE.dbo.Test_table", Constants.INTERNAL)

When we run this code manually through the Spark notebook it runs fine, but when it runs dynamically through a pipeline it throws the error below. I want to know what could cause this when running through a pipeline. We have been facing this issue for the last week; until last week all the pipelines were functioning properly.
Py4JJavaError: An error occurred while calling o4399.synapsesqlforpython. : com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: COPY statement input file schema discovery failed: Cannot bulk load. The file "https://Dummyadls.dfs.core.windows.net/DummySynapse/synapse/workspaces/WorkingSynapse/sparkpools/AdventuteTableSpark/sparkpoolinstances/2864980c-2195-47d6-bc42-d8b4ff8e3e11/livysessions/2023/05/11/1910/tempdata/TEST_DATABASE/dbo/Test_table/internal/Append/1683796649698/application_1683796164761_0003/part-00000-2a984778-ed60-4fb4-aeed-ab2efb9e1399-c000.snappy.parquet" does not exist or you don't have file access rights.
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.sqlanalytics(SqlAnalyticsConnectorClass.scala:347)
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesql(SqlAnalyticsConnectorClass.scala:191)
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesqlforpython(SqlAnalyticsConnectorClass.scala:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: COPY statement input file schema discovery failed: Cannot bulk load. The file
We had been facing the exact same issue for a few days. I managed to open a support case, and here is the support answer about this issue:
Hope this helps.
Root cause – A recent release of the Synapse Spark to SQL Dedicated Pool connector introduced a regression where writes to a Synapse SQL dedicated pool fail when notebooks are run from pipelines:

com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: COPY statement input file schema discovery failed: Cannot bulk load. The file 'https://xxxxxxxxx.dfs.core.windows.net/staging/SQLAnalyticsConnectorStaging/xxxxxx/xxxxxxx/xxxxxx/internal/Append/989292928928/application_xxxxxxxxx_0004/part-00000-toto-toto-toto-c000.snappy.parquet' does not exist or you don't have file access rights.
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.sqlanalytics(SqlAnalyticsConnectorClass.scala:347)
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesql(SqlAnalyticsConnectorClass.scala:191)
at writeToSynapseTable(console:71)
... 73 elided
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: COPY statement input file schema discovery failed: Cannot bulk load. The file

Storage logs for the corresponding failure may show "IPAuthorization" errors. This error is expected even if the TEMP_FOLDER parameter is removed.

Mitigation – Set the Spark config below in the notebook or at the pool level:

spark.conf.set("spark.synapse.runAsMsi", "true")
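Putting the support mitigation together with the write call from the question, a notebook cell might look like the sketch below. This is a Synapse-runtime configuration fragment, not standalone code: `spark`, `df_1`, and `Constants` (from `com.microsoft.spark.sqlanalytics`) only exist inside a Synapse Spark session, and the table name is the one from the question.

```python
# Mitigation from Microsoft support: make the connector access the ADLS
# staging files as the workspace managed identity (MSI) instead of the
# caller's token. Must be set BEFORE the synapsesql() write runs.
spark.conf.set("spark.synapse.runAsMsi", "true")

# Original write from the question: append the DataFrame to the
# dedicated SQL pool table via the Synapse Spark connector.
df_1.write.mode("append").synapsesql(
    "TEST_DATABASE.dbo.Test_table",
    Constants.INTERNAL,
)
```

Setting it per-notebook takes effect only for that session; to cover all pipeline runs, the same key can instead be set as a Spark pool configuration.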
Hi @Vinodh247, how can we check the temporary folders in ADLS? Can you please help me with this?
I have checked the path, and it is correct, but I am still not able to figure out the issue.
Can you confirm whether you have given your Synapse workspace the 'Storage Blob Data Contributor' role? If the notebook runs fine on its own but the pipeline fails, this should fix it. Try it and let me know.
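For reference, one way to grant that role is with the Azure CLI. This is a sketch with placeholder values; the object ID of the workspace's managed identity and the storage account resource ID must be substituted for your environment.

```shell
# Grant the Synapse workspace's managed identity the Storage Blob Data
# Contributor role on the ADLS Gen2 storage account (placeholders below).
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "<workspace-managed-identity-object-id>" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```

The same assignment can also be made in the Azure portal under the storage account's Access control (IAM) blade.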