
0 Votes
GhoshSourav893-1207 asked · MartinJaffer-MSFT commented

Unable to fetch large data using DatabricksJDBC42-2.6.25.1044 jar

I'm trying to read data using the DatabricksJDBC42-2.6.25.1044 jar. It works perfectly fine with small datasets; however, it fails with the error below for larger datasets. I can see the failure as soon as the table read completes on the Databricks cluster (DBR 9.1) side.

Note: I couldn't find relevant documentation for this error, nor documentation on JDBC runtime parameters such as fetchSize, queryTimeout, and so on.

Exception in thread "main" java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500638) The file <Blob storage URL> has not been downloaded successfully and the driver will not retry due to exceeding of the max retry limit 10, you can increase the max retry limit by setting MaxConsecutiveResultFileDownloadRetries.
at com.databricks.client.spark.jdbc.ResultFileDownloadManager.checkAndHandleDownloadError(Unknown Source)
at com.databricks.client.spark.jdbc.ResultFileDownloadManager.getNextDownloadedFile(Unknown Source)
at com.databricks.client.spark.jdbc.DowloadableFetchClient.fetchNRows(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.fetchRows(Unknown Source)
at com.databricks.client.hivecommon.dataengine.BackgroundFetcher.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.databricks.client.support.exceptions.GeneralException: [Databricks][DatabricksJDBCDriver](500638) The file <Blob storage URL> has not been downloaded successfully and the driver will not retry due to exceeding of the max retry limit 10, you can increase the max retry limit by setting MaxConsecutiveResultFileDownloadRetries.
... 9 more
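
For reference, a minimal sketch of where those standard JDBC runtime parameters (fetchSize, queryTimeout) would be set; the URL prefix, host, httpPath, token, and table name below are placeholders rather than my real values:

// Minimal sketch, not my actual job: shows where the standard JDBC parameters go.
// Host, httpPath, token, and table name are placeholders; the URL prefix may be
// jdbc:databricks:// or jdbc:spark:// depending on the driver build.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class LargeReadSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("UID", "token");
        props.put("PWD", "<personal-access-token>");

        String url = "jdbc:databricks://<workspace-host>:443/default;"
                + "transportMode=http;ssl=1;AuthMech=3;"
                + "httpPath=<cluster-http-path>";

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(10_000);  // standard JDBC hint: rows fetched per block
            stmt.setQueryTimeout(600);  // standard JDBC: seconds before the query is cancelled
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM <schema>.<large_table>")) {
                long rows = 0;
                while (rs.next()) {
                    rows++;
                }
                System.out.println("Fetched rows: " + rows);
            }
        }
    }
}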

azure-databricks

Hello @GhoshSourav893-1207,
Thanks for the question and for using the MS Q&A platform.

As I understand it, you are asking for help with the cause of this error message and for any related documentation.

First I will try to dissect this error message.

The driver tried to download a result file from blob storage, but failed. It retried until it hit the configured limit of 10 retries.
Given that your process succeeded with smaller datasets but not this large one, I speculate that the download hit a time limit on how long a read can run before it is deemed a failure. (Such limits exist to stop operations that never finish from hanging the client.)
There is also the possibility that something else caused the read to fail each time. To determine whether the cause is a timeout or something else, I would need more information. Logs and/or the duration of the operation would help.
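
If you can share driver logs, that would narrow it down. Here is a rough sketch of how I would collect more information: raise the retry limit named in the error and turn on the driver's own logging. I am assuming the LogLevel / LogPath connection properties from the driver's configuration guide, so please verify the exact names for your driver version.

// Hedged sketch for gathering more information: bump the retry limit named in the
// error message and enable driver-side logging so the failed downloads show up in a trace.
// MaxConsecutiveResultFileDownloadRetries comes verbatim from the error text (default 10);
// LogLevel/LogPath are assumed from the driver's configuration guide and should be
// verified for version 2.6.25.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class DiagnosticConnection {
    public static Connection open(String jdbcUrl, String token) throws Exception {
        Properties props = new Properties();
        props.put("UID", "token");
        props.put("PWD", token);
        props.put("MaxConsecutiveResultFileDownloadRetries", "50"); // property named in the error message
        props.put("LogLevel", "6");                                 // assumed: 6 = trace-level logging
        props.put("LogPath", "/tmp/databricks-jdbc-logs");          // assumed: directory for driver log files
        return DriverManager.getConnection(jdbcUrl, props);
    }
}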

Could you please give a little more context? I am not an expert in JDBC, but this sounds like it might be Cloud Fetch.

As I understand it, with Cloud Fetch the JDBC driver has the cluster write the query results to a blob file, and the driver is then handed a pre-signed URL so it can download the result. That last part sounds like where the failure is. What I would do is look inside DBFS for the results file and inspect it manually. Trying to open the results outside the context of Cloud Fetch might reveal the cause of the error. Maybe a permissions problem? Maybe the file was never written?
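
One quick way to test the Cloud Fetch theory is to turn it off and let the results come back inline over the Thrift channel. I believe the driver exposes an EnableQueryResultDownload property for this, but I am not certain of the exact name for your driver version, so please treat it as an assumption and confirm it against the driver documentation. If the query succeeds (more slowly) with it disabled, the blob download step is almost certainly the problem.

// Hedged diagnostic, assuming the EnableQueryResultDownload property exists in this
// driver version: with it set to 0, results are returned inline instead of being
// staged in blob storage, which isolates the download step as the failure point.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class CloudFetchDisabledTest {
    public static Connection open(String jdbcUrl, String token) throws Exception {
        Properties props = new Properties();
        props.put("UID", "token");
        props.put("PWD", token);
        props.put("EnableQueryResultDownload", "0"); // assumed property name; verify before relying on it
        return DriverManager.getConnection(jdbcUrl, props);
    }
}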



0 Answers