Results too large error

Sujal Mandal 0 Reputation points
2024-05-21T12:31:02.8266667+00:00

Hi,

We have a Databricks table whose underlying data is stored in ADLS Gen2. The table has a column named "data" (StringType) that holds very large JSON values. When we select rows from the table, the query fails with the error "Results too large". If we select all other fields except "data", the query runs fine and the result is displayed. The minimum and maximum character counts in the "data" field are 1998447 and 10029361 respectively. Please let us know if there is a solution for this. We have tried increasing spark.driver.maxResultSize to 8g at the cluster level, but it did not help.

Azure Databricks

1 answer

  1. PRADEEPCHEEKATLA 90,261 Reputation points
    2024-05-22T04:53:31.9733333+00:00

    @Mandal, Sujalkumar (Cognizant) - Thanks for the question and for using the MS Q&A platform.

    The "Results too large" error occurs when the result set of a query exceeds the maximum size that can be returned to the driver node. This error can occur when you try to select a column with very large values, such as your "data" column.
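To get a feel for the sizes involved, here is a rough back-of-the-envelope sketch. It uses the character counts and the 8g limit from the question, and it assumes about one byte per character (true only for ASCII JSON in UTF-8), ignores serialization overhead, and assumes the driver limit is the one actually being hit, which the question does not confirm.

```python
# Character counts reported in the question for the "data" column.
MIN_CHARS = 1_998_447
MAX_CHARS = 10_029_361

# Assumption: roughly one byte per character (ASCII-only JSON encoded as UTF-8).
BYTES_PER_CHAR = 1

# The question sets spark.driver.maxResultSize to 8g.
LIMIT_BYTES = 8 * 1024**3


def rows_within_limit(chars_per_row: int, limit_bytes: int = LIMIT_BYTES) -> int:
    """Return how many rows of the given size fit under the limit, ignoring overhead."""
    return limit_bytes // (chars_per_row * BYTES_PER_CHAR)


best_case = rows_within_limit(MIN_CHARS)   # every value as small as the minimum
worst_case = rows_within_limit(MAX_CHARS)  # every value as large as the maximum
print(f"rows that fit if every value is minimal: {best_case}")
print(f"rows that fit if every value is maximal: {worst_case}")
```

Under these assumptions the limit is reached after somewhere between several hundred and a few thousand rows, so selecting the "data" column for the whole table can exceed it even when the other columns are small.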

    One option is to raise the spark.sql.shuffle.partitions configuration (default 200), which controls the number of partitions used when data is shuffled for joins and aggregations. A higher value spreads the data across more tasks. Note that it applies only to shuffle stages and does not by itself shrink what is collected to the driver node, so it is most likely to help when the query also returns fewer rows. You can try setting this configuration to a higher value, such as 1000, and see if it helps.

    Another option is to raise the spark.sql.broadcastTimeout configuration, which is the timeout in seconds for broadcasting a table in a broadcast join (default 300). This only matters if the query involves a broadcast join on a large table. You can try setting this configuration to a higher value, such as 1000, and see if it helps.
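As a minimal sketch, the two settings mentioned above could be applied to a session like this. The helper name `apply_settings` is hypothetical, and the values are simply the ones suggested in this answer, not tuned numbers. It relies only on the session exposing `conf.set(key, value)`, which is how runtime SQL options are set in PySpark.

```python
from typing import Any, Dict

# Values suggested in this answer; treat them as starting points, not tuned numbers.
SUGGESTED_SETTINGS: Dict[str, str] = {
    "spark.sql.shuffle.partitions": "1000",
    "spark.sql.broadcastTimeout": "1000",  # seconds
}


def apply_settings(spark: Any, settings: Dict[str, str] = SUGGESTED_SETTINGS) -> Dict[str, str]:
    """Set each option on the given session and return a copy of what was applied.

    `spark` is expected to be a SparkSession (or anything with `conf.set`).
    """
    for key, value in settings.items():
        spark.conf.set(key, value)
    return dict(settings)
```

In a Databricks notebook, where a session named `spark` is already defined, this would be called as `apply_settings(spark)` before running the query.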

    If these solutions do not work, you may need to consider restructuring your data to reduce the size of the "data" column or splitting the query into smaller parts.
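One way to picture "splitting the query into smaller parts" is to group rows into batches whose combined size stays under a budget, and then fetch one batch at a time. The sketch below is plain Python and makes several assumptions: the table has some key column to select by (the keys "a" to "d" are hypothetical), the per-row sizes were obtained beforehand (for example with the Spark SQL `length` function on the "data" column), and the budget of 12,000,000 characters is arbitrary. The first two sizes are the minimum and maximum from the question; the other two are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def plan_batches(
    row_sizes: Iterable[Tuple[str, int]],
    budget_chars: int,
) -> Iterator[List[str]]:
    """Group row keys into batches whose combined size stays within budget_chars.

    Rows are taken in order. A single row larger than the budget still gets
    a batch of its own rather than being dropped.
    """
    batch: List[str] = []
    used = 0
    for key, size in row_sizes:
        # Close the current batch if adding this row would exceed the budget.
        if batch and used + size > budget_chars:
            yield batch
            batch, used = [], 0
        batch.append(key)
        used += size
    if batch:
        yield batch


# Hypothetical keys with character counts for the "data" column.
sizes = [("a", 1_998_447), ("b", 10_029_361), ("c", 5_000_000), ("d", 4_000_000)]
batches = list(plan_batches(sizes, budget_chars=12_000_000))
print(batches)
```

Each batch of keys could then drive its own query that filters on the key column, so no single result has to carry every large "data" value at once.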

    I hope this helps! Let me know if you have any other questions.
