@Mandal, Sujalkumar (Cognizant) - Thanks for the question and using MS Q&A platform.
The "Results too large" error occurs when the result set of a query exceeds the maximum size that can be returned to the driver node. This error can occur when you try to select a column with very large values, such as your "data" column.
One option is to increase the spark.sql.shuffle.partitions configuration, which controls how many partitions Spark uses when shuffling data for joins and aggregations. Spreading the work across more partitions distributes the data across the cluster so that each task handles, and returns, a smaller chunk. You can try raising it to a higher value, such as 1000, and see if it helps.
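For example, a minimal sketch in PySpark (the table and column names are placeholders, and 1000 is only a starting point to tune against your data volume and cluster size):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Increase the number of shuffle partitions so each task processes a smaller slice of data.
# The setting applies to queries run after this point in the session.
spark.conf.set("spark.sql.shuffle.partitions", "1000")

# Re-run the query that failed; "my_table" and the columns are placeholders.
result_df = spark.sql("SELECT id, data FROM my_table")
```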
Another option is to increase the spark.sql.broadcastTimeout configuration, which sets how long (in seconds) Spark waits for a table to be broadcast to all nodes before a broadcast join fails. If large tables are being broadcast, the default of 300 seconds may not be enough; you can try a higher value, such as 1000 seconds, and see if it helps.
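A minimal sketch of raising that timeout (the value 1000 is an assumption, not a recommendation; tune it to how long your broadcasts actually take):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow broadcast exchanges more time (in seconds) to complete before the query fails.
spark.conf.set("spark.sql.broadcastTimeout", "1000")
```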
If these solutions do not work, you may need to consider restructuring your data to reduce the size of the "data" column or splitting the query into smaller parts.
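If you go that route, here is a rough sketch of both ideas; the table name, column names, and output path are hypothetical, so adapt them to your schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("my_table")  # hypothetical table name

# Idea 1: keep the large "data" column out of anything you collect or display,
# and only pull it back for the specific rows you actually need.
summary_df = df.select("id", F.length("data").alias("data_length"))

# Idea 2: split the query into smaller parts and write each part to storage
# instead of returning everything to the driver.
for bucket in range(10):
    (df.filter(F.crc32(F.col("id").cast("string")) % 10 == bucket)
       .write.mode("append")
       .parquet("/mnt/output/my_table_split"))  # hypothetical output path
```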
I hope this helps! Let me know if you have any other questions.