Your issue could be due to several factors.
The first thing I would look at is the runtime version: newer Databricks runtimes can introduce behavior changes or optimizations, so you may need to review and potentially adjust your cluster configuration to make sure it has enough resources for a 950 GB, 9-billion-row dataset.
Check for any Delta Lake or Spark SQL performance tuning parameters that might help. If you haven't enabled detailed logging and monitoring, I would turn them on so you can capture more information about the write operation and identify performance bottlenecks.
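As a rough sketch of the kind of write-side settings worth checking (the exact property names and defaults depend on your Databricks runtime version, so verify them against the docs before relying on them):

```python
# Let Delta bin-pack files during the write to reduce small-file overhead.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

# Compact small files automatically after the write completes.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# The default of 200 shuffle partitions is often too low for ~950 GB of input;
# a common starting point is total input size / 128-256 MB per partition.
spark.conf.set("spark.sql.shuffle.partitions", "4000")
```

(`spark` here is the SparkSession that Databricks notebooks provide; the shuffle partition count is only an illustrative starting value.)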
Review your DataFrame transformations and actions for efficiency, and verify that your partitioning or clustering strategy actually matches your query patterns.
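As a hedged illustration of what "matches your query patterns" means in practice, here is a sketch of writing a large DataFrame laid out around a filter column; `df`, the target path, and the `event_date` / `customer_id` columns are made-up placeholders. Partitioning or Z-ordering on a column you rarely filter by adds write cost without any read benefit.

```python
# Hypothetical example: align the physical layout with the columns you filter on.
(
    df.repartition("event_date")        # avoid many tiny files per partition value
      .write
      .format("delta")
      .mode("append")
      .partitionBy("event_date")        # only worthwhile if queries filter on event_date
      .save("/mnt/lake/events_delta")
)

# Optionally co-locate data on a secondary filter column after the write.
spark.sql("OPTIMIZE delta.`/mnt/lake/events_delta` ZORDER BY (customer_id)")
```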
Steps you can follow to diagnose:
- Check the Databricks cluster logs for any error messages or warnings related to the write operation. Look for any exceptions or stack traces that might indicate the cause of the problem.
- Use the Spark SQL `EXPLAIN` command to generate the execution plan for your write operation. This can help you understand how Spark is processing the write and identify any potential bottlenecks; see the sketch after this list.
- Use Databricks' built-in performance profiling tools to analyze the execution of your job. This can provide insights into where the time is being spent during the write operation.
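A minimal sketch of pulling the plan in PySpark; `df` and the `staging_events` table are placeholders for your own objects, and the plan output format varies by runtime version:

```python
# Prints the plan Spark will use to produce the data being written; look for
# wide shuffles (Exchange), skewed partitions, or unexpected full scans.
df.explain(mode="formatted")

# Equivalent SQL form for the query feeding the write.
spark.sql(
    "EXPLAIN FORMATTED SELECT * FROM staging_events WHERE event_date >= '2024-01-01'"
).show(truncate=False)
```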