how to improve the write time to a synapse table with azure databricks

Question

I am trying to save a pyspark dataframe with millions of rows and 110 columns to a synapse table and it takes a long time, I would like to know some ways in which I can shorten the time, searching a bit I found that enable this configuration
spark.conf.set("spark.databricks.sqldw.writeSemantics", "copy")

But I would like to know if there are other options to improve the writing time

Answer

Hello @Luis Cespedes ,
Thanks for the ask and using Microsoft Q&A platform .
As we understand the ask here is how ti improve the writing to Synapse . Please do let me know if that not accurate
When you say Synapse I am assming that you are refering to dedicated SQL .I am nots uere what is distibution whcih is set on the sink table . It may ve worth exploring . You can read more on the same here .
Adding some text from the above mentione link for reference .

What is a distributed table?

A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.

Hash-distribution improves query performance on large fact tables, and is the focus of this article. Round-robin distribution is useful for improving loading speed. These design choices have a significant impact on improving query and loading performance.

Another table storage option is to replicate a small table across all the Compute nodes. For more information, see Design guidance for replicated tables. To quickly choose among the three options, see Distributed tables in the tables overview.

As part of table design, understand as much as possible about your data and how the data is queried. For example, consider these questions:

How large is the table?
How often is the table refreshed?
Do I have fact and dimension tables in a dedicated SQL pool?

Please do let me if you have any queries .
Thanks
Himanshu

-------------------------------------------------------------------------------------------------------------------------

Please don't forget to click on or upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
Want a reminder to come back and check responses? Here is how to subscribe to a notification
If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

Share via

how to improve the write time to a synapse table with azure databricks

1 answer