Hi Simon Zhang,
Thanks for reaching out to Microsoft Q&A.
The steps below should work; let me know if you have any questions.
Prerequisites:
- An Azure Synapse Analytics workspace with an Apache Spark pool configured.
- An Azure SQL Database reachable from the workspace (for example via a linked service, or with the server firewall set to allow Azure services).
Steps:
- Create a PySpark notebook in your Synapse workspace.
- Generate or load the DataFrame that you want to write to the SQL database, for example:
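For illustration, here is a minimal sample DataFrame; the column names and values are placeholders, not from your scenario:

# In a Synapse notebook the `spark` session is already available;
# the builder line is only needed outside the notebook.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# Replace this sample with your own load logic, e.g. spark.read.parquet(...)
df = spark.createDataFrame(
    [(1, "alice"), (2, "bob")],
    ["id", "name"]
)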
- Specify the connection details for the target database. With the built-in JDBC data source these are passed as options on the writer rather than through spark.conf.set, so it is easiest to keep them in variables:

jdbc_url = "jdbc:sqlserver://<server_name>.database.windows.net:1433;database=<database_name>"
table_name = "<schema>.<table_name>"
user = "<username>"
password = "<password>"
- Write the DataFrame to the SQL database using the write.format method. Note that com.databricks.spark.sqldw is a Databricks-only connector and is not available on Synapse Spark pools, so use the JDBC format instead:

df.write \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", table_name) \
    .option("user", user) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("overwrite") \
    .save()
- Run the notebook to execute the write operation.
You can monitor the write in Synapse Studio by going to the "Monitor" hub and checking "Pipeline runs" or "Apache Spark applications" for the job details.
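To double-check from the notebook itself, you can read the table back and count the rows (same variables as above):

row_count = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", table_name)
    .option("user", user)
    .option("password", password)
    .load()
    .count()
)
print(f"Rows written: {row_count}")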
Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.