Hi Simon Zhang,
Thanks for reaching out to Microsoft Q&A.
The steps below should work; let me know if you have any questions.
Prerequisites:
- An Azure Synapse Analytics workspace with an Apache Spark pool configured.
- An Azure SQL Database reachable from the workspace (for example via a linked service, or with the server firewall set to allow Azure services).
Steps:
- Create a PySpark notebook in your Synapse workspace.
- Generate or load the DataFrame that you want to write to the SQL database, for example:
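For illustration, here is a minimal sample DataFrame; the column names and values are placeholders, not from your scenario:

# In a Synapse notebook the `spark` session is already available;
# the builder line is only needed outside the notebook.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# Replace this sample with your own load logic, e.g. spark.read.parquet(...)
df = spark.createDataFrame(
    [(1, "alice"), (2, "bob")],
    ["id", "name"]
)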
- Specify the connection details for the target database. With the built-in JDBC data source these are passed as options on the writer rather than through spark.conf.set, so it is easiest to keep them in variables:

jdbc_url = "jdbc:sqlserver://<server_name>.database.windows.net:1433;database=<database_name>"
table_name = "<schema>.<table_name>"
user = "<username>"
password = "<password>"
- Write the DataFrame to the SQL database using the write.format method. Note that com.databricks.spark.sqldw is a Databricks-only connector and is not available on Synapse Spark pools, so use the JDBC format instead:

df.write \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", table_name) \
    .option("user", user) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("overwrite") \
    .save()
- Run the notebook to execute the write operation.
You can monitor the write in Synapse Studio by going to the "Monitor" hub and checking "Pipeline runs" or "Apache Spark applications" for the job details.
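To double-check from the notebook itself, you can read the table back and count the rows (same variables as above):

row_count = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", table_name)
    .option("user", user)
    .option("password", password)
    .load()
    .count()
)
print(f"Rows written: {row_count}")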
Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.