synapse pyspark write dataframe to sql azure

Simon Zhang 6 Reputation points Microsoft Employee
2024-07-26T23:35:52.9533333+00:00

I want to use the PySpark DataFrame.write method to write a DataFrame to an Azure SQL Database using a linked service. How can I achieve this?

Azure SQL Database
Azure Synapse Analytics

1 answer

  1. Vinodh247 13,801 Reputation points
    2024-07-27T11:26:05.6+00:00

    Hi Simon Zhang,

    Thanks for reaching out to Microsoft Q&A.

    The steps below should work; let me know if you have any questions.

    Pre-requisites:

    • Azure Synapse Analytics workspace with Apache Spark and Synapse SQL configured.
    • Azure SQL Database and a linked service set up in your Synapse workspace to connect to it.

    Steps:

    1. Create a PySpark notebook in your Synapse workspace.
    2. Generate or load the dataframe that you want to write to the SQL database.
    3. Set any session-level Spark configuration you need via spark.conf.set. Note that the settings below tune Spark's Parquet/Hive handling; they are optional and are not the connection details themselves, which are supplied in step 4:
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
    spark.conf.set("spark.datasource.writeDataSourceOptions", "true")
    spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
    
    
    4. Write the dataframe to the SQL database using the write.format method:
    df.write.format("com.databricks.spark.sqldw") \
        .option("url", "jdbc:sqlserver://<server_name>.database.windows.net:1433;database=<database_name>") \
        .option("dbtable", "<schema>.<table_name>") \
        .option("user", "<username>") \
        .option("password", "<password>") \
        .option("tempDir", "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/temp") \
        .mode("overwrite") \
        .save()
    

    5. Run the notebook to execute the write operation.
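    If the com.databricks.spark.sqldw connector is not available in your Spark pool, Spark's built-in JDBC data source is a common alternative. The sketch below only assembles the JDBC URL and write options; every name in it (myserver, mydb, dbo.MyTable, and so on) is a placeholder I'm introducing for illustration, not a value from this thread, and the actual write call is shown commented out because it needs a live Spark session and network access to the database.

```python
# Sketch: assembling the JDBC URL and options for a DataFrame write.
# All server/database/table/credential values are hypothetical placeholders.

def sqlserver_jdbc_url(server: str, database: str, port: int = 1433) -> str:
    """Build a JDBC connection URL for an Azure SQL Database."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:{port};"
        f"database={database};encrypt=true;loginTimeout=30"
    )

url = sqlserver_jdbc_url("myserver", "mydb")
options = {
    "url": url,
    "dbtable": "dbo.MyTable",
    "user": "myuser",
    "password": "mypassword",  # prefer a secret store over hard-coding
}

# With a live Spark session, the write itself would look like:
# df.write.format("jdbc").options(**options).mode("overwrite").save()
print(url)
```

    In a Synapse notebook you would typically avoid hard-coding the password; for example, if your workspace has a Key Vault linked service, mssparkutils.credentials.getSecret can fetch it at run time.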

    You can monitor the write progress in the Synapse Studio by going to the "Monitor" hub and checking the "Pipeline runs" or "Spark job" details.

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.
