Synapse Workspace Notebook connection to SQL Pool Dedicated

Waleed Saleem 66 Reputation points
2024-03-06T16:37:02.7933333+00:00

Dear Support Team,

I am currently utilizing an Azure Spark environment connected to a dedicated SQL pool. My objective is to execute an SQL update on a table within the database via a notebook.

So far, I have successfully inserted data into the database using the Synapse dedicated SQL pool connector, performing an append with df.write.mode('append').synapsesql(db). However, I now wish to update a single column in an existing table.

Could you please provide me with guidance or examples on how to execute an SQL update on a table in my Azure Synapse database via my Spark notebook?

Thank you in advance for your assistance.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Smaran Thoomu 24,110 Reputation points Microsoft External Staff Moderator
    2024-03-07T07:22:18.7366667+00:00

    Hi @Waleed Saleem

    Thank you for reaching out to the Azure community forum with your query.

    Running an SQL UPDATE statement against a dedicated SQL pool table through the spark.sql API is not supported from a Synapse workspace notebook; spark.sql operates on Spark tables, not on the dedicated pool.

    To update a column in a dedicated SQL pool table, you can use the pyspark package to read the data from the table into a Spark DataFrame, update the column in the DataFrame, and then write the updated data back to the table using the synapsesql API. Here's an example code snippet that you can use:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit
    
    # Create a SparkSession
    spark = SparkSession.builder.appName("UpdateTable").getOrCreate()
    
    # Read data from the dedicated SQL pool table into a Spark DataFrame
    df = spark.read.synapsesql("<database>.<schema>.<table>")
    
    # Update the column in the DataFrame (wrap a constant in lit())
    df = df.withColumn("<column_name>", lit(<new_value>))
    
    # Write the updated data back, replacing the table's contents
    df.write.mode("overwrite").synapsesql("<database>.<schema>.<table>")

    Replace <database>, <schema>, <table>, <column_name>, and <new_value> with the actual database name, schema name, table name, column name, and new value, respectively.
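    If the update should only apply to certain rows rather than the whole column, a conditional expression with when/otherwise keeps the other rows unchanged. The sketch below runs locally on an in-memory DataFrame standing in for the one read via spark.read.synapsesql; the id/status column names and values are illustrative, not from your table:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import when, lit, col
    
    spark = SparkSession.builder.appName("ConditionalUpdate").getOrCreate()
    
    # Stand-in for a DataFrame read from the dedicated SQL pool
    df = spark.createDataFrame(
        [(1, "open"), (2, "open"), (3, "closed")],
        ["id", "status"],
    )
    
    # Update "status" only where id == 2; leave other rows as-is
    updated = df.withColumn(
        "status",
        when(col("id") == 2, lit("closed")).otherwise(col("status")),
    )
    
    rows = {r["id"]: r["status"] for r in updated.collect()}

    The resulting DataFrame would then be written back with df.write.mode("overwrite").synapsesql(...) as in the snippet above.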

    However, please note that the overwrite mode rewrites the entire table, so this approach may not be suitable for very large tables or for tables being written to concurrently.

    Reference: Read and write data in Azure Synapse Analytics using Apache Spark

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

