Databricks: How to update the value of a MAP-type column in a Delta table using a Python dictionary and the SQL UPDATE command?

Mohammad Saber 591 Reputation points
2023-03-26T05:00:04.1533333+00:00

I have a delta table in Databricks created by:

%sql
 
CREATE TABLE IF NOT EXISTS dev.bronze.test_map (
  id INT,
  table_updates MAP<STRING, TIMESTAMP>,
  CONSTRAINT test_map_pk PRIMARY KEY(id)
) USING DELTA
LOCATION "abfss://bronze@Table Path"

With initial values:

INSERT INTO dev.bronze.test_map 
VALUES (1, null), 
       (2, null), 
       (3, null);

Note that there is no value in column "table_updates".


After processing other tables in our platform, I have table-update info as a Python dictionary like the one below:

table_updates_id1 = {
    'id1_table_1': datetime.datetime(2023, 3, 26, 4, 33, 22, 323000),
    'id1_table_2': datetime.datetime(2023, 3, 26, 4, 33, 22, 323000)}


Now, I want to update the value of column "table_updates" where id = 1 using the SQL UPDATE command (note that I want to update the table itself, not a dataframe).

I tried two approaches, and both failed.

1st trial:

spark.sql(f"""
UPDATE dev.bronze.test_map
SET
    table_updates = map({table_updates_id1})
WHERE
    id = 1
""")

Error: a SQL syntax error (screenshot omitted).

2nd trial:

spark.sql(f"""
UPDATE dev.bronze.test_map
SET
    table_updates = map('{','.join([f'{k},{v}' for k,v in table_updates_id1.items()])}')
WHERE
    id = 1
""")

Error: again a syntax error (screenshot omitted).

Any idea how to solve this issue? Thanks.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. AnnuKumari-MSFT 32,816 Reputation points Microsoft Employee
    2023-03-28T11:11:49.91+00:00

    @Mohammad Saber ,

    Thank you for using the Microsoft Q&A platform and posting your question here. As I understand it, you are trying to update a column of a Delta table, but the statement fails with a syntax error. Please let me know if that is not the case.

    It looks like you have defined the variable as a Python dictionary and are then interpolating it into a SQL UPDATE command. SQL cannot parse the dictionary's Python repr ({'key': datetime.datetime(...)}), which is why it throws a syntax error.
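    To see the mismatch concretely, here is a minimal sketch that renders such a dictionary into a map(...) expression SQL can actually parse (the helper name to_sql_map_literal is illustrative, not part of the original post):

    ```python
    import datetime

    def to_sql_map_literal(d):
        """Render a {str: datetime} dict as a Spark SQL map(...) expression string."""
        entries = ", ".join(f"'{k}', TIMESTAMP '{v}'" for k, v in d.items())
        return f"map({entries})"

    table_updates_id1 = {
        'id1_table_1': datetime.datetime(2023, 3, 26, 4, 33, 22, 323000)}

    # The dict's Python repr is not valid SQL, but this rendering is:
    print(to_sql_map_literal(table_updates_id1))
    # map('id1_table_1', TIMESTAMP '2023-03-26 04:33:22.323000')
    ```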

    My recommendation would be to use Python for updating the Delta table as well, via the DeltaTable API. Note that the values in the set mapping must be SQL expression strings, so the dictionary has to be rendered into a SQL map expression first. For instance:

    import datetime
    from delta.tables import DeltaTable

    deltaTable = DeltaTable.forPath(spark, 'abfss://bronze@Table Path')

    table_updates_id1 = {
        'id1_table_1': datetime.datetime(2023, 3, 26, 4, 33, 22, 323000),
        'id1_table_2': datetime.datetime(2023, 3, 26, 4, 33, 22, 323000)}

    # Render the dictionary as a SQL map expression; plain Python objects
    # are not accepted as values of the `set` dict.
    keys = ", ".join(f"'{k}'" for k in table_updates_id1)
    values = ", ".join(f"TIMESTAMP '{v}'" for v in table_updates_id1.values())

    deltaTable.update(
        condition = "id = 1",
        set = {"table_updates": f"map_from_arrays(array({keys}), array({values}))"}
    )


    For more details, kindly check out this documentation: Table deletes, updates, and merges


    Hope it helps. Kindly accept the answer by clicking on the Accept answer button.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Sedat SALMAN 13,740 Reputation points
    2023-03-26T05:28:35.6066667+00:00

    Use the map_from_arrays function to update the table_updates column.

    You can use the following Python, or similar:

    
    keys = ', '.join([f"'{k}'" for k in table_updates_id1.keys()])
    values = ', '.join([f"CAST('{v}' AS TIMESTAMP)" for v in table_updates_id1.values()])
    
    update_query = f"""
    UPDATE dev.bronze.test_map
    SET
        table_updates = map_from_arrays(array({keys}), array({values}))
    WHERE
        id = 1
    """
    
    spark.sql(update_query)
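    To see what the two f-strings above expand to, the string-building part can be run on its own (a sketch using one entry of the question's dictionary; no Spark session needed). Note that in Spark SQL the array constructor is the array(...) function:

    ```python
    import datetime

    table_updates_id1 = {
        'id1_table_1': datetime.datetime(2023, 3, 26, 4, 33, 22, 323000)}

    keys = ', '.join([f"'{k}'" for k in table_updates_id1.keys()])
    values = ', '.join([f"CAST('{v}' AS TIMESTAMP)" for v in table_updates_id1.values()])

    print(keys)    # 'id1_table_1'
    print(values)  # CAST('2023-03-26 04:33:22.323000' AS TIMESTAMP)
    ```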
    
    
