Change Data Feed Issue

Rohit Kulkarni 676 Reputation points
2023-03-06T12:22:18.16+00:00

Hello Team,

I have enabled change data feed when writing data into a Delta table. After creating the table and loading the initial data, on later loads I need to insert only the rows that have changed in the existing table. To do that, I used the code below.

This query returns only the changed data:

delta_df = spark.sql("SELECT * FROM table_changes('bronze.salesforce.lead_cdf', 1)")

display(delta_df)

df.write.mode("overwrite").format("delta").option("overwriteSchema", "true").saveAsTable("bronze.salesforce.lead_cdf")

Please advise

Regards

Rohit

Azure Databricks

1 answer

  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2023-03-07T22:19:00.76+00:00

    Hello @Rohit Kulkarni, thanks for the question and for using the MS Q&A platform.
    I tried the code below and it does the job.

    %python
    delta_df = spark.read.format("delta") \
              .option("readChangeFeed", "true") \
              .option("startingVersion", 9) \
              .option("endingVersion", 9) \
              .table("studentstest")
    # Get the latest version of each row by taking max(_commit_version) per record
    delta_df = delta_df.groupBy("name", "address", "student_id").max("_commit_version")
    # Keep only the data columns; the change feed metadata is not written back
    delta_df = delta_df.select("name", "address", "student_id")
    
    # Write the content back to the table
    delta_df.write.format("delta").mode("overwrite").saveAsTable("studentstest")
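
The change feed returns one row per change, with the change kind in `_change_type` (`insert`, `update_preimage`, `update_postimage`, `delete`) and the commit in `_commit_version`. As a rough illustration of what "latest version per record" means on such rows, here is a minimal plain-Python sketch. It is not Spark code; the helper name `latest_changes`, the sample rows, and the choice of `student_id` as the key are assumptions made for the example only.

```python
# Plain-Python sketch of reducing change-feed rows to the latest state per key.
# Assumption: each row is a dict holding the data columns plus the
# _change_type and _commit_version columns added by the change data feed.

PREIMAGE = "update_preimage"
DELETE = "delete"


def latest_changes(rows, key="student_id"):
    """Return the newest surviving row per key (hypothetical helper)."""
    latest = {}
    for row in rows:
        if row["_change_type"] == PREIMAGE:
            continue  # old value of an updated row, never the current state
        current = latest.get(row[key])
        if current is None or row["_commit_version"] >= current["_commit_version"]:
            latest[row[key]] = row
    # Drop keys whose most recent change was a delete.
    return [r for r in latest.values() if r["_change_type"] != DELETE]


# Made-up sample rows for illustration.
rows = [
    {"student_id": 1, "name": "Ann", "address": "A St",
     "_change_type": "insert", "_commit_version": 1},
    {"student_id": 2, "name": "Bob", "address": "C St",
     "_change_type": "insert", "_commit_version": 1},
    {"student_id": 1, "name": "Ann", "address": "A St",
     "_change_type": "update_preimage", "_commit_version": 2},
    {"student_id": 1, "name": "Ann", "address": "B St",
     "_change_type": "update_postimage", "_commit_version": 2},
    {"student_id": 2, "name": "Bob", "address": "C St",
     "_change_type": "delete", "_commit_version": 3},
]

result = latest_changes(rows)
```

In Spark the same idea would typically be expressed as a filter on `_change_type` followed by a per-key selection on `_commit_version`. Whether deletes should be dropped or propagated depends on how the target table is meant to be maintained.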
    
    

    Himanshu

    Please accept as "Yes" if the answer provided is useful , so that you can help others in the community looking for remediation for similar issues.