Change Data Feed Issue

Rohit Kulkarni 731 Reputation points
2023-03-06T12:22:18.16+00:00

Hello Team,

I have enabled Change Data Feed while writing data into a Delta table. Once I create the table and load the data into it, the next time I need to insert only the data that changed in the existing table. So I have used the code below:

To read only the changed data:

delta_df = spark.sql("SELECT * FROM table_changes('bronze.salesforce.lead_cdf', 1)")

display(delta_df)

df.write.mode("overwrite").format("delta").option("overwriteSchema", "true").saveAsTable("bronze.salesforce.lead_cdf")
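Hard-coding the starting version (`1` in the `table_changes` query above) means every run re-reads all changes since that commit. A common incremental pattern is to remember the last commit version already processed and read only the versions after it. The plain-Python sketch below models that bookkeeping with an in-memory list of change-feed rows; the function name `changes_since` and the sample rows are illustrative assumptions, not part of the Delta API. In Spark, the returned high-water mark would be the value to persist and feed into the next run's starting version.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def changes_since(feed: List[Row], last_version: int) -> Tuple[List[Row], int]:
    """Return change-feed rows committed after last_version, plus the new high-water mark.

    Each row is assumed to carry a "_commit_version" key, mirroring the
    metadata column that Delta's change data feed adds to every row.
    """
    new_rows = [r for r in feed if r["_commit_version"] > last_version]
    # If nothing new was committed, keep the previous checkpoint.
    next_version = max((r["_commit_version"] for r in new_rows), default=last_version)
    return new_rows, next_version


# Hypothetical change-feed rows for illustration only.
feed = [
    {"id": 1, "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "_change_type": "insert", "_commit_version": 2},
    {"id": 1, "_change_type": "update_postimage", "_commit_version": 3},
]

rows, checkpoint = changes_since(feed, last_version=1)
print(len(rows), checkpoint)  # 2 3
rows, checkpoint = changes_since(feed, last_version=checkpoint)
print(len(rows), checkpoint)  # 0 3
```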

Please advise

Regards

Rohit

Azure Databricks

1 answer

  1. HimanshuSinha-msft 19,486 Reputation points Microsoft Employee Moderator
    2023-03-07T22:19:00.76+00:00

    Hello @Rohit Kulkarni, thanks for the question and for using the MS Q&A platform.
    I tried the code below and it does the job.

    %python
    # Read the change data feed for commit version 9 only
    delta_df = spark.read.format("delta") \
              .option("readChangeFeed", "true") \
              .option("startingVersion", 9) \
              .option("endingVersion", 9) \
              .table("studentstest")
    # Keep only the new row images; skip update_preimage and delete rows
    delta_df = delta_df.filter(delta_df["_change_type"].isin("insert", "update_postimage"))
    # De-duplicate rows; max(_commit_version) records the latest commit each row appears in
    delta_df = delta_df.groupBy("name", "address", "student_id").max("_commit_version")
    delta_df = delta_df.select("name", "address", "student_id")

    # Write the content back to the table (overwrite replaces all existing rows)
    delta_df.write.mode("overwrite").saveAsTable("studentstest")
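Writing with `mode("overwrite")` replaces the whole table with only the rows read from the feed, and the feed itself also contains `update_preimage` and `delete` rows. The plain-Python sketch below models the upsert semantics that a Delta `MERGE INTO` (or `DeltaTable.merge` in Python) keyed on `student_id` would give instead: rows are replayed in `_commit_version` order, `insert` and `update_postimage` rows replace the stored row for their key, `delete` rows remove it, and `update_preimage` rows are skipped. The function name `apply_changes` and the sample rows are hypothetical, chosen only to illustrate the logic.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Metadata columns that the change data feed adds to every row.
CDF_COLUMNS = ("_change_type", "_commit_version", "_commit_timestamp")


def apply_changes(target: Dict[Any, Row], feed: List[Row], key: str) -> Dict[Any, Row]:
    """Apply change-feed rows to a keyed copy of the target table (upsert + delete)."""
    result = dict(target)
    # Replay commits oldest first so the latest change for a key wins.
    for row in sorted(feed, key=lambda r: r["_commit_version"]):
        change = row["_change_type"]
        data = {k: v for k, v in row.items() if k not in CDF_COLUMNS}
        if change in ("insert", "update_postimage"):
            result[row[key]] = data
        elif change == "delete":
            result.pop(row[key], None)
        # "update_preimage" rows hold the old values and are skipped.
    return result


# Hypothetical table contents and change-feed rows for illustration only.
target = {
    1: {"student_id": 1, "name": "Asha", "address": "Pune"},
    2: {"student_id": 2, "name": "Ben", "address": "Oslo"},
}
feed = [
    {"student_id": 1, "name": "Asha", "address": "Pune",
     "_change_type": "update_preimage", "_commit_version": 9},
    {"student_id": 1, "name": "Asha", "address": "Mumbai",
     "_change_type": "update_postimage", "_commit_version": 9},
    {"student_id": 3, "name": "Chen", "address": "Lima",
     "_change_type": "insert", "_commit_version": 9},
]

merged = apply_changes(target, feed, key="student_id")
print(sorted(merged))        # [1, 2, 3]
print(merged[1]["address"])  # Mumbai
```

Unlike the overwrite, student 2 (unchanged in version 9) is kept.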
    
    

    Himanshu

    Please accept as "Yes" if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.

