Change_Data_field Issue

Question

Hello Team,

I have enabled the change data feed on a Delta table. I created the table and loaded the data into it, and on subsequent runs I need to insert only the rows that have changed in the existing table. To do that, I used the code below:

This reads only the changed data:

delta_df = spark.sql("SELECT * FROM table_changes('bronze.salesforce.lead_cdf', 1)")

display(delta_df)

df.write.mode("overwrite").format("delta").option("overwriteSchema", "true").saveAsTable("bronze.salesforce.lead_cdf")

Please advise

Regards

Rohit

Answer

Hello @Rohit Kulkarni, thanks for the question and for using the MS Q&A platform.
I tried the code below and it does the job.

%python
delta_df = spark.read.format("delta") \
          .option("readChangeFeed", "true") \
          .option("startingVersion", 9) \
          .option("endingVersion", 9) \
          .table("studentstest")
# Get the latest version of each row, hence max(_commit_version)
delta_df = delta_df.groupBy("name", "address", "student_id", "_commit_version").max("_commit_version")
delta_df = delta_df.select("name", "address", "student_id")

# Write the content back to the table
delta_df.write.option("header", "true").mode("overwrite").saveAsTable("studentstest")
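
A note on the "latest version" step above: the change feed can return several rows for the same student_id (for example an update_preimage and an update_postimage row in one commit), so it may help to see the intended selection logic on its own. The sketch below is plain Python rather than Spark, only to illustrate the idea. It assumes student_id is the key, and the sample rows and the helper name latest_change_per_key are made up for illustration; _change_type and _commit_version are the metadata columns the change feed adds.

```python
# Change types that describe a row's state *after* a commit.
# "update_preimage" holds the old values, so it is skipped.
POST_CHANGE_TYPES = {"insert", "update_postimage", "delete"}


def latest_change_per_key(rows, key):
    """Return {key value: most recent change-feed row} for the given rows."""
    latest = {}
    for row in rows:
        if row["_change_type"] not in POST_CHANGE_TYPES:
            continue
        current = latest.get(row[key])
        # On a tie in _commit_version, the row seen later in the list wins.
        if current is None or row["_commit_version"] >= current["_commit_version"]:
            latest[row[key]] = row
    return latest


# Hypothetical change-feed rows, shaped like the studentstest example.
changes = [
    {"student_id": 1, "name": "Ann", "address": "Oslo",
     "_change_type": "insert", "_commit_version": 7},
    {"student_id": 2, "name": "Bob", "address": "Rome",
     "_change_type": "insert", "_commit_version": 8},
    {"student_id": 1, "name": "Ann", "address": "Oslo",
     "_change_type": "update_preimage", "_commit_version": 9},
    {"student_id": 1, "name": "Ann", "address": "Bergen",
     "_change_type": "update_postimage", "_commit_version": 9},
]

latest = latest_change_per_key(changes, "student_id")
for student_id, row in sorted(latest.items()):
    print(student_id, row["address"], row["_change_type"], row["_commit_version"])
```

In Spark the same idea would normally be expressed with a filter on _change_type plus a per-key ranking on _commit_version. Rows whose latest change is a delete would then need to be removed from the target rather than written to it.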

Himanshu

Please accept as "Yes" if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.
