Hello @Raj0125,
Thanks for the question and for using the Microsoft Q&A platform.
As I understand it, the ask here is to find the records that are not in the latest file. Please let me know if that's not correct.
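We can use the DataFrame subtract() method to achieve this. The idea is to load the records from the two days into two different DataFrames and then compare them. df2.subtract(df1) compares whole rows (all columns) and returns the distinct rows of df2 that have no exact match in df1, like EXCEPT DISTINCT in SQL. To get the records that were dropped from the latest file instead, swap the operands: df1.subtract(df2).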
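To make the comparison semantics concrete without a Spark session, here is a plain-Python analogue. The helper subtract_rows is hypothetical and not part of PySpark; it only mimics what subtract() does: whole-row matching, duplicates collapsed, and a result that depends on which side comes first.

```python
def subtract_rows(left, right):
    """Plain-Python analogue of DataFrame.subtract (EXCEPT DISTINCT).

    Returns the distinct rows of `left` that do not appear in `right`,
    comparing entire rows. Unlike Spark, it keeps first-seen order.
    """
    right_set = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_set and row not in seen:
            seen.add(row)
            result.append(row)
    return result


previous_day = [("James", "", "Smith", "36636", "M", 3000),
                ("Robert", "", "Williams", "42114", "M", 4000)]
latest_day = [("James", "", "Smith", "36636", "M", 3000),
              ("Himanshu", "XXXX", "YYYYY", "", "M", -1),
              ("Himanshu", "XXXX", "YYYYY", "", "M", -1)]  # duplicate row

# Rows new in the latest file (the duplicate is collapsed to one row)
print(subtract_rows(latest_day, previous_day))
# → [('Himanshu', 'XXXX', 'YYYYY', '', 'M', -1)]

# Rows dropped from the latest file (operands swapped)
print(subtract_rows(previous_day, latest_day))
# → [('Robert', '', 'Williams', '42114', 'M', 4000)]
```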
In my example I have used some dummy data (in your case, you will have to load the data from the parquet files into the DataFrames df1 and df2).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# In Databricks/Synapse notebooks `spark` already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Previous day's snapshot
data1 = [("James", "", "Smith", "36636", "M", 3000),
         ("Michael", "Rose", "", "40288", "M", 4000),
         ("Robert", "", "Williams", "42114", "M", 4000),
         ("Maria", "Anne", "Jones", "39192", "F", 4000),
         ("Jen", "Mary", "Brown", "", "F", -1)]

# Latest snapshot: the same rows plus one new record
data2 = [("James", "", "Smith", "36636", "M", 3000),
         ("Michael", "Rose", "", "40288", "M", 4000),
         ("Robert", "", "Williams", "42114", "M", 4000),
         ("Maria", "Anne", "Jones", "39192", "F", 4000),
         ("Jen", "Mary", "Brown", "", "F", -1),
         ("Himanshu", "XXXX", "YYYYY", "", "M", -1)]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df1 = spark.createDataFrame(data=data1, schema=schema)
df2 = spark.createDataFrame(data=data2, schema=schema)

# Rows in df2 (latest) that are not in df1 (previous); df1.subtract(df2) gives the reverse
finaldf = df2.subtract(df1)
display(finaldf)  # notebook helper; use finaldf.show() outside Databricks/Synapse
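Output: finaldf contains only the record that is in data2 but not in data1 (firstname Himanshu, middlename XXXX, lastname YYYYY, empty id, gender M, salary -1).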
Please let me know how it goes.
Thanks
Himanshu