Hello @Huzaifa Tapal,
Thanks for the question and for using the Microsoft Q&A platform.
As we understand it, you are asking whether loading all the data into a DataFrame is better than applying a filter while reading the data. Please let us know if that is not accurate.
A DataFrame is a two-dimensional data structure, and in Spark the data it holds lives in memory once it is materialized. So if you do not need all the data, it is better to apply the filter before the data lands in the DataFrame rather than load everything first. We have seen out-of-memory exceptions with larger datasets; with a small dataset you may not notice much difference.
So I would go ahead with adding the `where` filter while reading.
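As a minimal sketch of this approach (the file path and column names here are hypothetical, purely for illustration): applying `where` before any action lets Spark push the predicate down to the file reader, so only matching rows are scanned and loaded.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter-at-read").getOrCreate()

# Hypothetical Parquet path and column name, for illustration only.
# spark.read is lazy: nothing is loaded until an action runs, so the
# filter below becomes part of the read plan and can be pushed down
# to the Parquet scan instead of filtering after a full load.
df = (
    spark.read.parquet("/data/events")
        .where(F.col("event_date") >= "2023-01-01")
)

df.show()
```

Running `df.explain()` should show the predicate under `PushedFilters` in the scan node, confirming the filter is applied at read time rather than after loading everything.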
Please do let me know if you have any queries.
Thanks
Himanshu
- Please don't forget to click the upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
- Want a reminder to come back and check responses? Here is how to subscribe to notifications
- If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of the Q&A Volunteer Moderators