Hi ADF users,
I am dealing with some non-obvious behaviour in a Data Factory data flow whose source folder contains a Delta table.
As far as I know the data flow engine is based on Apache Spark, so suppose I have the following setup:
source1: Delta Lake folder SRC with 3M rows and 3 partitions (country=IT, ES, UK), 1M rows in each partition folder
filter1: a filter on the partition column country that keeps a single country
sink1: Delta Lake folder DEST
In the lineage section of the debug data flow stage I can see:
source1: 3M rows
filter1: 1M rows
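For reference, this is roughly what I believe the equivalent plain Spark pipeline would look like. It is only a minimal PySpark sketch (not actual Data Flow Script), and the paths and the country value "IT" are placeholders I made up:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Hypothetical paths; in the real data flow these are the SRC and DEST folders.
SRC_PATH = "/mnt/datalake/SRC"    # Delta table partitioned by country (IT/ES/UK)
DEST_PATH = "/mnt/datalake/DEST"

spark = SparkSession.builder.getOrCreate()

# source1: read the whole Delta table (3M rows across 3 partition folders)
source1 = spark.read.format("delta").load(SRC_PATH)

# filter1: keep a single country (1M rows); country is the partition column
filter1 = source1.filter(col("country") == "IT")

# sink1: write the filtered rows to the destination Delta folder
filter1.write.format("delta").mode("overwrite").save(DEST_PATH)
```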
I am curious about the real meaning of these numbers.
If Spark pushes the filter down to the source, shouldn't the engine read only 1M rows?
Or is the ADF data flow engine unable to combine the full DAG, so that it executes every transformation separately?
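For comparison, in plain Spark I would check whether the partition filter is actually pushed down by inspecting the physical plan. Again this is only a hypothetical PySpark sketch with placeholder paths, and the exact plan wording varies by Spark/Delta version:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Same hypothetical source as in the sketch above.
df = (spark.read.format("delta")
      .load("/mnt/datalake/SRC")
      .filter(col("country") == "IT"))

# If the filter on the partition column is pushed down, the scan node in the
# physical plan lists it under "PartitionFilters", i.e. only the country=IT
# folder should actually be read.
df.explain(True)
# Illustrative output fragment (details differ between versions):
#   ... PartitionFilters: [isnotnull(country#10), (country#10 = IT)], ...
```

What I would like to understand is whether the 3M figure shown for source1 in the lineage is the number of rows actually read from storage, or just a per-transformation row count reported before the filter is applied.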
If someone can help me understand how the engine works behind the scenes, it would be really appreciated.