Hello @arkiboys ,
Thanks for the question and for using the MS Q&A platform.
As I understand you want to read from a range of partitions, not just a single partition.
I'll admit Delta is not my strongest area, but from what I can see, reading a single file like that is unnecessary because of the table metadata. Thanks to "partition discovery," you don't have to specify individual files or partition directories; Spark resolves them for you when you access the table rather than the files directly. You can verify that the partitions are actually pruned by calling .explain() at the end.
Try:

```python
spark.sql("select * from tablename where year = '2022' and month = '11' and day > 13")
```

or

```python
spark.read.table("tablename").where("year = '2022' and month = '11' and day > 13")
```
Excerpt from the best-practices documentation:
Load a single partition: Reading partitions directly is not necessary. For example, you don’t need to run spark.read.format("parquet").load("/data/date=2017-01-01"). Instead, use a WHERE clause for data skipping, such as spark.read.table("<table_name>").where("date = '2017-01-01'").
Please do let me know if you have any queries.
Thanks
Martin
- Please don't forget to click on "Accept Answer" or the upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer.
- Want a reminder to come back and check responses? You can subscribe to notifications on this thread.
- If you are interested in joining the volunteer program and helping shape the future of Q&A, consider becoming one of the Q&A Volunteer Moderators.