reading delta-lake

arkiboys 9,686 Reputation points

in databricks I get the data as follows:

column_1  _year  _month  _day
xxx       2022   11      14
yyy       2022   11      14
aaa       2022   11      15
ttt       2022   11      16

I would like to get the data into partitions in delta-lake as follows


to do this, I am using the following pyspark in databricks
folder_path = "/merged/presentation/delta/"
df_present.write.mode("overwrite").format("delta").partitionBy('_year', '_month', '_day').save(f"/mnt/storage_dev/curated{folder_path}")

so then I can query the data-lake as follows:

The problem I have is that the data is displayed for the latest _day whereas I want to see all data in _day=14 and _day=15 and ...
do you see what I am doing wrong in the pyspark?
thank you

Azure Databricks

Accepted answer
  1. MartinJaffer-MSFT 26,061 Reputation points

    Hello @arkiboys,
    Thanks for the question and for using the MS Q&A platform.

    As I understand you want to read from a range of partitions, not just a single partition.

    I admit, Delta is my weakness. From what I see, reading a single partition folder like that is not necessary, because the table metadata already records the partitions. Thanks to "partition discovery," we don't have to specify individual files or folders and can filter on the partition columns instead. All of this is done for you if you access the data through the table rather than through the files directly. You can verify that only the matching partitions are read by appending .explain() to the query.


    spark.sql("select * from tablename where _year = '2022' and _month = '11' and _day > 13")

    or

    spark.read.table("tablename").where("_year = '2022' and _month = '11' and _day > 13")
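
If the data was only saved to a folder and not registered as a table, the same idea can be sketched against the table's root path. This is a minimal sketch, not a definitive implementation: the helper names day_range_predicate and read_day_range are hypothetical, the partition columns are assumed to hold numeric values, and the path in the usage note is the one built in the question.

```python
def day_range_predicate(year, month, first_day, last_day):
    """Build a WHERE predicate on the partition columns used in partitionBy.

    Assumption: _year, _month and _day hold numeric values.
    """
    return (
        f"_year = {int(year)} AND _month = {int(month)} "
        f"AND _day BETWEEN {int(first_day)} AND {int(last_day)}"
    )


def read_day_range(spark, table_root, year, month, first_day, last_day):
    """Load the Delta table from its root folder (not a _day=... subfolder)
    and filter on the partition columns so only matching partitions are read."""
    df = spark.read.format("delta").load(table_root)
    return df.where(day_range_predicate(year, month, first_day, last_day))


# The predicate can be inspected without a Spark session.
predicate = day_range_predicate(2022, 11, 14, 16)
print(predicate)
```

In a Databricks notebook, where spark is already defined, this could be used as read_day_range(spark, "/mnt/storage_dev/curated/merged/presentation/delta/", 2022, 11, 14, 16).explain() to check in the plan that the filter is applied to the partition columns.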

    Excerpt from best-practices

    Load a single partition: Reading partitions directly is not necessary. For example, you don’t need to run spark.read.format("parquet").load("/data/date=2017-01-01"). Instead, use a WHERE clause for data skipping, such as spark.read.table("<table_name>").where("date = '2017-01-01'").

    Please do let me know if you have any queries.

