Hello,
I am loading daily delta parquet files into a new day folder each day, i.e.:
...
/year=2022/month=05/day=09
/year=2022/month=05/day=10
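Folder names like these follow the Hive-style `key=value` convention that Spark uses for partition discovery. With its default partition type inference, Spark typically reads numeric values such as `05` as integers, so the partition column `month` holds `5`. The helper `parse_partition_path` below is a hypothetical plain-Python illustration of that mapping. It is not Spark code.

```python
from typing import Dict


def parse_partition_path(path: str) -> Dict[str, int]:
    """Parse a Hive-style path such as '/year=2022/month=05/day=09'
    into {'year': 2022, 'month': 5, 'day': 9}.

    Hypothetical helper for illustration: every path segment containing
    '=' is treated as key=value, and the value is converted to int.
    """
    parts: Dict[str, int] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = int(value)  # int("05") == 5, the leading zero is dropped
    return parts


for folder in ["/year=2022/month=05/day=09", "/year=2022/month=05/day=10"]:
    print(folder, "->", parse_partition_path(folder))
# /year=2022/month=05/day=09 -> {'year': 2022, 'month': 5, 'day': 9}
# /year=2022/month=05/day=10 -> {'year': 2022, 'month': 5, 'day': 10}
```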
Today I added one more column to the load, so the new field should be present in the day=11 folder.
This is what I use to read the delta parquet files in each day folder.
It works fine for every previous day except today, and I suspect that has to do with the new field in today's load.
Do you know how to solve this?
delta_split_delivery_folder_path = "/prints/dloads/*"
df_delta_split = spark.read.parquet(f"abfss://{marketing_container_name}@{storage_account_name}.dfs.core.windows.net{delta_split_delivery_folder_path}")
yearNo = 2022
monthNo = 5  # plain int: Python 3 rejects the literal 05
# dayNo = 11 gives the error below; every earlier day works fine
dayNo = 11
df_today = df_delta_split.filter(f"year = {yearNo} and month = {monthNo} and day = {dayNo}")
display(df_today)
The display call fails with this error:
UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
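`PlainIntegerDictionary` is the Parquet dictionary for 32-bit integer (INT32) columns. This exception is commonly reported when a column is stored as `int` in some files while Spark expects another type for it, such as `bigint` or `string`. That fits the suspicion that today's load changed the schema.

One way to check is to compare the schema of the day=11 folder with that of an earlier day. The sketch below rests on several assumptions:

- `day_folder_uri` and `diff_schemas` are hypothetical helpers.
- The container and account names and the example schemas are made up.
- Each real schema dict would come from something like `{f.name: f.dataType.simpleString() for f in spark.read.parquet(uri).schema.fields}`, run once for day=10 and once for day=11.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, str]  # column name -> Spark type string, e.g. "int"


def day_folder_uri(container: str, account: str, root: str,
                   year: int, month: int, day: int) -> str:
    """Build the abfss:// URI of a single day folder.

    Hypothetical helper: assumes the day folders sit directly under `root`
    and use zero-padded month/day values (month=05, day=09).
    """
    return (f"abfss://{container}@{account}.dfs.core.windows.net"
            f"{root.rstrip('/')}/year={year}/month={month:02d}/day={day:02d}")


def diff_schemas(old: Schema, new: Schema
                 ) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Return (added, removed, changed) columns between two schemas.

    `changed` holds (column, old_type, new_type) for columns present in
    both schemas whose type string differs.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted((name, old[name], new[name])
                     for name in set(old) & set(new)
                     if old[name] != new[name])
    return added, removed, changed


# Placeholder names and made-up schemas, not the real ones from the question.
print(day_folder_uri("marketing", "mystorage", "/prints/dloads", 2022, 5, 11))
# abfss://marketing@mystorage.dfs.core.windows.net/prints/dloads/year=2022/month=05/day=11
day10 = {"customer_id": "bigint", "amount": "double"}
day11 = {"customer_id": "int", "amount": "double", "channel": "string"}
added, removed, changed = diff_schemas(day10, day11)
print("added:", added)      # added: ['channel']
print("removed:", removed)  # removed: []
print("changed:", changed)  # changed: [('customer_id', 'bigint', 'int')]
```

What to try next probably depends on what the diff shows:

- **Only a new column:** Spark's `mergeSchema` read option (`spark.read.option("mergeSchema", "true").parquet(...)`) is meant for that case.
- **A column changed type:** setting `spark.sql.parquet.enableVectorizedReader` to `false` is a workaround often suggested for this exception. Writing the column with a consistent type in every load is likely the more robust fix.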