arkiboys asked ShaikMaheer-MSFT edited

Error reading delta parquet files after adding a new field

Hello,
I load daily delta parquet files into a separate folder for each day, i.e.
...
/year=2022/month=05/day=09
/year=2022/month=05/day=10

Today I added one more column to the load, so the files in day=11 should contain the new field.

This is what I use to read the delta parquet files in each day folder.
It works fine for every previous day but fails for today, and I suspect this has to do with the new field in today's load.
Do you know how to solve this?

delta_split_delivery_folder_path = "/prints/dloads/*"

df_delta_split = spark.read.parquet(f"abfss://{marketing_container_name}@{storage_account_name}.dfs.core.windows.net{delta_split_delivery_folder_path}")

yearNo=2022
# 05 is not a valid integer literal in Python 3, so use 5
monthNo=5

day=11 gives the error shown below, but every previous day works fine.

dayNo=11
df_today = df_delta_split.filter(f"year = {yearNo} and month = {monthNo} and day = {dayNo}")

display(df_today)

The display call gives this error:

UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
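
One way to test the suspicion about the new field is to read only the day=11 folder instead of the whole `/prints/dloads/*` tree, so that today's files are not read against a schema taken from earlier days. The following is a minimal sketch, assuming the folder layout shown above; `day_folder_path` is a hypothetical helper, not part of the original load, and whether this avoids the error depends on how the files were actually written.

```python
def day_folder_path(base_path, year, month, day):
    """Return the partition folder for one day, zero-padding month and day."""
    return f"{base_path}/year={year}/month={month:02d}/day={day:02d}"


# Base folder taken from the question, without the trailing wildcard.
path = day_folder_path("/prints/dloads", 2022, 5, 11)
print(path)  # /prints/dloads/year=2022/month=05/day=11

# With a Spark session, and `root` standing for the abfss://...dfs.core.windows.net
# prefix built in the question (assumed name), the single folder could be read as:
#   df_today = (spark.read
#               .option("basePath", root + "/prints/dloads")
#               .parquet(root + path))
```

Setting Spark's `basePath` option keeps the `year`, `month` and `day` partition columns in the result, which would otherwise be dropped when a leaf folder is read directly.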

azure-databricks

1 Answer

ShaikMaheer-MSFT answered ShaikMaheer-MSFT edited

Hi @arkiboys,

Thanks for posting your query on the Microsoft Q&A platform.

Could you please try using StringType when writing to parquet and see if that helps?

A similar error was discussed in the post below, where the user confirms that writing the data as StringType resolved the issue. Please let us know how it goes. Thank you.
https://stackoverflow.com/questions/41133327/spark-error-reading-datetype-columns-in-partitioned-parquet-data
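
As a rough illustration of the suggestion above, the columns could be cast to strings just before the write. This is only a sketch under assumptions: the column names, `df_to_write` and `target_path` are hypothetical and do not come from the original load, and `string_cast_exprs` is a helper made up for this example. It builds SQL expressions that can be passed to the DataFrame `selectExpr` method.

```python
def string_cast_exprs(columns):
    """Build one CAST(... AS STRING) expression per column name."""
    return [f"CAST(`{name}` AS STRING) AS `{name}`" for name in columns]


# Hypothetical column names, for illustration only.
exprs = string_cast_exprs(["customer_id", "amount"])
for expr in exprs:
    print(expr)

# With a Spark DataFrame (df_to_write and target_path are assumed names),
# the expressions could be applied before writing:
#   df_to_write.selectExpr(*exprs).write.mode("append").parquet(target_path)
```

Casting every column is the bluntest form of this workaround; casting only the column whose type differs between days would keep the other types intact.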


What do you mean, please?


Apologies for not posting the reference link. I have edited the response above; kindly check it and see if that helps.
