parquet columns data types

Question

arkiboys 9,706

hello,
I am currently using ADF to import data into Azure Storage as Delta Parquet files.
The columns in the Parquet files are all of type string.
Now I would like to use the cast transformation to import the data into the existing Parquet files, but with the correct data types.

Question:

Is it possible to write the correct data types into the existing parquet files which have columns of type string?
How will the existing column type change in these files?

Thank you

Accepted answer

Answer 1

@arkiboys Hi again.

So, if I understand correctly, you have an existing Delta table, which means partitions or multiple Parquet files.

Currently like:

Delta table schema: id->string, word->string, price->string

file1.parquet schema: id->string, word->string, price->string
file2.parquet schema: id->string, word->string, price->string

file3.parquet schema: id->string, word->string, price->string
  existing rows with id->string, word->string, price->string
  add new rows with id->integer, word->string, price->decimal

add new file4.parquet with schema id->integer, word->string, price->decimal

In the above, file3 cannot happen because a Parquet file enforces its schema: each column has exactly one data type, so you can't mix different data types in a single column like that. Parquet files are more like SQL tables than CSV files.

file4, even if you somehow make it happen, will cause trouble because anything that queries the Delta table expects a uniform schema across all files.
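
To make the file4 problem concrete, here is a small illustrative sketch in plain Python. It only models schemas as dictionaries of column name to type name; it is not how Delta Lake validates files internally, and the helper name `find_schema_mismatches` is made up for this example.

```python
# Illustrative only: schemas are plain dicts of column name -> type name.
# This does not read real Parquet files and is not Delta Lake's own check.

def find_schema_mismatches(table_schema, file_schemas):
    """Return {file_name: [(column, expected_type, actual_type), ...]}."""
    mismatches = {}
    for file_name, schema in file_schemas.items():
        diffs = [
            (column, expected, schema.get(column))
            for column, expected in table_schema.items()
            if schema.get(column) != expected
        ]
        if diffs:
            mismatches[file_name] = diffs
    return mismatches


# The table schema and files from the example above.
table_schema = {"id": "string", "word": "string", "price": "string"}
file_schemas = {
    "file1.parquet": {"id": "string", "word": "string", "price": "string"},
    "file2.parquet": {"id": "string", "word": "string", "price": "string"},
    "file4.parquet": {"id": "integer", "word": "string", "price": "decimal"},
}

mismatches = find_schema_mismatches(table_schema, file_schemas)
print(mismatches)
```

Only file4.parquet is reported, for the id and price columns, which is the kind of disagreement a reader of the table would trip over.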

So, no. You should rewrite the entire Delta table with the new schema. Then you can add data properly going forward.
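
If a Spark environment with Delta Lake is available, the rewrite could look roughly like the sketch below. Everything specific here is an assumption rather than something from your setup: the target types (in particular `decimal(18,2)`), the helper names, and that `spark` is an existing SparkSession and `path` points at the table. Treat it as a starting point, and try it on a copy of the table first.

```python
# Assumed target types; decimal(18,2) is a guess at a suitable precision.
TARGET_TYPES = {"id": "int", "word": "string", "price": "decimal(18,2)"}


def cast_expressions(target_types):
    """Build Spark SQL expressions that cast each column to its target type."""
    return [
        f"CAST({name} AS {dtype}) AS {name}"
        for name, dtype in target_types.items()
    ]


def rewrite_delta_table(spark, path, target_types=TARGET_TYPES):
    """Read the whole table, cast the columns, then overwrite data and schema.

    `spark` is an existing SparkSession with Delta Lake configured (assumed).
    """
    df = spark.read.format("delta").load(path)
    typed = df.selectExpr(*cast_expressions(target_types))
    (
        typed.write.format("delta")
        .mode("overwrite")
        # Without this option Delta rejects the write because the types changed.
        .option("overwriteSchema", "true")
        .save(path)
    )
```

Check the string values before running this: depending on Spark's ANSI setting, a value that cannot be cast either becomes null or makes the job fail.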
