parquet columns data types

arkiboys 7,901 Reputation points

I am currently using ADF to import data into Azure Storage as Delta (Parquet) files...
All the columns in the Parquet files are of type string.
Now I would like to use the cast transformation to import the data into the existing Parquet files, but with the correct data types.


Is it possible to write the correct data types into the existing parquet files which have columns of type string?
How will the existing column type change in these files?

Thank you

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. MartinJaffer-MSFT 24,101 Reputation points Microsoft Employee

    @arkiboys Hi again.

    So, if I understand correctly, you have an existing Delta table, which means partitions or multiple Parquet files.

    Currently like:

    Delta table schema: id->string, word->string, price->string
    file1.parquet schema: id->string, word->string, price->string
    file2.parquet schema: id->string, word->string, price->string
    file3.parquet schema: id->string, word->string, price->string
      existing rows with id->string, word->string, price->string
      add new rows with id->integer, word->string, price->decimal
    add new file4.parquet with schema id->integer, word->string, price->decimal

    In the example above, file3 cannot exist: a Parquet file enforces its schema, so a single column cannot mix data types like that. In this respect, Parquet files are more like SQL tables than CSV files.

    file4, even if you somehow managed to create it, would cause trouble, because anything that queries the Delta table expects a uniform schema across all files.
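To make the uniform-schema point concrete, here is a minimal sketch in plain Python. It does not read real Parquet files: the schemas are hypothetical dictionaries mirroring the example above, and `find_schema_mismatches` is an illustrative helper, not part of ADF or Delta Lake. In practice the per-file schemas would come from the Parquet file footers.

```python
# Hypothetical schemas mirroring the example above (column name -> type name).
TABLE_SCHEMA = {"id": "string", "word": "string", "price": "string"}

FILE_SCHEMAS = {
    "file1.parquet": {"id": "string", "word": "string", "price": "string"},
    "file2.parquet": {"id": "string", "word": "string", "price": "string"},
    "file4.parquet": {"id": "integer", "word": "string", "price": "decimal"},
}


def find_schema_mismatches(table_schema, file_schemas):
    """Return {file: {column: (expected, actual)}} for files that deviate."""
    mismatches = {}
    for name, schema in file_schemas.items():
        diffs = {}
        # Compare over the union of columns so missing/extra columns show up too.
        for col in sorted(set(table_schema) | set(schema)):
            expected = table_schema.get(col)
            actual = schema.get(col)
            if expected != actual:
                diffs[col] = (expected, actual)
        if diffs:
            mismatches[name] = diffs
    return mismatches


MISMATCHES = find_schema_mismatches(TABLE_SCHEMA, FILE_SCHEMAS)
for file_name, diffs in MISMATCHES.items():
    print(file_name, diffs)
```

Only file4 is reported, for the `id` and `price` columns, which is the inconsistency a reader of the table would trip over.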

    So no: you should rewrite the entire Delta table with the new schema. After that, you can add data properly going forward.
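One way such a rewrite could look, as a hedged sketch rather than the ADF cast transformation itself: it assumes a Spark session with Delta Lake configured (for example a Databricks or Synapse notebook). The table path, the `DECIMAL(18,2)` precision, and the helper names `build_cast_exprs` and `rewrite_delta_table` are all assumptions made for illustration.

```python
# Target types for the example columns. The decimal precision is an assumption.
TARGET_TYPES = {"id": "INT", "word": "STRING", "price": "DECIMAL(18,2)"}


def build_cast_exprs(target_types):
    """Build one Spark SQL CAST expression per column, keeping column names."""
    return [f"CAST(`{col}` AS {typ}) AS `{col}`" for col, typ in target_types.items()]


def rewrite_delta_table(spark, path, target_types):
    """Read the whole Delta table, cast every column, and overwrite the table.

    `spark` is an existing SparkSession with Delta Lake available;
    `path` is the (hypothetical) storage location of the table.
    """
    df = spark.read.format("delta").load(path)
    typed = df.selectExpr(*build_cast_exprs(target_types))
    (
        typed.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")  # allow the column types to change
        .save(path)
    )


print(build_cast_exprs(TARGET_TYPES))
```

Depending on Spark's ANSI setting, string values that do not parse as the target type either become NULL or raise an error, so it is worth checking the cast result before overwriting the table.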
