How to set parquet data types in copy activity sink?

joba-2596 5 Reputation points
2023-01-30T07:59:44.62+00:00

I'm trying to convert CSV files to Parquet using a copy activity, but I can't get typed data written to the Parquet output. Whatever I do, every field in the Parquet file ends up as "BYTE_ARRAY" (string):

[Image: Parquet schema viewer]

The mapping I'm currently using looks like this:

"mapping": {
    "type": "TabularTranslator",
    "mappings": [           
        {
            "source": {
                "name": "REQUEST_ID",
                "type": "String",
                "physicalType": "String"
            },
            "sink": {
                "name": "request_id",
                "type": "String",
                "physicalType": "UTF8"
            }
        },
        [...]
        {
            "source": {
                "name": "t_job_id",
                "type": "INT32",
                "physicalType": "String"
            },
            "sink": {
                "name": "t_job_id",
                "type": "Int32",
                "physicalType": "INT_32"
            }
        },
        [...]
    ]
}
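
A small script can make inconsistencies in a mapping like this easier to spot, for example that the `t_job_id` source is declared with `type` `INT32` but `physicalType` `String`. This is only a sketch: `summarize_mappings` is a hypothetical helper, the elided `[...]` entries are left out, and the comparison is a plain string check that says nothing about how the copy activity itself interprets these fields.

```python
import json

# The mapping from the post; the elided "[...]" entries are omitted.
MAPPING_JSON = """
{
  "type": "TabularTranslator",
  "mappings": [
    {
      "source": {"name": "REQUEST_ID", "type": "String", "physicalType": "String"},
      "sink": {"name": "request_id", "type": "String", "physicalType": "UTF8"}
    },
    {
      "source": {"name": "t_job_id", "type": "INT32", "physicalType": "String"},
      "sink": {"name": "t_job_id", "type": "Int32", "physicalType": "INT_32"}
    }
  ]
}
"""


def summarize_mappings(translator):
    """Return one row per column mapping and flag source type/physicalType mismatches."""
    rows = []
    for entry in translator.get("mappings", []):
        source = entry.get("source", {})
        sink = entry.get("sink", {})
        src_type = source.get("type", "")
        src_physical = source.get("physicalType", "")
        rows.append({
            "column": source.get("name", ""),
            "source": f"{src_type}/{src_physical}",
            "sink": f"{sink.get('type', '')}/{sink.get('physicalType', '')}",
            # Hypothetical check: a case-insensitive string comparison only.
            "source_mismatch": src_type.casefold() != src_physical.casefold(),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_mappings(json.loads(MAPPING_JSON)):
        flag = "  <-- source type differs from source physicalType" if row["source_mismatch"] else ""
        print(f"{row['column']}: {row['source']} -> {row['sink']}{flag}")
```

Run against the two entries shown, this flags only `t_job_id`, which is the column whose source declaration mixes an integer `type` with a string `physicalType`.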

What am I doing wrong? The jobid and fileid fields are "additional columns" added by the copy activity, but they are definitely valid integers.


The warnings on the additional columns say "Expression of type 'Int' does not match the field: 'value'", but I'm not sure what that means. I can cast the values to strings in the expression to make the warning go away, but that doesn't solve the Parquet problem either.

Azure Data Factory