Error in generating Avro or Parquet file from CSV using copy activity with dynamic mapping

Vamshi Krishna Komuravelli 0 Reputation points
2023-10-02T10:06:33.56+00:00

Hi,

The requirement is to copy data from multiple sources (SQL/API/CSV/TXT/JSON, etc.) to ADLS in Avro/Parquet format, but some of the source systems, including CSV/TXT files, have column names with spaces. When using the copy activity to convert these files to Parquet/Avro with manual mapping, it works fine and the sink file is generated. When we use dynamic mapping, it fails with an error saying that column names in the sink cannot contain special characters.

Along with this, there is another issue with multi-level JSON: static mapping generates a flattened structure for the multiple levels, but when the same mapping is passed through dynamic mapping, it produces one level only.

Is there any solution in recent releases?

Error: ErrorCode=ParquetInvalidColumnName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The column name is invalid. Column name cannot contain these character:[,;{}()\n\t=],Source=Microsoft.DataTransfer.Common,'

Source CSV file:

container id,conataimer name,location
1,test,hyd
2,tes1,blr
3,test2,ncr

Dynamic Mapping Content:

{
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": {
                "name": "container id",
                "type": "String",
                "physicalType": "String"
            },
            "sink": {
                "name": "container_id",
                "physicalType": "String"
            }
        },
        {
            "source": {
                "name": "conataimer name",
                "type": "String",
                "physicalType": "String"
            },
            "sink": {
                "name": "conataimer_name",
                "physicalType": "String"
            }
        },
        {
            "source": {
                "name": "location",
                "type": "String",
                "physicalType": "String"
            },
            "sink": {
                "name": "location",
                "type": "String",
                "physicalType": "String"
            }
        }
    ],
    "typeConversion": true,
    "typeConversionSettings": {
        "allowDataTruncation": true,
        "treatBooleanAsNumber": false
    }
}
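
For context, "dynamic mapping" here means the translator JSON above is supplied to the copy activity as dynamic content at runtime instead of being authored in the mapping UI. A minimal sketch of that wiring, assuming a hypothetical pipeline parameter named columnMapping (type object) that carries the JSON above:

{
    "name": "CopyToParquet",
    "type": "Copy",
    "description": "Illustrative sketch; columnMapping is a hypothetical pipeline parameter",
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "ParquetSink" },
        "translator": {
            "value": "@pipeline().parameters.columnMapping",
            "type": "Expression"
        }
    }
}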

Regards,

Vamshi K

1 answer

  1. KranthiPakala-MSFT 46,627 Reputation points Microsoft Employee
    2023-10-02T23:44:31.46+00:00

    @Anonymous Thanks for using the Microsoft Q&A forum and posting your query.

    I see 2 questions being asked in this thread:

    1. Problem 1: Column name issue with auto/default mapping from CSV to Parquet/Avro format.
    2. Problem 2: JSON multi-level source not being flattened with dynamic mapping.

    To keep things clean and clear for other readers of the community, let's focus on problem 1 in this thread; I would request you to open a new thread for problem 2, as the two have different contexts. This keeps the issues distinct from each other and avoids confusion for the community :).

    Regarding Problem 1: yes, this is expected behavior when column names contain spaces or special characters. When you do explicit mapping it succeeds, because you are explicitly stating how each source column maps to its sink column. But when you use auto/default mapping, ADF throws an error because it cannot correlate the source and sink column names: with default mapping, the copy activity maps source data to the sink by column name in a case-sensitive manner, so the spaces/special characters in your source column names flow through to the sink and trigger the error.
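
    In other words, with default mapping no translator is specified at all and the column names pass through unchanged; a minimal sketch of what that looks like in the activity JSON (the activity name and source/sink types here are illustrative):

    {
        "name": "CopyWithDefaultMapping",
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "sink": { "type": "ParquetSink" }
        }
    }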

    To overcome this issue, you will have to go with parameterized mapping (where you predefine the mapping JSON and pass it to the copy activity dynamically at runtime) or dynamic mapping, as explained in the blog posts below by community volunteers; a minimal sketch follows the list.

    Please refer to the following for details on implementing the workarounds:

    1. Parameterized Column mapping in Copy Activity in Azure Data Factory
    2. Blog post: How to: Copy delimited files having column names with spaces in parquet format using ADF - this post describes exactly the problem you are facing and a possible way to overcome it.
    3. Another blog post: Export Parquet Files with Column Names with Spaces
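
    As a rough illustration of the parameterized approach: declare a pipeline parameter that holds the sanitized mapping, then reference it from the copy activity's translator as dynamic content. A minimal sketch, assuming a hypothetical parameter named columnMapping and showing only the first column from the mapping in the question:

    {
        "parameters": {
            "columnMapping": {
                "type": "object",
                "defaultValue": {
                    "type": "TabularTranslator",
                    "mappings": [
                        {
                            "source": { "name": "container id", "type": "String" },
                            "sink": { "name": "container_id", "physicalType": "String" }
                        }
                    ]
                }
            }
        }
    }

    The copy activity then consumes the parameter on its translator property with an expression such as @pipeline().parameters.columnMapping, so the Parquet sink only ever sees the underscore names.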

    In addition, a note has been called out in the public documentation (Parquet format) that "White space in column name is not supported for Parquet files."

    Hope this info helps.


    Please don’t forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

