Azure Data Factory - Data Flow - Dynamic Csv to Parquet with data types - Culture is missing

Alessandro Ribul Moro 92 Reputation points
2022-07-12T17:55:51.773+00:00

Hello Microsoft Dev Team,
I want to request some feature :)
I need to have a Data Flow what is converting CSV to Parquet with Data Type conversion.
I see there is the option to define default format.
Here I miss the opportunity to pass a variable to the Data Flow with the culture or the format of the Numerical fraction.
I need to convert '9.179,92' to decimal number. 9179.92 is supported. In copy task there is an option to pass a culture but not in Data Flow.
Or just like for Date make a drop down for numeric values please
219978-image.png

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Alessandro Ribul Moro 92 Reputation points
    2022-08-01T14:24:37.26+00:00

    Input:
    226816-image.png
    Dataflow:
    226852-image.png

    Value expression:
    iif(isDouble($$, ''), iif(length(replace($$, '.', ''))==length(toString(toDouble($$, '', 'de'))), toString(toDouble($$, '', 'de')), toString(replace($$, ',', ''))), toString($$))

    Output:
    226797-image.png

    After this formatting was changed to US Format I use the CSV to Parquet Transformation with default Format detection.

    But anyway. I was asking about giving a culture to the "Data Flow Default Format" as it is using US-Format as default. Or the option to define the format somehow like the options for dates that are given.

    1 person found this answer helpful.

  2. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2022-07-14T06:13:54.323+00:00

    Hi @Anonymous ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your query.

    As I understand your ask here, you are trying to perform data migration from .csv file to .parquet format using DataFlow , you want to convert column value '9.179,92' to '9179.92'. Please let me know if my understanding is incorrect.

    In your query, are you mentioning about projection tab of dataflow, where there is an option to define default format. I can see we can select 'decimal' as the numerical fraction there. Kindly confirm if this is what you are looking for.

    For most dataset types, such as SQL and Parquet, the projection in a source is fixed to reflect the schema defined in a dataset. When your source files aren't strongly typed (for example, flat .csv files rather than Parquet files), you can define the data types for each field in the source transformation.

    For more information, kindly check Projection in DataFlow

    220538-image.png

    However, I don't think that's enough to achieve your requirement. You can try following approach to perform the transformation.

    1. Use dataset pointing to the Numeric data in the dataflow source transformation.
      220661-image.png
    2. Use Derived column transformation to remove the comma from the source data.
      220671-image.png
    3. Use derived column transformation to multiply the value with 1000 to convert it to desired format.
      220642-image.png
    4. Use parquet format dataset in sink transformation

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.