Data type conversion in Azure Data Factory - CSV files

M, Murugeswari (Cognizant) 456 Reputation points
2022-11-28T04:18:44.18+00:00

Hi

We have a requirement to migrate nearly 200 tables to Azure Synapse. The tables are available as CSV files in Azure Blob Storage.

We planned to load the tables in batches (10 tables at a time, using a parameterized pipeline and datasets driven by a metadata table) with auto-create table enabled. The known problem is that all columns in a CSV file end up as string type. Since we are loading in batches, we can't manually change the data types using data flows either. Is there any other option available, or do we have to create the tables manually in Synapse and then load them?

Azure Data Factory

2 answers

  1. AnnuKumari-MSFT 32,011 Reputation points Microsoft Employee
    2022-11-29T05:46:39.403+00:00

    Hi @Anonymous ,

    Thank you for using the Microsoft Q&A platform and for posting your question.

    As I understand your query, you are trying to copy data from ADLS to Azure Synapse tables using the Copy activity, but the issue is that the data type of every CSV column is string.

    I assume you are treating ADLS as intermediate storage while loading the data from some source such as Azure SQL, Salesforce, etc. If so, could you share what the original source data store is?

    Unlike CSV files, Parquet files preserve the schema of the dataset; that is, they remember which columns are string, numeric, etc. So when you reload your data from Parquet into Azure Synapse SQL, you can be assured it will look the same as it did when you saved it.
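
    For illustration, here is a minimal sketch (outside ADF, using pyarrow) of converting a CSV extract to Parquet with an explicit schema. The file path, column names, and types are assumptions for the example, not taken from your pipeline:

    ```python
    # Sketch: convert a CSV extract to Parquet so the column types travel with the file.
    # All names and types below are illustrative assumptions.
    import pyarrow as pa
    import pyarrow.csv as pacsv
    import pyarrow.parquet as pq

    # Declare the types instead of letting every column stay a string.
    schema = pa.schema([
        ("customer_id", pa.int64()),
        ("customer_name", pa.string()),
        ("balance", pa.float64()),
        ("created_date", pa.timestamp("s")),  # assumes ISO-8601 date strings in the CSV
    ])

    table = pacsv.read_csv(
        "customers.csv",
        convert_options=pacsv.ConvertOptions(column_types=schema),
    )

    # Write to Parquet; the schema is stored inside the file.
    pq.write_table(table, "customers.parquet")

    # Reading it back shows the preserved types (int64, double, timestamp, ...).
    print(pq.read_table("customers.parquet").schema)
    ```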

    You can then use the Copy data activity to load from Parquet into Synapse with the 'Auto create table' option.

    Note: Using 'Auto create table' while loading from Parquet to Synapse will preserve the data type for all columns; however, for varchar and nvarchar columns the default size will be max. If you want to preserve the size as well, you can create external tables over the Parquet files in Synapse and then load the data from the external tables into physical tables, as sketched below.
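
    A hedged sketch of that "external table, then load into a physical table" approach, run against a Synapse dedicated SQL pool with pyodbc. All object names, column definitions, the storage location, and the connection string are placeholders, and access to the storage account (e.g. via a database scoped credential or managed identity) is assumed to be set up already:

    ```python
    # Sketch: declare an external table over the Parquet files, then CTAS into a
    # physical table so column types and varchar sizes are exactly what you declare.
    import pyodbc

    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=yourworkspace.sql.azuresynapse.net;Database=yourpool;"
        "UID=youruser;PWD=yourpassword"  # placeholder credentials
    )
    conn.autocommit = True
    cur = conn.cursor()

    # External objects pointing at the Parquet files in storage.
    cur.execute("CREATE EXTERNAL FILE FORMAT ParquetFileFormat WITH (FORMAT_TYPE = PARQUET);")
    cur.execute(
        "CREATE EXTERNAL DATA SOURCE BlobStage WITH (TYPE = HADOOP, "
        "LOCATION = 'abfss://container@account.dfs.core.windows.net');"
    )

    # The external table declares the exact types and sizes you want to keep.
    cur.execute("""
    CREATE EXTERNAL TABLE stg.Customer_ext (
        customer_id   BIGINT,
        customer_name NVARCHAR(100),
        balance       DECIMAL(18,2)
    )
    WITH (LOCATION = '/customer/', DATA_SOURCE = BlobStage, FILE_FORMAT = ParquetFileFormat);
    """)

    # Load into a physical, distributed table with CTAS.
    cur.execute("""
    CREATE TABLE dbo.Customer
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS SELECT * FROM stg.Customer_ext;
    """)
    ```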

    Hope this helps. Please let us know if you have any further queries.


  2. M, Murugeswari (Cognizant) 456 Reputation points
    2022-11-30T06:34:04.793+00:00

    Hi @AnnuKumari-MSFT ,

    Thanks for your suggestion. The original source is a Netezza database. Due to security restrictions, we are not allowed to connect to the Netezza database via ADF.

    So the corresponding team has planned to extract the tables in CSV format and upload them to Azure Blob Storage. Then, using ADF, we planned to load the data into Synapse.

    As per your suggestion, can I ask the corresponding team to extract the tables in Parquet format instead? Is that possible from a Netezza database?

    We also have another requirement: we need to perform row-level and file-level MD5 validation. Is that possible with Parquet files?
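
    For what it's worth, here is a rough sketch (an assumed approach, not from this thread) of both validations against a Parquet file in Python: a file-level MD5 over the raw bytes, and a row-level MD5 computed from concatenated column values. The file name is a placeholder.

    ```python
    # Sketch: file-level and row-level MD5 over a Parquet file.
    import hashlib
    import pandas as pd

    path = "customers.parquet"  # placeholder file name

    # File-level MD5: hash the Parquet file bytes in chunks.
    file_md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            file_md5.update(chunk)
    print("file md5:", file_md5.hexdigest())

    # Row-level MD5: concatenate each row's column values with a delimiter and hash.
    # The same rule (column order, delimiter, NULL handling, date formats) would have
    # to be applied on the Netezza side for the hashes to match.
    df = pd.read_parquet(path)
    row_md5 = (
        df.astype(str)
          .agg("|".join, axis=1)
          .map(lambda s: hashlib.md5(s.encode("utf-8")).hexdigest())
    )
    print(row_md5.head())
    ```

    One caveat: a file-level MD5 of the Parquet file will not match the MD5 of the original CSV, since the bytes differ between formats; only row-level hashes computed with the same rule on both sides are comparable across formats.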

    Thanks
