How to write partitioned Parquet files from a CSV file

pmscorca 792 Reputation points
2023-03-24T22:21:07.8466667+00:00

Hi,

I need to know how to write partitioned Parquet files to Azure Blob Storage, reading from a single CSV file.

I'd like to partition by year, creating a folder per year and putting the year-partitioned files in the corresponding folder.

Any help, please? Thanks

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Vinodh247-1375 11,126 Reputation points
    2023-03-25T10:33:29.4233333+00:00

    Hi

    Thanks for reaching out to Microsoft Q&A.

    Yes, you can achieve this through the Conditional Split transformation in an ADF data flow. Please refer to the link below; you will have to alter the conditions a little to suit your requirement.

    https://stackoverflow.com/questions/71174070/copying-single-csv-with-multiple-schema-in-adf
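
    If you prefer code over the data flow UI, here is a minimal PySpark sketch of the same idea (splitting the rows by year and writing each subset to its own folder). This is not the ADF Conditional Split itself, and the column name and storage paths below are assumptions you would replace with your own:

    ```python
    # Rough PySpark analogue of a per-year split; column and path names are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("split-csv-by-year").getOrCreate()

    # Hypothetical Azure Blob Storage locations.
    src = "wasbs://source@mystorageaccount.blob.core.windows.net/input/data.csv"
    dst = "wasbs://target@mystorageaccount.blob.core.windows.net/output"

    df = spark.read.option("header", "true").csv(src)
    df = df.withColumn("year", F.year(F.to_date(F.col("order_date"))))  # assumed date column

    # Write one folder per distinct year, mirroring a split into per-year sinks.
    for row in df.select("year").distinct().collect():
        y = row["year"]
        df.filter(F.col("year") == y).drop("year").write.mode("overwrite").parquet(f"{dst}/{y}")
    ```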

    Please upvote and accept as answer if the reply was helpful; this will help other community members.


  2. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2023-03-27T10:00:55.12+00:00

    Hi pmscorca,

    Thank you for posting your query on the Microsoft Q&A platform.

    You can achieve this using data flows in Azure Data Factory, or by writing PySpark code in Azure Databricks or Azure Synapse Analytics notebooks.

    The screenshot below shows a sample implementation of a similar case using mapping data flows. Here I take the source data and partition it based on the dep column, so you can see folders and subfolders created per dep value. In your case, you can use the year column instead.

    [Screenshot: mapping data flow writing output folders partitioned by the dep column]

    You can achieve the same using the PySpark partitionBy() function as well. Please check the video below on the same topic, created by me.

    partitionBy function in PySpark
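
    For reference, a minimal sketch of that partitionBy() approach, reading a single CSV and writing year-partitioned Parquet, could look like the following. The date column name (order_date) and storage paths are assumptions; adjust them to your data:

    ```python
    # Minimal PySpark sketch: CSV in, Parquet partitioned by year out (assumed names/paths).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("csv-to-partitioned-parquet").getOrCreate()

    # Hypothetical Azure Blob Storage locations.
    input_csv = "wasbs://source@mystorageaccount.blob.core.windows.net/input/data.csv"
    output_path = "wasbs://target@mystorageaccount.blob.core.windows.net/output/parquet"

    df = spark.read.option("header", "true").option("inferSchema", "true").csv(input_csv)

    # Derive a 'year' column from an assumed date column named 'order_date'.
    df = df.withColumn("year", F.year(F.to_date(F.col("order_date"))))

    # partitionBy creates one subfolder per year, e.g. .../year=2021/, .../year=2022/.
    df.write.mode("overwrite").partitionBy("year").parquet(output_path)
    ```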

    Hope this helps. Please let me know how it goes or if you have any further queries.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.