How do I convert gzipped CSV files stored in Blob Storage to Parquet and write the output back to a blob container?

Sudhindra
2023-03-28T12:01:23.4366667+00:00

Hi

I need to convert gzipped CSV files to Parquet. The input files are stored in a Blob Storage container with the following layout:

Input blob path: input/<current_date>/file1.csv.gz

File header: date,id,name,json_data

I want to take all files under input/<current_date>/*, convert them to Parquet, and store them using the layout below. The <id> and <date> values should come from each row of the file, so that the rows are split into folders accordingly:

Output blob path: output/<id>/<date>/file1.parquet

Sample data:

date,id,name,json_data
2023-03-11,123-acba,john,{"attr":"test1"}
2023-03-11,123-abba,john,{"attr":"test3"}
2023-03-12,123-acba,joe,{"attr":"test2"}

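Based on the date and id values in each row, the data should be written to the corresponding folder. For example, the two 2023-03-11 rows above have different ids, so they go to output/123-acba/2023-03-11/ and output/123-abba/2023-03-11/, while the 2023-03-12 row goes to output/123-acba/2023-03-12/.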
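The routing rule can be sketched in Python to check the expected folder layout before building the pipeline. This is a minimal sketch, not a Data Factory feature: `route_rows` is a hypothetical helper that only decides which output path each row belongs to. It assumes the files are UTF-8, that the header is `date,id,name,json_data`, and that `json_data` values either contain no commas (as in the sample) or are properly quoted. Writing each group out as a Parquet file would be left to the pipeline's sink or a Parquet library, which is not shown.

```python
import csv
import gzip
import io
import posixpath
from collections import defaultdict


def route_rows(blob_name: str, gz_bytes: bytes) -> dict[str, list[dict[str, str]]]:
    """Group the rows of one gzipped CSV blob by their target output path."""
    # Keep the input's base name: input/<current_date>/file1.csv.gz -> file1
    base = posixpath.basename(blob_name)
    if base.endswith(".csv.gz"):
        base = base[: -len(".csv.gz")]
    text = gzip.decompress(gz_bytes).decode("utf-8")  # assumes UTF-8 content
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    # DictReader uses the header row (date,id,name,json_data) as the keys
    for row in csv.DictReader(io.StringIO(text)):
        groups[f"output/{row['id']}/{row['date']}/{base}.parquet"].append(row)
    return dict(groups)


# Sample data from the question, gzipped in memory to stand in for the blob
sample = b"""date,id,name,json_data
2023-03-11,123-acba,john,{"attr":"test1"}
2023-03-11,123-abba,john,{"attr":"test3"}
2023-03-12,123-acba,joe,{"attr":"test2"}
"""
routed = route_rows("input/2023-03-28/file1.csv.gz", gzip.compress(sample))
for path, rows in sorted(routed.items()):
    print(path, [r["name"] for r in rows])
```

Keying the output name on the input file's base name means that two input files with rows for the same id and date land in separate Parquet files instead of overwriting each other.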

How can I achieve this with Azure Data Factory? I want to run this once a day over all the files available for a given date.
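Selecting one day's input can be sketched the same way. In Data Factory this would usually be a wildcard path on the source (for example `*.csv.gz` under the day's folder), with the date derived from the schedule trigger, such as `@trigger().scheduledTime`; the Python below only mirrors that filtering so the rule is explicit. It assumes `<current_date>` is formatted as YYYY-MM-DD like the dates in the sample data, and `input_prefix`, `files_for_day`, and the blob names in the example are hypothetical.

```python
from datetime import date


def input_prefix(run_date: date) -> str:
    # Assumption: <current_date> in the input path is formatted YYYY-MM-DD
    return f"input/{run_date.isoformat()}/"


def files_for_day(blob_names: list[str], run_date: date) -> list[str]:
    """Return the .csv.gz blobs directly under input/<run_date>/, like input/<current_date>/*."""
    prefix = input_prefix(run_date)
    return sorted(
        name
        for name in blob_names
        if name.startswith(prefix)
        and name.endswith(".csv.gz")
        and "/" not in name[len(prefix):]  # skip blobs in nested "subfolders"
    )


# Hypothetical blob names, as a listing of the input container might return them
names = [
    "input/2023-03-28/file1.csv.gz",
    "input/2023-03-28/file2.csv.gz",
    "input/2023-03-27/file1.csv.gz",
    "input/2023-03-28/readme.txt",
]
print(files_for_day(names, date(2023, 3, 28)))
```

For a daily run, taking `run_date` from the trigger's scheduled time rather than the wall clock means a delayed or re-run pipeline still processes the intended day.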

Tags: Azure Blob Storage, Azure Data Factory

1 answer

  1. HimanshuSinha-msft (Microsoft Employee, Moderator)
    2023-03-30T20:17:53.48+00:00

    Hello @Sudhindra, thanks for the question and for using the MS Q&A platform.

    When you say you want to use Azure Data Factory, are you OK with using a Mapping Data Flow? I ask because this scenario requires a transformation (splitting rows into folders by id and date), and Mapping Data Flows are designed for transformations. Otherwise, in ADF we would be using the Copy activity, which has only very limited transformation capabilities.

    More on Mapping data flow: https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview

    Thanks, Himanshu

    Please accept as "Yes" if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.

