How to automatically combine files in Azure Synapse Analytics

Kakehi Shunya (筧 隼弥) 201 Reputation points
2022-06-22T09:03:36.28+00:00

I work as a data engineer and need to combine several files into one file every day.

Here is what I would like to do:

  1. Upload a .gz file to Azure Blob Storage every day.
  2. Unzip the file and convert it to Parquet format.
  3. Combine the daily files into one file per month (partitioned by month).

For example,

  1. Upload a file (path: Sample/year=2022/month=06/day=22/sample.gz) to Blob Storage.
    The earlier files (year=2022/month=06/day=1-21/sample.gz) have already been uploaded to the same directory and combined into one file (202206.parquet).
  2. Unzip the 2022/06/22 file and convert it to Parquet format.
  3. Combine the 2022/06/22 file with the 202206.parquet file.
    Note: after combining, if possible, delete the old 202206.parquet and create a new file that includes the 2022/06/22 file's data.

I have already created pipelines for steps 1 and 2, so I need advice focused on step 3.
Any help would be greatly appreciated.

Postscript: the .gz file contains about 25 CSV files.

Thank you for reading my question despite my poor English.

Azure Blob Storage
Azure Synapse Analytics
Azure Data Factory

1 answer

  1. Nandan Hegde 29,896 Reputation points MVP
    2022-06-22T09:19:20.007+00:00

    The resources below might help:
    sqlservercentral.com/articles/merge-multiple-files-in-azure-data-factory
    https://www.youtube.com/watch?v=WbDTBAyYte8

    Please let me know if your requirement is different.
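
For step 3 in the question, one possible approach is to rebuild each monthly file from all of that month's daily files instead of appending to the existing one, so that a rerun produces the same result. The sketch below is a minimal illustration, not part of the linked resources. It covers only the planning part: mapping each daily path to its monthly file name and grouping the inputs. It assumes the `year=/month=/day=` path layout from the question; `monthly_target` and `group_by_month` are hypothetical helper names, and the merge itself is left to whatever engine the pipeline uses.

```python
import re
from collections import defaultdict

# Matches the partition segments used in the question's example path,
# e.g. "Sample/year=2022/month=06/day=22/sample.gz".
PARTITION_RE = re.compile(r"year=(\d{4})/month=(\d{1,2})/day=(\d{1,2})/")


def monthly_target(daily_path):
    """Return the monthly file name (e.g. '202206.parquet') for a daily path."""
    match = PARTITION_RE.search(daily_path)
    if match is None:
        raise ValueError(f"no year=/month=/day= partition in {daily_path!r}")
    year, month, _day = match.groups()
    # Zero-pad the month so "month=6" and "month=06" give the same name.
    return f"{year}{int(month):02d}.parquet"


def group_by_month(daily_paths):
    """Map each monthly file name to the sorted daily paths that feed it."""
    groups = defaultdict(list)
    for path in daily_paths:
        groups[monthly_target(path)].append(path)
    return {target: sorted(paths) for target, paths in groups.items()}
```

Calling `group_by_month` on the list of daily files gives, for each monthly file, the full set of inputs to read and merge. Writing the merged result under the monthly name then replaces the previous 202206.parquet rather than leaving two copies.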