Azure Synapse Analytics: Unzip all files in some folders at once.

Kakehi Shunya (筧 隼弥) 201 Reputation points
2022-06-20T11:02:42.333+00:00

Hello, I'm looking to unzip all files in some folders at once in Azure Synapse Analytics.

The folder path that I want to unzip is as follows:
Folder-Name/year=YYYY/month=MM/day=DD/Files

Currently, I unzip the files under each "day" folder every day,
but I don't know how to unzip past data all at once.
For instance, file path: year=2021/month=/day=/files

Please let me know how to unzip these files all at once.

Any help would be appreciated.

I have attached some images of the current pipeline and file path.
The current pipeline unzips the files under the path Agoop/pop_mesh50m/year=2022/month=06/day=15.

212897-image.png
212898-image.png
212943-image.png
212879-image.png
212899-image.png
212974-image.png
212880-image.png

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. KranthiPakala-MSFT 46,602 Reputation points Microsoft Employee
    2022-06-22T00:08:12.8+00:00

    Hello @Kakehi Shunya (筧 隼弥) ,

    Thanks for the question and for using the MS Q&A platform.

    My understanding is that you would like to unzip all the files from multiple folders and subfolders under a parent folder/directory using ADF. Please correct me if I misunderstood the requirement.

    To copy recursively from a folder and its subfolders, use the wildcard file path type in the source settings of your copy activity and check the recursive option, as shown in the image below:

    213621-image.png

    With these settings, the copy activity unzips every zip file under the parent folder matched by the wildcard in the copy source settings and copies the results to the desired destination, preserving the folder structure.
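For readers who prefer the pipeline JSON view over the UI, the source settings described above might look roughly like the sketch below. It is only an illustration: it assumes the source is ADLS Gen2 (hence `AzureBlobFSReadSettings`) read through a Binary dataset whose compression type is ZipDeflate, and the helper name `build_zip_source`, the folder pattern, and the `*.zip` file pattern are hypothetical values based on the path in the question. Check the wildcard pattern against your own folder layout before relying on it.

```python
import json


def build_zip_source(folder_pattern, file_pattern="*.zip"):
    """Return a Copy activity 'source' section that reads zip files recursively.

    Hypothetical helper for illustration; it only builds a dictionary.
    """
    return {
        "type": "BinarySource",
        "storeSettings": {
            # Assumption: the source store is ADLS Gen2.
            "type": "AzureBlobFSReadSettings",
            # Walk into subfolders of every matched folder.
            "recursive": True,
            "wildcardFolderPath": folder_pattern,
            "wildcardFileName": file_pattern,
        },
        "formatSettings": {
            "type": "BinaryReadSettings",
            # Tells the copy activity to decompress zip archives on read.
            "compressionProperties": {"type": "ZipDeflateReadSettings"},
        },
    }


# Assumed pattern: everything under the 2021 partition of the question's path.
source = build_zip_source("Agoop/pop_mesh50m/year=2021/*")
print(json.dumps(source, indent=2))
```

The printed JSON can be compared with the code view of the copy activity in the pipeline editor.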

    If you would rather copy all the files to a single target folder instead of preserving the source folder structure, you can set copyBehavior to flattenHierarchy in the Sink settings. Note that doing so does not preserve the original source file names; ADF instead generates names automatically for the unzipped, copied files.

    To explore more about copyBehavior please refer to this document: recursive and copyBehavior examples

    213622-image.png

    Sink Settings for copy behavior:

    213470-image.png
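As a rough companion to the sink screenshot, the two copy behaviors discussed above could be expressed in the activity JSON as sketched below. This assumes an ADLS Gen2 sink written through a Binary dataset; the helper name `build_sink` is hypothetical and only builds a dictionary.

```python
import json

# copyBehavior values accepted for file-based sinks.
ALLOWED_BEHAVIORS = {"PreserveHierarchy", "FlattenHierarchy", "MergeFiles"}


def build_sink(copy_behavior="PreserveHierarchy"):
    """Return a Copy activity 'sink' section with the chosen copyBehavior.

    Hypothetical helper for illustration only.
    """
    if copy_behavior not in ALLOWED_BEHAVIORS:
        raise ValueError(f"unsupported copyBehavior: {copy_behavior}")
    return {
        "type": "BinarySink",
        "storeSettings": {
            # Assumption: the sink store is ADLS Gen2.
            "type": "AzureBlobFSWriteSettings",
            "copyBehavior": copy_behavior,
        },
    }


# Keep the year=/month=/day= folder structure in the target.
print(json.dumps(build_sink("PreserveHierarchy"), indent=2))
# Put everything in one folder; file names are then autogenerated.
print(json.dumps(build_sink("FlattenHierarchy"), indent=2))
```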

    With this approach you can avoid the GetMetadata and ForEach activities, and processing will also be faster. If for any reason you do have to use the GetMetadata activity and loop through the list of files in folders and subfolders under a parent directory, you will have to implement a two-level pipeline (a parent and a child pipeline, using the Execute Pipeline activity), as discussed in this thread: How to read files from sub folders in Azure data factory.
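If the loop-based route is chosen, one possible way to drive it is to generate the list of day folders up front, for example to pass as a pipeline parameter to the parent pipeline. The sketch below is only an illustration of that idea, not part of the linked thread: the function name `day_folders` is hypothetical, and the zero-padded `year=YYYY/month=MM/day=DD` layout is taken from the paths in the question.

```python
from datetime import date, timedelta


def day_folders(root, start, end):
    """List one partition folder path per day from start to end, inclusive."""
    folders = []
    current = start
    while current <= end:
        # Zero-padded month and day, matching paths such as month=06/day=15.
        folders.append(
            f"{root}/year={current:%Y}/month={current:%m}/day={current:%d}"
        )
        current += timedelta(days=1)
    return folders


# Example: every day folder of 2021 under the root used in the question.
folders = day_folders("Agoop/pop_mesh50m", date(2021, 1, 1), date(2021, 12, 31))
print(len(folders))
print(folders[0])
print(folders[-1])
```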

    Hope this helps. Please let us know if you have any further queries.
