Copy parquet files between storages without Data Flow

CassPR 0 Reputation points Microsoft Employee
2023-05-05T22:44:05.53+00:00

We have parquet files in a Data Lake Storage Gen1 account with the following structure:

folder "yyyyMMdd"

|-> file_yyyy_MM_dd_[n].parquet

We want to copy the files to a Data Lake Storage Gen2 account while preserving this structure.

With a Copy Activity we were only able to copy by merging the files into one.

Using Get Metadata to obtain the child items was not possible: we got an error saying that the folder "yyyyMMdd" is not valid for iteration.

With a Data Flow we can copy the data, but the partitioning is set at the sink side, so the resulting parquet files differ from the source files.

We'd like to see if there is another way that does not involve Data Flow to copy the files between storages.

Azure Data Lake Storage
Azure Data Factory

1 answer

  1. KranthiPakala-MSFT 46,432 Reputation points Microsoft Employee
    2023-05-09T00:19:02.4133333+00:00

    Hi @CassPR

    Welcome to Microsoft Q&A forum and thanks for reaching out here.

    As I understand it, you would like to copy files from ADLS Gen1 to ADLS Gen2 with a Copy activity while preserving the folder/file hierarchy. Please correct me if I'm wrong.

    If that is the case, were you not able to use the Preserve hierarchy copy behavior under the sink settings?


    Here is the related documentation: Examples of behavior of the copy operation


    Thanks

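As a rough illustration of the Preserve hierarchy suggestion in the answer, the sketch below builds the kind of Copy activity definition that the sink setting corresponds to in the pipeline JSON and prints it. It is only a sketch under assumptions: the dataset names `AdlsGen1ParquetFolder` and `AdlsGen2ParquetFolder`, the activity name, and the `*.parquet` wildcard are hypothetical placeholders not taken from the thread, and the source and sink are assumed to use Parquet datasets on ADLS Gen1 and ADLS Gen2 respectively. The documented copy behavior values are `PreserveHierarchy`, `FlattenHierarchy`, and `MergeFiles`; the last one would explain files being merged into one.

```python
import json

# Hypothetical dataset names (assumptions, not from the thread);
# replace them with the datasets defined in your own data factory.
SOURCE_DATASET = "AdlsGen1ParquetFolder"
SINK_DATASET = "AdlsGen2ParquetFolder"


def build_copy_activity(source_dataset: str, sink_dataset: str) -> dict:
    """Return a Copy activity definition that keeps the source folder layout."""
    return {
        "name": "CopyParquetPreserveHierarchy",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "ParquetSource",
                "storeSettings": {
                    # Read settings for an ADLS Gen1 source.
                    "type": "AzureDataLakeStoreReadSettings",
                    # Walk the "yyyyMMdd" subfolders under the dataset path.
                    "recursive": True,
                    # Assumed filter: pick up only parquet files.
                    "wildcardFileName": "*.parquet",
                },
            },
            "sink": {
                "type": "ParquetSink",
                "storeSettings": {
                    # Write settings for an ADLS Gen2 sink.
                    "type": "AzureBlobFSWriteSettings",
                    # Keep each file's relative path instead of merging files.
                    "copyBehavior": "PreserveHierarchy",
                },
            },
        },
    }


activity = build_copy_activity(SOURCE_DATASET, SINK_DATASET)
print(json.dumps(activity, indent=2))
```

The printed JSON can be compared against the activity shown in the pipeline's code view to check which copy behavior is currently set on the sink.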