ADF Copy activity output into multiple parquet snappy files?

ever 1 Reputation point
2020-08-17T11:34:57.857+00:00

In the ADF copy activity, the pipeline creates one very large Parquet file as output. Is it possible to split the output into multiple smaller snappy-compressed Parquet files, so that a Synapse external table (PolyBase) can read them in parallel? This would be similar to a Databricks DataFrame, which writes multiple small Parquet files. Kindly advise how to achieve this in the ADF copy activity.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-08-17T13:17:45.1+00:00

    Hi @ever ,

    Welcome to Microsoft Q&A Platform.

    The copy activity in Data Factory has no built-in way to split the output dynamically. However, the same result can be achieved by defining rules, for example a specific year range or a specific set of records identified by a column value. These rules can be maintained in a config table or file. A Lookup activity retrieves them and passes them to a ForEach activity that contains the copy activity. A parameterized source query then filters the data from the single SQL table according to each rule, and the ForEach iterations load the results into multiple files in parallel.
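
    As a rough illustration of the rule-driven approach, here is a minimal Python sketch (not ADF expression language) of how config rules could be turned into one filtered query and one output file name per ForEach iteration. The table name dbo.Sales, the column OrderYear, the PartitionRule type and the build_copy_items helper are all hypothetical and only stand in for whatever your config table holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionRule:
    """One row of a hypothetical config table: a half-open year range."""
    start_year: int  # inclusive lower bound
    end_year: int    # exclusive upper bound


def build_copy_items(table, column, rules):
    """Return one {query, fileName} item per rule, as a ForEach would consume them."""
    items = []
    base_name = table.split(".")[-1]  # drop the schema prefix for the file name
    for rule in rules:
        query = (
            f"SELECT * FROM {table} "
            f"WHERE {column} >= {rule.start_year} AND {column} < {rule.end_year}"
        )
        file_name = f"{base_name}_{rule.start_year}_{rule.end_year}.parquet"
        items.append({"query": query, "fileName": file_name})
    return items


# Hypothetical rules, standing in for the output of the Lookup activity.
rules = [PartitionRule(2018, 2019), PartitionRule(2019, 2020), PartitionRule(2020, 2021)]
items = build_copy_items("dbo.Sales", "OrderYear", rules)

for item in items:
    print(item["fileName"], "<-", item["query"])
```

    In the pipeline itself, each item would map to one copy activity run inside the ForEach, with the query and the file name supplied as parameters.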

    According to our recent discussion with the Azure Data Factory product team, this requirement aligns with a work item they have in progress: "New property maxRowsPerFile to split and write to multiple smaller files". It is expected to take 4-6 months to become available.

    Hope this helps! Please let us know if our understanding is incorrect or for further queries and we will be glad to assist.

    Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.

    1 person found this answer helpful.

  2. HarithaMaddi-MSFT 10,136 Reputation points
    2020-09-16T08:59:47.17+00:00

    Hi @ever ,

    I want to share an update from the ADF team on the maxRowsPerFile property that we discussed in August. Please check this link, which gives more details on the property, and kindly let us know if it helps in implementing your requirement.
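
    For orientation, here is a minimal sketch of what the sink section of a copy activity might look like with maxRowsPerFile set. It is built as a Python dict and printed as JSON. The parquet_sink helper, the row limit, the "sales" prefix and the choice of AzureBlobFSWriteSettings (ADLS Gen2) as the store are assumptions for illustration, so please verify the exact property names against the linked documentation before using them.

```python
import json


def parquet_sink(max_rows_per_file, file_name_prefix):
    """Build a copy-activity sink fragment that splits Parquet output by row count.

    Hypothetical helper; the values are illustrative, not recommendations.
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be a positive integer")
    return {
        "type": "ParquetSink",
        # Assumed store: ADLS Gen2. Use the write settings type that matches your sink.
        "storeSettings": {"type": "AzureBlobFSWriteSettings"},
        "formatSettings": {
            "type": "ParquetWriteSettings",
            # Upper bound on rows written to each output file.
            "maxRowsPerFile": max_rows_per_file,
            # Prefix used when naming the generated files.
            "fileNamePrefix": file_name_prefix,
        },
    }


sink = parquet_sink(1_000_000, "sales")
print(json.dumps(sink, indent=2))
```

    With a setting like this the sink dataset should point to a folder rather than a single file, since the copy activity writes several files.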

    Thanks for your patience!

    1 person found this answer helpful.