One CSV to multiple Parquet files

Ryan Abbey 1,171 Reputation points

We have one large CSV file that we are looking to convert to Parquet. Based on the recommended standard of up to 1GB per Parquet file, we want to split the output across several files; however, we are running into a few issues:

  1. If we don't specify a file name within the Parquet sink definition and instead specify, e.g., 10,000,000 rows per file, we find that the copy activity auto-generates a subfolder based on the input file name, which we don't want.
  2. If we extend 1 to also specify a "File name prefix", we get the error FileNamePrefixNotSupportFileBasedSource (I note the info box does say you can't specify a prefix with file-based sources)

So how do we stop it from generating a subfolder based on the source file name? It seems pretty restrictive and illogical to force an unwanted subfolder (an MS trait that hasn't stopped through the years!)
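For reference, the sink configuration described above corresponds roughly to the following copy activity sink fragment (a sketch based on the ADF Parquet sink settings; the store settings type and the prefix value are illustrative assumptions):

```json
"sink": {
    "type": "ParquetSink",
    "storeSettings": {
        "type": "AzureBlobFSWriteSettings"
    },
    "formatSettings": {
        "type": "ParquetWriteSettings",
        "maxRowsPerFile": 10000000,
        "fileNamePrefix": "output_part"
    }
}
```

With only `maxRowsPerFile` set, this is the configuration that produces the auto-generated subfolder; adding `fileNamePrefix` is what triggers the FileNamePrefixNotSupportFileBasedSource error when the source is file-based.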

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. ShaikMaheer-MSFT 37,656 Reputation points Microsoft Employee

    Hi @Ryan Abbey ,

    Please check the detailed example below, which copies the file to a folder whose name is created dynamically, as you requested above (iri_FCT_yyyyMMdd).
    Step 1: Create a variable in your pipeline to hold the current date, and use a Set Variable activity to set its value.

    Step 2: Use a Copy activity to copy the zip file. The source and sink dataset types should be Binary. In the sink dataset, create a parameter that dynamically supplies the target folder name as "iri_FCT_yyyyMMdd".
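    The two steps above can be sketched as the following dynamic content expressions (assuming a pipeline variable named currentDate and a sink dataset parameter named folderName; both names are illustrative):

    ```
    Set Variable activity (currentDate):      @formatDateTime(utcNow(), 'yyyyMMdd')
    Sink dataset parameter (folderName):      @concat('iri_FCT_', variables('currentDate'))
    ```

    The Copy activity then writes into the folder resolved from folderName, so the folder name is fully under your control rather than derived from the source file name.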

    Hope this helps.

