Remove '00001' suffix in file name generated by data flow

Teo 121 Reputation points
2021-08-08T01:20:29.33+00:00

I'm using an Azure Data Factory data flow to save incoming data as partitioned *.parquet files (Year/Month). I use the pattern setting for the file names, as shown in the screenshot below. ADF automatically appends "-00001" to the file name, e.g. "Sales Date=2021-08-07-00001", which I don't need because I already generate the file name with an expression. The Optimize tab is set to the Key partition type.

Is there any way to remove the '00001' suffix from the file name?

121371-image.png

Azure Data Factory

2 answers

  1. Teo 121 Reputation points
    2021-08-09T19:16:22.727+00:00

    That is the setting we use. It looks like there is no way to ignore the Spark partitioning scheme. The suggested Filename[n] pattern implies that I could drop the "n" and so keep -00001 from being added to each file, but that doesn't seem to be possible.

  2. Teo 121 Reputation points
    2021-08-16T22:01:08.497+00:00

    How are these Spark partition files generated? I thought they would all have the '00001' suffix, but after staging a large dataset I see that they now have different numbers. What will happen if I rerun the load? Will Spark retain the same numbers? Is there a way to control the size of the partitions?

    123655-image.png
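
Since the thread suggests the part number cannot simply be switched off in the sink, one possible workaround is to rename the files in a separate step after the data flow finishes. The sketch below is not an ADF feature and only plans the renames; it does not touch storage. It assumes the generated names have the shape "<base>-<5 digits>" with an optional extension, and that the list of paths was obtained elsewhere. The names `strip_part_suffix` and `plan_renames` are hypothetical helpers, not part of any Azure SDK. Because a folder can hold several part files with different numbers, any rename that would make two files collide is skipped.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed name shape: "<base>-<5 digits>" optionally followed by ".<ext>".
PART_SUFFIX = re.compile(r"-\d{5}(?=(\.[^./]+)?$)")


def strip_part_suffix(name: str) -> str:
    """Return the file name without a trailing -NNNNN part number."""
    return PART_SUFFIX.sub("", name, count=1)


def plan_renames(paths):
    """Map each path to its suffix-free path, skipping names that would collide."""
    targets = {}
    for p in paths:
        path = PurePosixPath(p)
        targets[p] = str(path.with_name(strip_part_suffix(path.name)))
    # A target is safe only if exactly one source path maps to it.
    counts = Counter(targets.values())
    return {
        src: dst
        for src, dst in targets.items()
        if dst != src and counts[dst] == 1
    }


if __name__ == "__main__":
    listing = [
        "Year=2021/Month=08/Sales Date=2021-08-07-00001.parquet",
        "Year=2021/Month=08/Sales Date=2021-08-08-00001.parquet",
        "Year=2021/Month=08/Sales Date=2021-08-08-00002.parquet",
    ]
    for src, dst in plan_renames(listing).items():
        print(f"{src} -> {dst}")
```

With this listing, only the 2021-08-07 file gets a planned rename; the two 2021-08-08 part files are left alone because stripping their numbers would give both the same name. The actual rename would still have to be carried out by whatever tool has access to the storage account.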

