@Kumar, Arun - Thanks for the question and using MS Q&A platform.
To read Parquet files dynamically in a Data Flow in Azure Data Factory, you can use the "Wildcard file path" option in the source settings. This allows you to specify a pattern for the file names or folder names that you want to read.
In your case, you can use the following wildcard pattern to read the Parquet files inside the "staging/ABC/XYZ" folder: staging/ABC/XYZ/*.parquet
This pattern matches every file with a ".parquet" extension directly inside the "staging/ABC/XYZ" folder. If you also need to pick up files in its subfolders, use the recursive form staging/ABC/XYZ/**/*.parquet instead, as shown below.
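For quick reference, the two forms behave differently (shown here relative to the staging container, matching the steps further down):

```
ABC/XYZ/*.parquet      ->  .parquet files directly under staging/ABC/XYZ
ABC/XYZ/**/*.parquet   ->  .parquet files under staging/ABC/XYZ and all of its subfolders
```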
To use this pattern in a Data Flow source, follow these steps:
In the source transformation, you can read from a container, folder, or individual file in Azure Blob Storage; use the Source options tab to manage how the files are read.
Once the source is configured, the Data Flow automatically detects every Parquet file that matches the wildcard pattern and reads it in at runtime.
Here are the complete steps to read the Parquet files from inside the folders in Azure Blob Storage:
Step 1: Upload the Parquet files (three in this example) to the blob folder, for example: staging/ABC/XYZ/userdata1.parquet
Step 2: Create a dataset with the Parquet file format, select the linked service, and under the file path specify only the container name: staging
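If you prefer to see the dataset as JSON rather than through the UI, a minimal sketch looks like the following; the dataset and linked service names (ParquetStaging, AzureBlobStorageLS) are placeholders, and only the container is set so the data flow wildcard can resolve the rest of the path:

```json
{
    "name": "ParquetStaging",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "staging"
            }
        }
    }
}
```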
Step 3: Create a data flow, add a source that uses this dataset, and under Source options set the wildcard path to: /ABC/XYZ/*.parquet
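Behind the UI, the source transformation stores this setting in the data flow script. A minimal sketch (the source name ParquetSource is a placeholder, and the other options are shown with their defaults) looks roughly like this:

```
source(
    allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false,
    wildcardPaths: ['/ABC/XYZ/*.parquet']) ~> ParquetSource
```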
Step 4: Click Data preview to see the data from all the matching Parquet files.
Note: To set the path as dynamic content, open the Data Flow expression builder for the wildcard path and enter the same value: `/ABC/XYZ/*.parquet`
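If the folder changes per run, you can also parameterize the path instead of hard-coding it. As a sketch, assuming a data flow string parameter named folderPath (a hypothetical name) that the pipeline passes in, for example with the value 'ABC/XYZ', the wildcard path expression in the expression builder could be:

```
concat($folderPath, '/*.parquet')
```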
For more details, refer to Wildcards in Data Flow and the SO thread addressing a similar issue.
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".