Question

RamyaHarinarthini-MSFT asked (edited by azure-cxp-api):

Copy Multiple files into ADL Gen2

I have a Data Factory pipeline that currently copies files daily from a Google Storage account down to an ADLS Gen2-enabled Azure Blob Storage account.

The source has several different files (File1, File2, File3, etc.), all with a date range in the file name, e.g. File1_20200101_20200102.csv.gzip; they are .csv files, gzipped.

I was able to connect using a Binary source and Binary sink and just grab all files that were created/modified yesterday. As part of the sink, I also unzip the files so they land as plain .csv.

I want to make sure that I'm setting up the structure in blob storage correctly for it to function as a data lake.

BlobContainer1/RAW/GoogleSource/File1_20200101_20200102.csv.gzip

From what I'm reading, I should probably have BlobContainer1/RAW/GoogleSource/File1/{year}/{month}/{day}/File1_20200101_20200102.csv.gzip, would that be correct?

If so, is it possible to dynamically determine the folder path based on each file name that is being pulled in, or do I have to create a separate copy pipeline for each file being copied over?

[Note: As we migrate from MSDN, this question has been posted by an Azure Cloud Engineer as a frequently asked question]

MSDN Source: Copy Multiple files into ADL Gen2


azure-data-lake-storage

ChiragMishra-MSFT answered (edited):

Welcome to the Microsoft Q&A (Preview) platform.

Happy to answer your query.

It sounds like you want your data partitioned similarly to how Hadoop or Synapse save data. To do this, I recommend using Mapping Data Flows, which offers the partitioning options used by distributed-computing systems.
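For instance, if the data flow derives year, month, and day columns from the source file name and the sink uses key partitioning on those columns, the output lands in one folder per key value, something like this (illustrative paths only):

BlobContainer1/RAW/GoogleSource/File1/2020/01/01/part-00000.csv
BlobContainer1/RAW/GoogleSource/File1/2020/01/02/part-00000.csv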

MSDN Source: Copy Multiple files into ADL Gen2



UtkarshSharma-7440 answered (edited):

Hi,

Since you have ADLS Gen2 enabled, I would recommend using Azure Data Factory to create the folders in your storage account. You can use a Copy activity, extract the "Year", "Month", and "Day" parts from your source file, and create the corresponding hierarchy at the destination, i.e. ADLS Gen2.

You may also refer to the example below to create partitions:

{
    "name": "AzureOutput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "ADLSLinkedService",
        "typeProperties": {
            "folderPath": "BlobContainer1/RAW/GoogleSource/File1/yearno={Year}/monthno={Month}/dayno={Day}/",
            "partitionedBy": [
                {
                    "name": "Year",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "yyyy"
                    }
                },
                {
                    "name": "Month",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "%M"
                    }
                },
                {
                    "name": "Day",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "%d"
                    }
                }
            ]
        }
    }
}
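
Note that the partitionedBy/SliceStart pattern above is Data Factory v1 dataset syntax. If your pipeline is authored in Data Factory v2, a similar layout can be produced with a dynamic folder path on a parameterized sink dataset. A minimal v2 sketch follows; the dataset name, linked service name, and the fileBase parameter are assumptions for illustration:

{
    "name": "SinkDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "ADLSGen2LinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "fileBase": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "blobcontainer1",
                "folderPath": {
                    "value": "@concat('RAW/GoogleSource/', dataset().fileBase, '/', formatDateTime(utcnow(), 'yyyy'), '/', formatDateTime(utcnow(), 'MM'), '/', formatDateTime(utcnow(), 'dd'))",
                    "type": "Expression"
                }
            }
        }
    }
}

The Copy activity can then pass each source file's base name into fileBase (for example, from a Get Metadata activity feeding a ForEach loop), so a single pipeline can handle File1, File2, File3, and so on, rather than one pipeline per file.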
