question

PaulHernandez-8067 asked · GreenCathy-8719 commented · 1 Vote

Copy data activity in Synapse - ADLS2 to dedicated SQL Pool - Wildcard problem

Hi everyone,

I am using the Copy data activity in a Synapse pipeline to load data from Azure Data Lake Storage Gen2 into a dedicated SQL pool.

I'm facing a problem reading several parquet files from a directory in the data lake.

The folder has a simple layout:
89656-image.png


I want to use the copy activity and the "copy command".

I created a dataset pointing to the folder above:
89614-image.png

This is the configuration of the copy data --> source:

89623-image.png

and the Sink:

89570-image.png

Then I tried to validate the pipeline and got the error:
89673-image.png


Next try was using the Wildcard file path:

89637-image.png

Next validation error:

89593-image.png

Next step was to use the wildcard in the dataset configuration:
89657-image.png

I used the following wildcards with the same result:
- *
- *.*
- *.parquet

Error Messages:

Operation on target CD_EDL_STG_DWH failed: ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'. Account: 'edlcorepiprodadls01'. FileSystem: 'data'. Path: 'data/business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA/^*.parquet'.

Curiously, it adds a "^" to the file name pattern and does not find the files.
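
For reference, this is roughly the source configuration I expect the wildcard option to generate (just a sketch on my part; the property names follow the standard ADLS Gen2 read settings, and the folder path is the one from the error above):

    "source": {
        "type": "ParquetSource",
        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": true,
            "wildcardFolderPath": "business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA",
            "wildcardFileName": "*.parquet",
            "enablePartitionDiscovery": false
        }
    }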

My final try was using a single file:
89665-image.png

This worked, but it does not cover my requirements.

Does anyone have an idea what I am doing wrong?


azure-synapse-analytics

PaulHernandez-8067 answered · GreenCathy-8719 commented · 2 Votes

Hi @ZilvinasKundrotas-3490,

I think the problem is how the UI builds the configuration that gets sent, combined with the validation issues.

Anyhow, thanks to @PRADEEPCHEEKATLA-MSFT I found a combination that works:

In the dataset, define everything including the wildcard for the file name:

91139-image.png

Then define the wildcard for the file name in the copy activity, but leave the folder path empty:

91174-image.png

That "hacks" the interface and produces the following input:

 "source": {
         "type": "ParquetSource",
         "storeSettings": {
             "type": "AzureBlobFSReadSettings",
             "recursive": true,
             "wildcardFileName": "*",
             "enablePartitionDiscovery": false
         }
     }

I guess it will be fixed in upcoming releases.
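
For completeness, the sink side with the COPY command enabled would look roughly like this (a minimal sketch; SqlDWSink with allowCopyCommand is the usual dedicated SQL pool sink configuration, staging and table settings omitted):

    "sink": {
        "type": "SqlDWSink",
        "allowCopyCommand": true
    }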



PRADEEPCHEEKATLA-MSFT commented:

Hello @PaulHernandez-8067,

Glad to know that your issue has been resolved. You can accept it as the answer. And thanks for sharing the solution, which might be beneficial to other community members reading this thread.

1 Vote ·

GreenCathy-8719 commented:

Wish I'd known this 2 days ago. It works perfectly. Thanks!

0 Votes ·
PRADEEPCHEEKATLA-MSFT answered · 1 Vote

Hello @PaulHernandez-8067,

Unfortunately, copying data to Azure Synapse Analytics using the COPY command is not supported when the copy source "Wildcard folder path" is set.

Note: Wildcard is now supported on both the folder path and file name for all file-based connectors (Azure Blob, ADLS Gen1, ADLS Gen2, AWS S3, File System, FTP, SFTP and HDFS).

I'm able to use the wildcard file name * as shown below:

89807-synapse-wildcard.png

And the pipeline runs successfully without any issue.

89851-synapse-wildcard.gif

In case this doesn't work for you, keep the wildcard folder path blank here and enter data/business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA/ in the "Directory" field of the dataset.
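
As a rough sketch, the dataset location would then look like this in JSON (the file system and folder path below are taken from the error message in your question; adjust them to your environment):

    "typeProperties": {
        "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "data",
            "folderPath": "business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA"
        }
    }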

Hope this helps. Do let us know if you have any further queries.


Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.



Hi @PRADEEPCHEEKATLA-MSFT ,

Are you using the copy command?

Could you please also share your Sink settings?

BR
Paul

0 Votes ·

Hello @PaulHernandez-8067,

I used Azure Blob Storage as my sink.

Wildcard file filters are supported for the following connectors.

89952-image.png

For more information, see Data Factory supports wildcard file filters for Copy Activity.

Hope this helps.


0 Votes ·

Hi @PRADEEPCHEEKATLA-MSFT ,

Unfortunately, my combination of source and sink is not working with wildcards.

According to the documentation, if I want to use a wildcard for the file name I need to use the activity's source settings:

89859-image.png

That means it should be specified in the copy activity:

90001-image.png

When I try to validate this pipeline, I get this error:

89993-image.png

So for moving data from Azure Data Lake to Azure Synapse it seems like wildcards are not supported, which still makes the tool unusable for many use cases.


0 Votes ·
ZilvinasKundrotas-3490 answered · 0 Votes

I have exactly the same problem.
I also have a directory with several parquet files (the result of a transformation from Databricks) and want to copy the data to Synapse Analytics using an Azure Data Factory Copy activity.
Parquet dataset settings: the "File" field is empty, and the "Preview data" button works correctly in both places: in the Parquet dataset and in the "Copy data" activity source tab.
Copy data activity:
- If I choose "File path type" as "File path in dataset", validation fails with "Dataset Staging_DimAttributes_parquet location is a folder, the wildcard file name is required for Copy data1".
- If I choose "File path type" as "Wildcard file path" (without any wildcards in the folder or file fields) and click "Validate", I get two errors:
  - Dataset Staging_DimAttributes_parquet location is a folder, the wildcard file name is required for Copy data1
  - Copying data to Azure Synapse Analytics using Copy command is not supported when copy source "WildCard folder path" is set.
- If I write a wildcard (*, *.parquet) in the file name field, validation gives an error:
  - Copying data to Azure Synapse Analytics using Copy command is not supported when copy source "WildCard folder path" is set.
It seems that there is no way to copy parquet data to Synapse Analytics using the ADF Copy activity with the COPY command.

