Copy data activity in Synapse - ADLS2 to dedicated SQL Pool - Wildcard problem

Paul Hernandez 631 Reputation points Microsoft Employee
2021-04-20T19:01:54.177+00:00

Hi everyone,

I am using the copy data activity in a Synapse pipeline to load data from Azure Data Lake Storage Gen2 into a dedicated SQL pool.

I'm having trouble reading several parquet files from a directory in the data lake.

The folder has a simple layout:
[screenshot: folder contents in the data lake]

I want to use the copy activity and the "copy command".

I created a dataset pointing to the folder above:
[screenshot: dataset configuration pointing to the folder]
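
For reference, a Parquet dataset like this should be defined by JSON along these lines (a sketch only: the dataset and linked service names below are placeholders; the file system and folder path are taken from the error message further down):

    {
        "name": "DS_PMM_DEAL_DATA",
        "properties": {
            "linkedServiceName": {
                "referenceName": "<adls-linked-service>",
                "type": "LinkedServiceReference"
            },
            "type": "Parquet",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "data",
                    "folderPath": "business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA"
                }
            }
        }
    }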

This is the configuration of the copy data --> source:

[screenshot: copy activity source settings]
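
With "File path in dataset" selected, the source side of the activity should serialize to roughly this (a sketch of the generated payload):

    "source": {
        "type": "ParquetSource",
        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": true
        }
    }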

and the Sink:

[screenshot: copy activity sink settings with the copy command enabled]
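
If I'm reading the sink settings right, enabling the copy command corresponds to a sink payload along these lines (a sketch; staging and table settings omitted):

    "sink": {
        "type": "SqlDWSink",
        "allowCopyCommand": true
    }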

Then I tried to validate the pipeline and got the error:
[screenshot: validation error]

Next try was using the Wildcard file path:

[screenshot: source settings with "Wildcard file path" selected]
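
As far as I can tell, this option makes the UI emit both wildcard properties, roughly (a guess at the generated payload):

    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA",
        "wildcardFileName": "*.parquet"
    }

which would explain the "WildCard folder path" validation error below.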

Next validation error:

[screenshot: validation error]

Next step was to use the wildcard in the dataset configuration:
[screenshot: dataset configuration with a wildcard file name]

I tried the following wildcards, all with the same result:

  • *
  • *.*
  • *.parquet

Error message:

Operation on target CD_EDL_STG_DWH failed: ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'. Account: 'edlcorepiprodadls01'. FileSystem: 'data'. Path: 'data/business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA/^*.parquet'.

It is curious that it prepends a "^" to the file name pattern and then does not find the files. As far as I know, "^" is the escape character the service uses for literal wildcard characters in file names, so it looks like the pattern from the dataset is being treated as a literal file name rather than a wildcard.

My final try was using a single file:
89665-image.png

This worked, but it does not cover my requirements.

Does anyone have an idea what I am doing wrong?


Accepted answer
  1. Paul Hernandez 631 Reputation points Microsoft Employee
    2021-04-26T07:31:12.813+00:00

    Hi @Zilvinas Kundrotas,

    I think the problem is how the UI builds the configuration that gets sent, combined with the validation issues.

    Anyhow, thanks to @PRADEEPCHEEKATLA-MSFT I found a combination that works:

    In the dataset, define everything including the wildcard for the file name:

    [screenshot: dataset configuration, including the wildcard for the file name]

    Then define the wildcard for the file name in the copy activity, but leave the folder path empty:

    [screenshot: copy activity source with a wildcard file name and an empty folder path]

    That "hacks" the interface and produces the following input:

    "source": {  
            "type": "ParquetSource",  
            "storeSettings": {  
                "type": "AzureBlobFSReadSettings",  
                "recursive": true,  
                "wildcardFileName": "*",  
                "enablePartitionDiscovery": false  
            }  
        }  
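
    For completeness, the dataset side of this working combination should look roughly like this (a sketch; only the location matters here):

    "typeProperties": {
        "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "data",
            "folderPath": "business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA",
            "fileName": "*"
        }
    }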
    

    I guess it will be fixed in a future release.


2 additional answers

  1. PRADEEPCHEEKATLA-MSFT 77,751 Reputation points Microsoft Employee
    2021-04-21T07:46:40.32+00:00

    Hello @Paul Hernandez,

    Unfortunately, copying data to Azure Synapse Analytics using the copy command is not supported when the copy source "WildCard folder path" is set.

    Note: Wildcard is now supported on both the folder path and file name for all file-based connectors (Azure Blob, ADLS Gen1, ADLS Gen2, AWS S3, File System, FTP, SFTP and HDFS).

    I'm able to use the wildcard file name * as shown below:

    [screenshot: wildcard file name set to *]

    And successfully ran the pipeline without any issue.

    [animation: pipeline run completing successfully]

    In case this doesn't work for you, keep the wildcard folder path blank and enter data/business/development/staging/trs/pmm/V_TRUSTEDSTR_PMM_DEAL_DATA/ in the "Directory" field of the dataset.
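
    In JSON terms, that should leave the source with only a wildcard file name and no wildcardFolderPath, roughly (a sketch):

    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFileName": "*"
    }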

    Hope this helps. Do let us know if you have any further queries.

    ------------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

  2. Zilvinas Kundrotas 1 Reputation point
    2021-04-25T12:23:58.777+00:00

    I have exactly the same problem.
    I also have a directory with several parquet files (the result of a transformation in Databricks) and want to copy the data to Synapse Analytics using an Azure Data Factory copy activity.
    Parquet dataset settings: the "File" field is empty, and the "Preview data" button works correctly in both places: in the Parquet dataset and in the "Copy data" activity's source tab.
    Copy data activity:

    • If I choose "File path type" as "File path in dataset", validation fails with: "Dataset Staging_DimAttributes_parquet location is a folder, the wildcard file name is required for Copy data1".
    • If I choose "File path type" as "Wildcard file path" (without any wildcards in the folder or file fields) and click "Validate", I get two errors:
      - "Dataset Staging_DimAttributes_parquet location is a folder, the wildcard file name is required for Copy data1"
      - "Copying data to Azure Synapse Analytics using Copy command is not supported when copy source 'WildCard folder path' is set."
    • If I write a wildcard (*, *.parquet) in the file name field, validation gives one error:
      - "Copying data to Azure Synapse Analytics using Copy command is not supported when copy source 'WildCard folder path' is set."

    It seems that there is no way to copy parquet data to Synapse Analytics using the ADF copy activity with the COPY command.