Intermittent error with $$FILEPATH reserved variable in copy activity

Zac 0 Reputation points
2024-03-05T07:21:00.2466667+00:00

Hi

I have a pipeline that copies parquet files from an S3 bucket to an Azure SQL Database.

The pipeline is parameterized to accept a folder. Another pipeline calls it 29 times via the Execute Pipeline activity to load the 29 folders in parallel. Note: each folder may contain multiple files.
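
For reference, each call from the parent pipeline looks roughly like this (pipeline, activity, and parameter names are placeholders, not my actual names):

```json
{
    "name": "RunFolderLoad01",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "CopyS3FolderToSql",
            "type": "PipelineReference"
        },
        "parameters": {
            "folderPath": "folder01"
        },
        "waitOnCompletion": true
    }
}
```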

This runs daily via a trigger, and every couple of weeks the pipeline fails for 2 of the folders (usually the same folders) with the following error:

ErrorCode=ParquetColumnNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column DSS_RECORDSOURCE does not exist in Parquet file.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,'

DSS_RECORDSOURCE is an "Additional columns" column with the value $$FILEPATH. Note: the Recursively setting has been enabled.
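
The relevant part of the copy activity source looks roughly like this (a sketch; dataset details omitted):

```json
"source": {
    "type": "ParquetSource",
    "storeSettings": {
        "type": "AmazonS3ReadSettings",
        "recursive": true
    },
    "additionalColumns": [
        { "name": "DSS_RECORDSOURCE", "value": "$$FILEPATH" }
    ]
}
```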

When I rerun the load for the 2 failed folders via Debug, it runs successfully.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Smaran Thoomu 32,530 Reputation points Microsoft External Staff Moderator
    2024-03-06T07:50:16.1066667+00:00

    Hi @Zac

    Thank you for reaching out to the community forum with your query.

    Based on the error message you provided, it seems that the column "DSS_RECORDSOURCE" does not exist in the Parquet file. This error occurs intermittently for two folders, and you are using the value $$FILEPATH for the "DSS_RECORDSOURCE" column.

    One possible reason for this error is that the Parquet file in question does not contain the "DSS_RECORDSOURCE" column. You can verify this by checking the schema of the Parquet file with a tool such as Parquet Tools. If the column is missing, you can modify the Parquet file to include the "DSS_RECORDSOURCE" column.
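
    For example, a minimal sketch using pyarrow (my assumption; the parquet-tools CLI works just as well, and the file name here is hypothetical):

    ```python
    # Inspect the physical schema of one Parquet file.
    # Assumes pyarrow is installed and the file was downloaded locally.
    import pyarrow.parquet as pq

    schema = pq.read_schema("part-00000.parquet")
    print(schema)                              # every column stored in the file
    print("DSS_RECORDSOURCE" in schema.names)  # is the column physically present?
    ```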

    Another possible reason for this error is that the value of $$FILEPATH is not being resolved correctly for the two folders that are failing. You can debug this by isolating one of those folders: add a Copy activity whose source points at the failing folder, keep the "DSS_RECORDSOURCE" additional column set to $$FILEPATH, and point the sink at a file in Azure Blob Storage. Inspecting that output file will show you the value $$FILEPATH resolves to for the failing folders.
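
    A minimal sketch of such a debug Copy activity, with hypothetical dataset references (all names are placeholders):

    ```json
    {
        "name": "DebugInspectFilePath",
        "type": "Copy",
        "inputs": [ { "referenceName": "S3FailingFolderParquet", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "BlobDebugCsv", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": {
                "type": "ParquetSource",
                "storeSettings": { "type": "AmazonS3ReadSettings", "recursive": true },
                "additionalColumns": [
                    { "name": "DSS_RECORDSOURCE", "value": "$$FILEPATH" }
                ]
            },
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": { "type": "AzureBlobStorageWriteSettings" },
                "formatSettings": { "type": "DelimitedTextWriteSettings", "fileExtension": ".csv" }
            }
        }
    }
    ```

    Opening the resulting file in Blob Storage shows exactly what $$FILEPATH resolved to for each row.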

    I hope this helps! Let me know if you have any further questions or concerns.

