Incomplete Files from Copy Data Command in Azure Data Factory pipeline when uploading data from Snowflake

Question

Susan Rakers 20

I am experiencing an issue where the file sink of the Copy Data activity (Snowflake source using SnowflakeExportCopyCommand) is producing incomplete files when copying data from Snowflake to Azure Blob Storage in our Azure Data Factory pipeline.

Observations:

The number of rows read from Snowflake matches the number of rows written to Azure Blob Storage, as indicated in the copy details.
However, when multiple files are generated using the COPY command, the resulting Parquet files in Azure storage have incorrect sizes and row counts.
I have explicitly set the following Snowflake copy options: SINGLE=TRUE and MAX_FILE_SIZE=900000000, but the issue persists.
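For reference, copy options like these are passed to Snowflake through the export settings of the Snowflake source in the Copy activity JSON. The sketch below builds that source fragment as a Python dict. The `SnowflakeV2Source` type name, the query, and the value types of the options are assumptions based on the ADF Snowflake connector documentation, not taken from this pipeline.

```python
import json

def snowflake_source(query: str, copy_options: dict) -> dict:
    """Build a Copy activity 'source' fragment for a Snowflake export.

    Assumption: the V2 Snowflake connector ('SnowflakeV2Source'); the legacy
    connector uses 'SnowflakeSource' with the same exportSettings shape.
    """
    return {
        "type": "SnowflakeV2Source",
        "query": query,
        "exportSettings": {
            "type": "SnowflakeExportCopyCommand",
            # Handed to Snowflake as copy options of its unload (COPY INTO) step.
            "additionalCopyOptions": copy_options,
        },
    }

# Hypothetical query; option values mirror those in the question.
source = snowflake_source(
    "SELECT * FROM MY_DB.MY_SCHEMA.MY_TABLE",
    {"SINGLE": True, "MAX_FILE_SIZE": "900000000"},
)
print(json.dumps(source, indent=2))
```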

Has anyone encountered similar behavior, and are there any known solutions or workarounds?

Would appreciate any insights into possible causes or additional configurations that might resolve this.

phemanth 15,855 Reputation points Microsoft External Staff Moderator

2025-03-13T08:14:58.9266667+00:00
@Susan Rakers

Please check the steps below and let us know the results.

Schema Mapping: Ensure that the schema mapping is correctly configured. Sometimes the schema might not be mapped automatically, leading to inconsistencies.

File Size and Partitioning: Even though you've set SINGLE=TRUE and MAX_FILE_SIZE=900000000, there might still be issues with how the files are partitioned. Try adjusting MAX_FILE_SIZE to see whether it affects the output.

Data Consistency: Verify that the data consistency settings in both Snowflake and Azure Data Factory are correctly configured. Inconsistent settings might lead to incomplete file transfers.

Monitoring and Logs: Check the logs and monitoring details in Azure Data Factory. Look for any warnings or errors that might give more insight into why the files are incomplete.

Alternative Sink Configuration: As a troubleshooting step, try configuring a different sink, such as Azure SQL Database or another storage account, to see whether the issue persists.

Snowflake Connector: Ensure that you are using the latest version of the Snowflake connector in Azure Data Factory. Updates sometimes include bug fixes that might resolve your issue.
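To see what SINGLE and MAX_FILE_SIZE control, it may help to look at the shape of a Snowflake `COPY INTO <location>` unload statement carrying these options. The sketch below only assembles such a statement as text; ADF generates its own statement internally, and the stage path and query here are hypothetical.

```python
def copy_into_location(stage_path: str, query: str, options: dict) -> str:
    """Assemble a Snowflake COPY INTO <location> unload statement as text."""
    def render(value):
        # Booleans become TRUE/FALSE; other values are inserted as-is.
        if isinstance(value, bool):
            return "TRUE" if value else "FALSE"
        return str(value)

    opts = " ".join(f"{key} = {render(val)}" for key, val in options.items())
    return (
        f"COPY INTO {stage_path} FROM ({query}) "
        f"FILE_FORMAT = (TYPE = PARQUET) {opts}"
    )

stmt = copy_into_location(
    "@my_stage/export/data.parquet",   # hypothetical stage path and file name
    "SELECT * FROM my_table",           # hypothetical query
    {"SINGLE": True, "MAX_FILE_SIZE": 900000000, "OVERWRITE": False},
)
print(stmt)
```

In Snowflake, SINGLE = TRUE asks for one output file instead of several files written in parallel, and MAX_FILE_SIZE is the upper size limit, in bytes, of each generated file.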

I hope this information helps. Please do let us know if you have any further queries.

Susan Rakers 20 Reputation points

2025-03-13T21:00:06.0066667+00:00
phemanth, thank you for your response. The following are the outcomes of my troubleshooting.

Schema mapping: I did not have a schema mapping set. The imported schema incorrectly assigned an integer destination type where the source type is NUMBER(7,3), a fixed-point decimal. With no schema mapping, all of the fields are returned as varchar. I am looking into setting up the schema mapping manually.

File size and partitioning: I have tried a couple of different MAX_FILE_SIZE values and get the same results.

Data Consistency: I still need to check; however, I do not see any data consistency information in the copy details or log. The numbers of rows read and written are the same.

Monitoring and Logs: I did not see anything unusual in the copy details or copy logs.

Another sink configuration: Because of the Snowflake integration used to capture the data, I cannot set up a different sink configuration.

Snowflake Connector: I am using the latest Snowflake Connector.
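One way to make the mismatch concrete is to compare the row counters in the Copy activity's run output (`rowsRead`, `rowsCopied`) against the row counts actually found in the written files. The sketch below assumes the per-file counts have already been obtained (for example, from each Parquet file's footer metadata) and only reconciles the numbers; the file names and counts in the example are made up.

```python
def reconcile(activity_output: dict, file_row_counts: dict) -> list:
    """Return discrepancies between the Copy activity output and the row
    counts found in the sink files (an empty list means consistent)."""
    problems = []
    rows_read = activity_output.get("rowsRead")
    rows_copied = activity_output.get("rowsCopied")
    if rows_read != rows_copied:
        problems.append(f"rowsRead={rows_read} but rowsCopied={rows_copied}")
    in_files = sum(file_row_counts.values())
    if rows_copied != in_files:
        problems.append(
            f"rowsCopied={rows_copied} but files contain {in_files} rows"
        )
    return problems

# Made-up numbers illustrating the reported symptom: the activity reports
# matching counters, but the files hold fewer rows.
output = {"rowsRead": 1000, "rowsCopied": 1000}
files = {"data_0.parquet": 600, "data_1.parquet": 150}
print(reconcile(output, files))
```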

I am going to focus on a manual schema mapping first to see if this fixes the issue.
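As an illustration of why an integer destination type is a problem for a NUMBER(7,3) column: NUMBER(7,3) holds up to seven digits, three of them after the decimal point, so its largest value is 9999.999, and converting such values to an integer drops the fractional part. The sketch below mimics this with Python's `decimal` module; the sample values are made up, and ADF's own conversion may behave differently (for example, by rounding or failing instead of truncating).

```python
from decimal import Decimal

# Largest value a NUMBER(7,3) column can hold: 7 digits, 3 after the point.
precision, scale = 7, 3
max_value = Decimal(10 ** (precision - scale)) - Decimal(1).scaleb(-scale)
print(max_value)  # 9999.999

# Hypothetical source values; int() truncates toward zero.
values = [Decimal("1234.567"), Decimal("-0.250"), Decimal("42.000")]
as_int = [int(v) for v in values]
print(as_int)  # [1234, 0, 42]
```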

Thank you for your advice.

Susan Rakers 20 Reputation points

2025-03-14T17:56:26.1966667+00:00

Any advice or insight will be greatly appreciated. The following is more information regarding the issue.

Schema mapping: I explicitly mapped the schema for the COPY command, but the resulting file in the storage account is still incomplete, with a smaller file size and fewer rows than indicated in the details of the Copy Data activity.

Data Consistency: Since SnowflakeExportCopyCommand requires a stage, there are no data consistency settings.

Snowflake Copy Options: SINGLE=TRUE, MAX_FILE_SIZE=300000000, OVERWRITE=FALSE
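An explicit mapping like the one described is stored in the Copy activity's `translator` property. The sketch below builds a `TabularTranslator` with hypothetical column names, mapping a NUMBER(7,3) column to ADF's interim `Decimal` type rather than an integer type; the column names are invented, and the exact type names should be checked against your own imported schema.

```python
import json

def column(name: str, src_type: str, sink_type: str) -> dict:
    """One source-to-sink column mapping entry for a TabularTranslator."""
    return {"source": {"name": name, "type": src_type},
            "sink": {"name": name, "type": sink_type}}

translator = {
    "type": "TabularTranslator",
    "mappings": [
        column("ORDER_ID", "Int64", "Int64"),    # hypothetical column
        column("AMOUNT", "Decimal", "Decimal"),  # hypothetical NUMBER(7,3) column
        column("REGION", "String", "String"),    # hypothetical column
    ],
}
print(json.dumps(translator, indent=2))
```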

I appreciate any time you can give to help with this issue. Thank you!

phemanth 15,855 Reputation points Microsoft External Staff Moderator

2025-03-17T08:36:10.88+00:00

@Susan Rakers

Could you please provide more details on your configuration, with screenshots if possible?

Susan Rakers 20 Reputation points

2025-03-17T12:29:44.26+00:00

phemanth,

Thank you for your continuing assistance. The following are the requested screenshots of the Copy Data activity in the Azure Data Factory pipeline. I have censored our organization's identifying information. I don't think it will prevent you from seeing the configuration details.

[Screenshots not reproduced: Copy Data - Source, Copy Data - Sink, Copy Data - Mapping, Copy Data - Settings]

Hope this helps. I appreciate your time. Thank you!

Susan Rakers 20 Reputation points

2025-03-17T12:58:24.66+00:00

phemanth,

Sorry, the response was mistakenly entered as an answer and not a comment.

Answer 1

@Susan Rakers

I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

Solution: I have found the solution to my issue where the file sink of the Copy Data activity (SnowflakeExportCopyCommand) was producing incomplete files when copying data from Snowflake to Azure Blob Storage in our Azure Data Factory pipeline:

Set an explicit schema mapping.
Snowflake copy options: OVERWRITE=FALSE, MAX_FILE_SIZE=300000000, SINGLE=TRUE.
In the sink, set Copy Behavior to 'Merge Files'.

The 'Merge Files' option combines the multiple files produced by the copy into one file. The schema mapping must be set to avoid schema inconsistencies.
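In the Copy activity JSON, the 'Merge Files' choice corresponds to the sink's `copyBehavior` setting. The sketch below builds a Parquet sink fragment for Blob Storage with `copyBehavior` set to `MergeFiles`; the `ParquetSink` and `AzureBlobStorageWriteSettings` type names follow the ADF format and connector documentation, and the rest of the pipeline (datasets, paths) is deliberately left out.

```python
import json

def parquet_blob_sink(copy_behavior: str = "MergeFiles") -> dict:
    """Copy activity 'sink' fragment for Parquet files in Azure Blob Storage."""
    allowed = {"PreserveHierarchy", "FlattenHierarchy", "MergeFiles"}
    if copy_behavior not in allowed:
        raise ValueError(f"copyBehavior must be one of {sorted(allowed)}")
    return {
        "type": "ParquetSink",
        "storeSettings": {
            "type": "AzureBlobStorageWriteSettings",
            # MergeFiles: combine all files from the source into one sink file.
            "copyBehavior": copy_behavior,
        },
    }

print(json.dumps(parquet_blob_sink(), indent=2))
```

Per the ADF documentation, with MergeFiles the merged file takes the file name specified in the sink dataset if there is one; otherwise the name is autogenerated.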

If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.

Please don’t forget to click "Accept Answer" and "Yes" for "Was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

Answer 2

Susan Rakers 20

Thanks for all of your help in finding the solution.

phemanth 15,855 Reputation points Microsoft External Staff Moderator

2025-03-19T07:02:36.8666667+00:00

@Susan Rakers Glad to know your issue has been resolved. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.
