Azure Data Factory Copy activity can preview CSV data source, but copies 0 rows from it

Dronec 181 Reputation points
2021-04-20T08:51:54.13+00:00

I am using the Azure Data Factory Copy activity to copy data from a CSV file in Google Cloud Storage (through its S3-compatible API) to MySQL. I can connect to the storage, list the files, locate the CSV file, import its schema and preview the data. However, when I run the pipeline, it finishes successfully but reports that it read exactly 0 bytes from the source and inserted 0 rows.

Azure Data Factory

2 answers

  1. Dronec 181 Reputation points
    2021-05-13T06:22:10.703+00:00

    Update: I just had a meeting with the Microsoft engineers.
    It looks like the root cause is that Google Cloud Storage, which emulates the Amazon S3 API, returns a file size of -1, which the Copy activity interprets as infinite.
    In the default mode the Copy activity uses multiple threads: it tries to split the size of -1 between them and gets 0 bytes for each thread.
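To make the arithmetic concrete, here is a minimal Python sketch of how a reported size of -1 can turn into zero-length reads. It is a hypothetical model, not the Copy activity's actual code. The helper names `trunc_div` and `bytes_per_thread` are made up. The sketch assumes the size is divided with C#-style integer division, which truncates toward zero. Python's own `//` floors instead, so `-1 // 4` is `-1`.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like C# or Java."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def bytes_per_thread(reported_size: int, threads: int) -> list[int]:
    """Split a reported object size into per-thread read lengths (hypothetical model)."""
    base = trunc_div(reported_size, threads)
    # Any positive remainder goes to the last thread so a real size is fully covered;
    # a negative remainder (only possible when the size itself is negative) is dropped.
    remainder = max(reported_size - base * threads, 0)
    lengths = [base] * threads
    lengths[-1] += remainder
    return lengths


# A real 1003-byte file is split sensibly across 4 threads...
print(bytes_per_thread(1003, 4))  # [250, 250, 250, 253]
# ...but a reported size of -1 leaves nothing to read on any thread.
print(bytes_per_thread(-1, 4))    # [0, 0, 0, 0]
```

The per-thread lengths add up to the total the pipeline reports. With a size of -1 they add up to 0, which matches the "0 bytes read" symptom.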
    However, if I use the undocumented sequential mode (as the MS engineers suggested):

        "storeSettings": {
            "type": "GoogleCloudStorageReadSettings",
            "multipartSourceType": "Sequential",
            "recursive": true,
            "enablePartitionDiscovery": false
        },
        "formatSettings": {
            "type": "DelimitedTextReadSettings"
        }

    It does copy the file, but repeatedly, and it never finishes: I stopped it after it had inserted 510k rows from a tiny 17-row file.
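For scale, 510k rows from a 17-row file works out to roughly 30,000 full passes over the file. The sketch below is a hypothetical model of that behaviour, not ADF's implementation. It assumes three things:

- The reader keeps fetching until it has read the reported size.
- A size of -1 is treated as unbounded.
- The whole object is re-read on every pass.

The `cancel_after_rows` parameter stands in for stopping the run by hand, and the 1,200-byte file size is an arbitrary assumption.

```python
import math


def simulate_sequential_copy(reported_size: int, actual_size: int,
                             rows_in_file: int, cancel_after_rows: int) -> tuple[int, int]:
    """Return (rows_inserted, passes) for a hypothetical sequential reader."""
    # Assumption: a reported size of -1 is treated as "unbounded".
    target = math.inf if reported_size < 0 else reported_size
    bytes_read = rows = passes = 0
    while bytes_read < target:
        # Assumption: every request streams the whole object again from the start.
        bytes_read += actual_size
        rows += rows_in_file
        passes += 1
        if rows >= cancel_after_rows:  # stands in for cancelling the run by hand
            break
    return rows, passes


print(simulate_sequential_copy(1200, 1200, 17, 510_000))  # (17, 1)
print(simulate_sequential_copy(-1, 1200, 17, 510_000))    # (510000, 30000)
```

With a correct size the loop stops after one pass. With -1 the only exit is the manual cancel.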


  2. Dronec 181 Reputation points
    2023-09-07T19:53:26.6433333+00:00

    I implemented a workaround that works: instead of copying directly from the bucket, I generate a download link for the file.
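Here is a minimal sketch of generating such a link. It assumes the `google-cloud-storage` Python client and credentials that are allowed to sign URLs, for example a service account key. The thread doesn't say how the link was generated or which ADF connector then reads it; an HTTP-based source is one plausible choice. `make_download_url` and the bucket/object names are hypothetical.

```python
from datetime import timedelta

# V4 signed URLs can be valid for at most 7 days.
MAX_V4_LIFETIME = timedelta(days=7)


def make_download_url(blob, valid_for: timedelta = timedelta(hours=1)) -> str:
    """Return a V4 signed GET URL for a google-cloud-storage Blob (hypothetical helper)."""
    if valid_for > MAX_V4_LIFETIME:
        raise ValueError("V4 signed URLs cannot be valid for more than 7 days")
    return blob.generate_signed_url(version="v4", expiration=valid_for, method="GET")


# Usage (needs `pip install google-cloud-storage` and signing-capable credentials;
# the bucket and object names are placeholders):
#   from google.cloud import storage
#   blob = storage.Client().bucket("my-bucket").blob("exports/data.csv")
#   print(make_download_url(blob))
```

Because V4 signed URLs expire, a scheduled pipeline would need to generate a fresh link before each run rather than store one permanently.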

