Download a huge file from an AWS S3 pre-signed URL then upload it to an API in chunks using Azure technology

sunny 40 Reputation points
2024-01-22T22:42:08.8333333+00:00

I need to download a huge file (e.g., a zip) from an AWS S3 pre-signed URL and then upload it to an API in chunks as base64-encoded strings using Azure technology. The destination API requires Content-Type: application/x-www-form-urlencoded.

Q1. What's the best solution to implement this requirement? Durable Functions, Azure Data Factory, or something else?

Q2: I made a POC in a durable function but encountered some issues. Details below. I used an HTTP client to download the file, then read it into a buffer and posted it to the destination API. The problem was that it ran a long time for just a 2 GB file. When I checked "read" in the debugger, I saw it only read around 10 KB of data into the buffer, but I want to read 5 MB into the buffer. I guessed this could be because only about 10 KB of data was available when I read it, since I used HttpCompletionOption.ResponseHeadersRead to save memory. Is there any way to improve the performance of this solution, or do I need to consider other solutions?

HttpResponseMessage responsedownload = await client.GetAsync(request.fileUrl, HttpCompletionOption.ResponseHeadersRead);
if (responsedownload.IsSuccessStatusCode)
{
    using (Stream contentstream = await responsedownload.Content.ReadAsStreamAsync())
    {
        byte[] buffer = new byte[5000000];
        string uploadresult = "";
        int read;
        string uploadfilename = request.filename;
        string isNewfile = "true";
        XmlDocument xmlDoc = new();

        while ((read = contentstream.Read(buffer, 0, buffer.Length)) > 0)
        {
            string chunkstring = string.Empty;
            using (MemoryStream ms = new MemoryStream())
            {
                ms.Write(buffer, 0, read);
                byte[] uploadchunck = ms.ToArray();
                chunkstring = Convert.ToBase64String(uploadchunck);
            }

            Dictionary<string, string> dic = new Dictionary<string, string>
            {
                { "sessionKey", request.sessionkey },
                { "storeFileName", uploadfilename },
                { "base64Buffer", chunkstring },
                { "isNewFile", isNewfile }
            };

            var form = new FormUrlEncodedContent(dic);

            // call upload API
            HttpResponseMessage responseupload = await client.PostAsync(request.uploadUrl, form);
            // process responseupload / uploadresult, then continue with the next chunk
        }
    }
}
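
Side note on the partial reads: Stream.Read on a network stream returns as soon as some bytes are available, so a single call can return far less than the buffer size. Below is a minimal sketch of a loop that keeps reading until the buffer is full or the stream ends; the helper name is made up for illustration only.

    using System.IO;

    // Read until 'count' bytes are collected or the stream ends; a single
    // Stream.Read call is not guaranteed to fill the buffer.
    static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Usage inside the download loop:
    // while ((read = ReadFully(contentstream, buffer, buffer.Length)) > 0) { ... }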

Accepted answer
  1. MikeUrnun 9,777 Reputation points Moderator
    2024-03-01T01:13:58.9033333+00:00

    @sunny I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this!

    Issue:

    You had working code that downloads a big file (around 2 GB) from an AWS S3 pre-signed URL and then uploads it to an API in chunks as base64-encoded strings using Azure technology. The destination API required a Content-Type: application/x-www-form-urlencoded header. However, the whole process was slow, and you observed that each read filled only about 10 KB of the buffer even though the desired buffer size was 5 MB.

    Solution:

    You improved the performance by saving the downloaded content to a file stream first and then reading it back in chunks when uploading to the third-party API. Below is the key part of this solution:

    HttpResponseMessage responsedownload = await client.GetAsync(request.fileUrl, HttpCompletionOption.ResponseHeadersRead);
    if (responsedownload.IsSuccessStatusCode)
    {
        string uploadresult = "";
        string filepath = @"D:\home\" + request.filename;

        // Save to a file stream first, then upload in chunks
        using (Stream contentstream = await responsedownload.Content.ReadAsStreamAsync())
        {
            using (var fs = new FileStream(filepath, FileMode.OpenOrCreate))
            {
                contentstream.CopyTo(fs);
                fs.Position = 0;

                byte[] buffer = new byte[5000000];
                int read;
                string uploadfilename = request.filename;
                string isNewfile = "true";
                XmlDocument xmlDoc = new();

                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    string chunkstring = string.Empty;
                    using (MemoryStream ms = new MemoryStream())
                    {
                        ms.Write(buffer, 0, read);
                        byte[] uploadchunck = ms.ToArray();
                        chunkstring = Convert.ToBase64String(uploadchunck);
                    }

                    // call upload API to upload the chunkstring (same form-encoded request as in the question)
                    Dictionary<string, string> dic = new Dictionary<string, string>
                    {
                        { "sessionKey", request.sessionkey },
                        { "storeFileName", uploadfilename },
                        { "base64Buffer", chunkstring },
                        { "isNewFile", isNewfile }
                    };
                    var form = new FormUrlEncodedContent(dic);
                    HttpResponseMessage responseupload = await client.PostAsync(request.uploadUrl, form);

                    if (responseupload.IsSuccessStatusCode)
                    {
                        // process uploadresult
                    }
                    else
                    {
                        // throw error
                    }

                    uploadfilename = uploadresult;
                    isNewfile = "false";
                }
            }

            // delete the staging file
            File.Delete(filepath);
        }

        return uploadresult;
    }
    else
    {
        // throw error
    }
    
    

1 additional answer

  1. MikeUrnun 9,777 Reputation points Moderator
    2024-01-30T04:25:29.84+00:00

    Hello @sunny - Per my comment above, I wanted to confirm that using a multipart request would be recommended for your use case. I have answered your questions below:

    The destination API requires Content-Type: application/x-www-form-urlencoded

    Using Content-Type: application/x-www-form-urlencoded for uploading large files will potentially triple the payload size; see the following thread: application/x-www-form-urlencoded or multipart/form-data
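
    To make that overhead concrete, here is a small illustrative snippet (not from the linked thread) that measures how much a random binary sample grows after base64 encoding and then form-url-encoding:

        using System;

        // Illustrative only: compare payload growth for a 1 KB random binary sample.
        byte[] sample = new byte[1024];
        new Random(42).NextBytes(sample);

        // Base64 adds roughly 33% (3 bytes -> 4 characters).
        string base64 = Convert.ToBase64String(sample);

        // Form-url-encoding the base64 text expands '+', '/', and '=' into %2B, %2F, %3D.
        string formEncoded = Uri.EscapeDataString(base64);

        Console.WriteLine($"raw bytes:              {sample.Length}");
        Console.WriteLine($"base64 chars:           {base64.Length}");
        Console.WriteLine($"form-urlencoded chars:  {formEncoded.Length}");

        // Sending the raw bytes as form data without base64 would be worse still:
        // most bytes outside the unreserved set become three-character %XX escapes,
        // roughly tripling the size for arbitrary binary content.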

    Q1. What's the best solution to implement this requirement? Durable Functions, Azure Data Factory, or something else?

    Yes, Durable Functions is (by design) built for long-running, stateful workflows like the one you described.
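
    For illustration, a minimal Durable Functions orchestrator for this kind of workflow could look like the sketch below (function and type names are placeholders, not from your post); the long-running download and chunked upload would live in an activity function that the orchestrator calls and checkpoints:

        using System.Threading.Tasks;
        using Microsoft.Azure.WebJobs;
        using Microsoft.Azure.WebJobs.Extensions.DurableTask;

        public static class TransferOrchestration
        {
            // Hypothetical input type carrying fileUrl, uploadUrl, filename, sessionkey.
            public class TransferRequest
            {
                public string fileUrl { get; set; }
                public string uploadUrl { get; set; }
                public string filename { get; set; }
                public string sessionkey { get; set; }
            }

            [FunctionName("TransferOrchestrator")]
            public static async Task<string> RunOrchestrator(
                [OrchestrationTrigger] IDurableOrchestrationContext context)
            {
                var request = context.GetInput<TransferRequest>();

                // The download + chunked upload is delegated to an activity function;
                // the orchestrator only coordinates and persists the workflow state.
                return await context.CallActivityAsync<string>("TransferFileActivity", request);
            }
        }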

    Q2: I made a POC in a durable function but encountered some issues. Details below. I used an HTTP client to download the file, then read it into a buffer and posted it to the destination API.

    Instead of attempting to construct the request against the AWS S3 pre-signed URL, you'll want to leverage the AWS .NET SDK and its TransferUtility class as shown here: Uploading an object using multipart upload
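
    For reference, a minimal sketch of a TransferUtility multipart upload looks like the following (bucket name, key, and file path are placeholders; the SDK splits large files into parts automatically):

        using System.Threading.Tasks;
        using Amazon.S3;
        using Amazon.S3.Transfer;

        // Sketch only: credentials and region are resolved from the environment here.
        var s3Client = new AmazonS3Client();
        var transferUtility = new TransferUtility(s3Client);

        var uploadRequest = new TransferUtilityUploadRequest
        {
            BucketName = "my-bucket",              // placeholder
            Key = "bigfile.zip",                   // placeholder
            FilePath = @"D:\home\bigfile.zip",     // placeholder
            PartSize = 5 * 1024 * 1024             // 5 MB parts
        };

        // TransferUtility performs a multipart upload under the hood for large files.
        await transferUtility.UploadAsync(uploadRequest);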

    The problem was that it ran a long time for just a 2 GB file. When I checked "read" in the debugger, I saw it only read around 10 KB of data into the buffer, but I want to read 5 MB into the buffer. I guessed this could be because only about 10 KB of data was available when I read it, since I used HttpCompletionOption.ResponseHeadersRead to save memory. Is there any way to improve the performance of this solution, or do I need to consider other solutions?

    Switching to a multipart request should resolve this issue. I hope these suggestions help; if you have any follow-up questions, feel free to comment below.
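
    If the destination API can be changed to accept multipart requests, a chunk could be sent without base64 at all. A rough sketch, reusing the field names from your question (adjust the part names to whatever the API actually expects):

        // Sketch: send one chunk as multipart/form-data instead of base64 + form-urlencoded.
        using var form = new MultipartFormDataContent();
        form.Add(new StringContent(request.sessionkey), "sessionKey");
        form.Add(new StringContent(uploadfilename), "storeFileName");
        form.Add(new StringContent(isNewfile), "isNewFile");
        // The raw chunk bytes go in directly; no base64 expansion.
        form.Add(new ByteArrayContent(buffer, 0, read), "file", uploadfilename);

        HttpResponseMessage responseupload = await client.PostAsync(request.uploadUrl, form);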


    Please "Accept Answer" if the answer is helpful so that others in the community may benefit from your experience.

