Download a huge file from an AWS S3 pre-signed URL then upload it to an API in chunks using Azure technology

sunny 40 Reputation points
2024-01-22T22:42:08.8333333+00:00

I need to download a huge file (e.g., a zip) from an AWS S3 pre-signed URL and then upload it to an API in chunks as base64-encoded strings using Azure technology. The destination API requires Content-Type: application/x-www-form-urlencoded.

Q1. What's the best solution to implement this requirement? Durable Functions, Azure Data Factory, or something else?

Q2: I made a POC in a durable function but encountered some issues. Details below. I used an HTTP client to download the file, then read it into a buffer and posted it to the destination API. The problem was that it ran a long time for just a 2 GB file. When I checked "read" in the debugger, I saw it only read around 10 KB of data into the buffer, but I want to read 5 MB into the buffer. I guessed this could be because only about 10 KB of data was available when I read it, since I used HttpCompletionOption.ResponseHeadersRead to save memory. Is there any way to improve the performance of this solution, or do I need to consider other solutions?

HttpResponseMessage responsedownload = await client.GetAsync(request.fileUrl, HttpCompletionOption.ResponseHeadersRead);
if (responsedownload.IsSuccessStatusCode)
{
    using (Stream contentstream = await responsedownload.Content.ReadAsStreamAsync())
    {
        byte[] buffer = new byte[5000000];
        string uploadresult = "";
        int read;
        string uploadfilename = request.filename;
        string isNewfile = "true";
        XmlDocument xmlDoc = new();

        while ((read = contentstream.Read(buffer, 0, buffer.Length)) > 0)
        {
            string chunkstring = string.Empty;
            using (MemoryStream ms = new MemoryStream())
            {
                ms.Write(buffer, 0, read);
                byte[] uploadchunck = ms.ToArray();
                chunkstring = Convert.ToBase64String(uploadchunck);
            }

            Dictionary<string, string> dic = new Dictionary<string, string>
            {
                { "sessionKey", request.sessionkey },
                { "storeFileName", uploadfilename },
                { "base64Buffer", chunkstring },
                { "isNewFile", isNewfile }
            };

            var form = new FormUrlEncodedContent(dic);

            // call upload API
            HttpResponseMessage responseupload = await client.PostAsync(request.uploadUrl, form);
            // process responseupload / uploadresult, then continue with the next chunk
        }
    }
}
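
Side note on the partial reads: Stream.Read on a network stream returns as soon as some bytes are available, so a single call can return far less than the buffer size. Below is a minimal sketch of a loop that keeps reading until the buffer is full or the stream ends; the helper name is made up for illustration only.

    using System.IO;

    // Read until 'count' bytes are collected or the stream ends; a single
    // Stream.Read call is not guaranteed to fill the buffer.
    static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Usage inside the download loop:
    // while ((read = ReadFully(contentstream, buffer, buffer.Length)) > 0) { ... }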

Accepted answer
  1. MikeUrnun 9,777 Reputation points Moderator
    2024-03-01T01:13:58.9033333+00:00

    @sunny I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this!

    Issue:

    You had working code that downloads a big file (around 2 GB) from an AWS S3 pre-signed URL and then uploads it to an API in chunks as base64-encoded strings using Azure technology. The destination API required a Content-Type: application/x-www-form-urlencoded header. However, the whole process was slow, and you observed that each read filled only about 10 KB of the buffer even though the desired buffer size was 5 MB.

    Solution:

    You improved the performance by saving the downloaded content to a file stream first and then reading it back in chunks when uploading to the third-party API. Below is the key part of this solution:

    HttpResponseMessage responsedownload = await client.GetAsync(request.fileUrl, HttpCompletionOption.ResponseHeadersRead);
    if (responsedownload.IsSuccessStatusCode)
    {
        string uploadresult = "";
        string filepath = @"D:\home\" + request.filename;

        // Save to a file stream first, then upload in chunks
        using (Stream contentstream = await responsedownload.Content.ReadAsStreamAsync())
        {
            using (var fs = new FileStream(filepath, FileMode.OpenOrCreate))
            {
                contentstream.CopyTo(fs);
                fs.Position = 0;

                byte[] buffer = new byte[5000000];
                int read;
                string uploadfilename = request.filename;
                string isNewfile = "true";
                XmlDocument xmlDoc = new();

                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    string chunkstring = string.Empty;
                    using (MemoryStream ms = new MemoryStream())
                    {
                        ms.Write(buffer, 0, read);
                        byte[] uploadchunck = ms.ToArray();
                        chunkstring = Convert.ToBase64String(uploadchunck);
                    }

                    // call upload API to upload the chunkstring (same form-encoded request as in the question)
                    Dictionary<string, string> dic = new Dictionary<string, string>
                    {
                        { "sessionKey", request.sessionkey },
                        { "storeFileName", uploadfilename },
                        { "base64Buffer", chunkstring },
                        { "isNewFile", isNewfile }
                    };
                    var form = new FormUrlEncodedContent(dic);
                    HttpResponseMessage responseupload = await client.PostAsync(request.uploadUrl, form);

                    if (responseupload.IsSuccessStatusCode)
                    {
                        // process uploadresult
                    }
                    else
                    {
                        // throw error
                    }

                    uploadfilename = uploadresult;
                    isNewfile = "false";
                }
            }

            // delete the staging file
            File.Delete(filepath);
        }

        return uploadresult;
    }
    else
    {
        // throw error
    }
    
    

1 additional answer

  1. MikeUrnun 9,777 Reputation points Moderator
    2024-01-30T04:25:29.84+00:00

    Hello @sunny - Per my comment above, I wanted to confirm that using a multipart request would be recommended for your use case. I have answered your questions below:

    The destination API requires Content-Type: application/x-www-form-urlencoded

    Using Content-Type: application/x-www-form-urlencoded for uploading large files will potentially triple the payload size; see the following thread: application/x-www-form-urlencoded or multipart/form-data
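
    To make that overhead concrete, here is a small illustrative snippet (not from the linked thread) that measures how much a random binary sample grows after base64 encoding and then form-url-encoding:

        using System;

        // Illustrative only: compare payload growth for a 1 KB random binary sample.
        byte[] sample = new byte[1024];
        new Random(42).NextBytes(sample);

        // Base64 adds roughly 33% (3 bytes -> 4 characters).
        string base64 = Convert.ToBase64String(sample);

        // Form-url-encoding the base64 text expands '+', '/', and '=' into %2B, %2F, %3D.
        string formEncoded = Uri.EscapeDataString(base64);

        Console.WriteLine($"raw bytes:              {sample.Length}");
        Console.WriteLine($"base64 chars:           {base64.Length}");
        Console.WriteLine($"form-urlencoded chars:  {formEncoded.Length}");

        // Sending the raw bytes as form data without base64 would be worse still:
        // most bytes outside the unreserved set become three-character %XX escapes,
        // roughly tripling the size for arbitrary binary content.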

    Q1. What's the best solution to implement this requirement? Durable Functions, Azure Data Factory, or something else?

    Yes, Durable Functions is (by design) built for long-running, stateful workflows like the one you described.
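
    For illustration, a minimal Durable Functions orchestrator for this kind of workflow could look like the sketch below (function and type names are placeholders, not from your post); the long-running download and chunked upload would live in an activity function that the orchestrator calls and checkpoints:

        using System.Threading.Tasks;
        using Microsoft.Azure.WebJobs;
        using Microsoft.Azure.WebJobs.Extensions.DurableTask;

        public static class TransferOrchestration
        {
            // Hypothetical input type carrying fileUrl, uploadUrl, filename, sessionkey.
            public class TransferRequest
            {
                public string fileUrl { get; set; }
                public string uploadUrl { get; set; }
                public string filename { get; set; }
                public string sessionkey { get; set; }
            }

            [FunctionName("TransferOrchestrator")]
            public static async Task<string> RunOrchestrator(
                [OrchestrationTrigger] IDurableOrchestrationContext context)
            {
                var request = context.GetInput<TransferRequest>();

                // The download + chunked upload is delegated to an activity function;
                // the orchestrator only coordinates and persists the workflow state.
                return await context.CallActivityAsync<string>("TransferFileActivity", request);
            }
        }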

    Q2: I made a POC in a durable function but encountered some issues. Details below. I used an HTTP client to download the file, then read it into a buffer and posted it to the destination API.

    Instead of attempting to construct the request against the AWS S3 pre-signed URL, you'll want to leverage the AWS .NET SDK and its TransferUtility class as shown here: Uploading an object using multipart upload
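
    For reference, a minimal sketch of a TransferUtility multipart upload looks like the following (bucket name, key, and file path are placeholders; the SDK splits large files into parts automatically):

        using System.Threading.Tasks;
        using Amazon.S3;
        using Amazon.S3.Transfer;

        // Sketch only: credentials and region are resolved from the environment here.
        var s3Client = new AmazonS3Client();
        var transferUtility = new TransferUtility(s3Client);

        var uploadRequest = new TransferUtilityUploadRequest
        {
            BucketName = "my-bucket",              // placeholder
            Key = "bigfile.zip",                   // placeholder
            FilePath = @"D:\home\bigfile.zip",     // placeholder
            PartSize = 5 * 1024 * 1024             // 5 MB parts
        };

        // TransferUtility performs a multipart upload under the hood for large files.
        await transferUtility.UploadAsync(uploadRequest);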

    The problem was that it ran a long time for just a 2 GB file. When I checked "read" in the debugger, I saw it only read around 10 KB of data into the buffer, but I want to read 5 MB into the buffer. I guessed this could be because only about 10 KB of data was available when I read it, since I used HttpCompletionOption.ResponseHeadersRead to save memory. Is there any way to improve the performance of this solution, or do I need to consider other solutions?

    Switching to a multipart request should resolve this issue. I hope these suggestions help; if you have any follow-up questions, feel free to comment below.
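
    If the destination API can be changed to accept multipart requests, a chunk could be sent without base64 at all. A rough sketch, reusing the field names from your question (adjust the part names to whatever the API actually expects):

        // Sketch: send one chunk as multipart/form-data instead of base64 + form-urlencoded.
        using var form = new MultipartFormDataContent();
        form.Add(new StringContent(request.sessionkey), "sessionKey");
        form.Add(new StringContent(uploadfilename), "storeFileName");
        form.Add(new StringContent(isNewfile), "isNewFile");
        // The raw chunk bytes go in directly; no base64 expansion.
        form.Add(new ByteArrayContent(buffer, 0, read), "file", uploadfilename);

        HttpResponseMessage responseupload = await client.PostAsync(request.uploadUrl, form);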


    Please "Accept Answer" if the answer is helpful so that others in the community may benefit from your experience.

