How to securely copy data from AWS S3 "Requester Pays" buckets to ADLS using ADF Copy Activity?

I'm facing an issue when trying to copy data from an AWS S3 bucket (with "Requester Pays" enabled) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF) Copy Activity.
The problem is that ADF currently doesn't support setting the x-amz-request-payer header required for accessing "Requester Pays" buckets. As a result, the pipeline fails when trying to read data from the source.
Has anyone found a secure and automated workaround for this scenario? I'm looking for a solution that avoids manual intervention and can be integrated into a production data pipeline.
Any guidance or suggestions would be greatly appreciated.
#aws #s3 #azureDataLakeStorage #azureDataFactory #ADLS #requesterPays
J N S S Kasyap · Microsoft External Staff Moderator
Hi @MojiTMJ,
I understand the challenge of copying data from AWS S3 "Requester Pays" buckets to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF). You've noted that ADF currently doesn't support setting the required x-amz-request-payer header, which is essential for accessing such buckets.
Unfortunately, as of now, ADF doesn't provide a straightforward way to handle this: the Amazon S3 connector doesn't allow setting custom request headers such as x-amz-request-payer on reads.
However, here are some workarounds you can consider:
Use AWS Lambda
You can set up an AWS Lambda function that copies the requested files from the source S3 bucket (sending the Requester Pays header) to a different S3 bucket that you control, where ADF can then read the data without the Requester Pays restriction. This function can be part of your data pipeline; a minimal sketch is shown below.
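A minimal sketch of such a Lambda function, assuming boto3 and purely illustrative bucket names and prefix (source-requester-pays-bucket, my-staging-bucket, incoming/ are placeholders to replace with your own):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names - replace with your actual buckets and prefix
SOURCE_BUCKET = "source-requester-pays-bucket"   # Requester Pays bucket
STAGING_BUCKET = "my-staging-bucket"             # bucket you own, no Requester Pays
PREFIX = "incoming/"

def lambda_handler(event, context):
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    # RequestPayer="requester" makes boto3 send the x-amz-request-payer header
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX,
                                   RequestPayer="requester"):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=STAGING_BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
                RequestPayer="requester",
            )
            copied += 1
    return {"copiedObjects": copied}
```

ADF can then read from the staging bucket with the standard Amazon S3 connector, so the Requester Pays header is only needed inside the Lambda function.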
Use Self-Hosted Integration Runtime
Another method is to use a Self-Hosted Integration Runtime (IR) to access the S3 bucket directly. You would need to write a small custom program (e.g., in Python) to handle the S3 file transfers and set the appropriate header; this code can run on the VM where your Self-Hosted IR is installed. A rough example follows.
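As a rough illustration, a script on the Self-Hosted IR machine could download the objects (with the Requester Pays header) into a local staging folder that ADF then reads via a file-system dataset. The bucket name, prefix, and staging path below are placeholders:

```python
import os
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "source-requester-pays-bucket"   # placeholder bucket name
PREFIX = "incoming/"                             # placeholder prefix
STAGING_DIR = r"C:\adf-staging"                  # folder visible to the Self-Hosted IR

def stage_files():
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX,
                                   RequestPayer="requester"):
        for obj in page.get("Contents", []):
            local_path = os.path.join(STAGING_DIR, os.path.basename(obj["Key"]))
            # ExtraArgs passes RequestPayer (x-amz-request-payer) on the GET request
            s3.download_file(SOURCE_BUCKET, obj["Key"], local_path,
                             ExtraArgs={"RequestPayer": "requester"})

if __name__ == "__main__":
    stage_files()
```

The script could be scheduled on the IR machine (or triggered ahead of the pipeline run) so that the subsequent Copy Activity only has to move the already-staged files into ADLS.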
Use Azure Functions or Logic Apps
Consider creating an Azure Function that can be called by your ADF pipeline and handles copying the data from S3 to ADLS. This supports custom logic and can be automated to work seamlessly within your production workflow; see the sketch below.
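One possible shape for such a function (the container name, account URL, and app setting below are placeholder assumptions, and it assumes the boto3 and azure-storage-file-datalake packages) streams a single object from S3 straight into ADLS Gen2:

```python
import os
import boto3
import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    bucket = req.params.get("bucket")   # Requester Pays bucket, passed by the pipeline
    key = req.params.get("key")         # object key to copy

    # RequestPayer="requester" sets the x-amz-request-payer header on the GET
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key, RequestPayer="requester")["Body"]

    # Placeholder ADLS settings - in practice, read these from app settings
    adls = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential=os.environ["ADLS_ACCOUNT_KEY"],
    )
    file_client = adls.get_file_system_client("raw").get_file_client(key)
    file_client.upload_data(body.read(), overwrite=True)

    return func.HttpResponse(f"Copied {key} to ADLS", status_code=200)
```

Your ADF pipeline can then invoke it with an Azure Function activity (or a Web activity), passing the bucket and key as parameters for each file or iteration of a ForEach loop.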
If the above answer was helpful, please click "Accept Answer" and "Yes" for "Was this answer helpful?". And if you have any further questions, do let us know.