How to securely copy data from AWS S3 "Requester Pays" buckets to ADLS using ADF Copy Activity?

I'm facing an issue when trying to copy data from an AWS S3 bucket (with "Requester Pays" enabled) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF) Copy Activity.
The problem is that ADF currently doesn't support setting the x-amz-request-payer header required for accessing "Requester Pays" buckets. As a result, the pipeline fails when trying to read data from the source.
Has anyone found a secure and automated workaround for this scenario? I'm looking for a solution that avoids manual intervention and can be integrated into a production data pipeline.
Any guidance or suggestions would be greatly appreciated.
#aws #s3 #azureDataLakeStorage #azureDataFactory #ADLS #requesterPays
J N S S Kasyap · Microsoft External Staff Moderator
Hi @MojiTMJ,
I understand the challenge of copying data from AWS S3 "Requester Pays" buckets to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF). You've noted that ADF currently doesn't support setting the required x-amz-request-payer header, which is essential for accessing such buckets.
Unfortunately, as of now, ADF doesn't provide a straightforward way to handle this: the Amazon S3 connector doesn't allow setting custom request headers such as x-amz-request-payer on reads.
However, here are some workarounds you can consider:
Use AWS Lambda
You can set up an AWS Lambda function that copies the requested files from the source S3 bucket (sending the Requester Pays header) to a different S3 bucket that you control, where ADF can then read the data without the Requester Pays restriction. This function can be part of your data pipeline; a minimal sketch is shown below.
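A minimal sketch of such a Lambda function, assuming boto3 and purely illustrative bucket names and prefix (source-requester-pays-bucket, my-staging-bucket, incoming/ are placeholders to replace with your own):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names - replace with your actual buckets and prefix
SOURCE_BUCKET = "source-requester-pays-bucket"   # Requester Pays bucket
STAGING_BUCKET = "my-staging-bucket"             # bucket you own, no Requester Pays
PREFIX = "incoming/"

def lambda_handler(event, context):
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    # RequestPayer="requester" makes boto3 send the x-amz-request-payer header
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX,
                                   RequestPayer="requester"):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=STAGING_BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
                RequestPayer="requester",
            )
            copied += 1
    return {"copiedObjects": copied}
```

ADF can then read from the staging bucket with the standard Amazon S3 connector, so the Requester Pays header is only needed inside the Lambda function.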
Use Self-Hosted Integration Runtime
Another method is to use a Self-Hosted Integration Runtime (IR) to access the S3 bucket directly. You would need to write a small custom program (e.g., in Python) to handle the S3 file transfers and set the appropriate header; this code can run on the VM where your Self-Hosted IR is installed. A rough example follows.
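As a rough illustration, a script on the Self-Hosted IR machine could download the objects (with the Requester Pays header) into a local staging folder that ADF then reads via a file-system dataset. The bucket name, prefix, and staging path below are placeholders:

```python
import os
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "source-requester-pays-bucket"   # placeholder bucket name
PREFIX = "incoming/"                             # placeholder prefix
STAGING_DIR = r"C:\adf-staging"                  # folder visible to the Self-Hosted IR

def stage_files():
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX,
                                   RequestPayer="requester"):
        for obj in page.get("Contents", []):
            local_path = os.path.join(STAGING_DIR, os.path.basename(obj["Key"]))
            # ExtraArgs passes RequestPayer (x-amz-request-payer) on the GET request
            s3.download_file(SOURCE_BUCKET, obj["Key"], local_path,
                             ExtraArgs={"RequestPayer": "requester"})

if __name__ == "__main__":
    stage_files()
```

The script could be scheduled on the IR machine (or triggered ahead of the pipeline run) so that the subsequent Copy Activity only has to move the already-staged files into ADLS.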
Use Azure Functions or Logic Apps
Consider creating an Azure Function that can be called by your ADF pipeline and handles copying the data from S3 to ADLS. This supports custom logic and can be automated to work seamlessly within your production workflow; see the sketch below.
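One possible shape for such a function (the container name, account URL, and app setting below are placeholder assumptions, and it assumes the boto3 and azure-storage-file-datalake packages) streams a single object from S3 straight into ADLS Gen2:

```python
import os
import boto3
import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    bucket = req.params.get("bucket")   # Requester Pays bucket, passed by the pipeline
    key = req.params.get("key")         # object key to copy

    # RequestPayer="requester" sets the x-amz-request-payer header on the GET
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key, RequestPayer="requester")["Body"]

    # Placeholder ADLS settings - in practice, read these from app settings
    adls = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential=os.environ["ADLS_ACCOUNT_KEY"],
    )
    file_client = adls.get_file_system_client("raw").get_file_client(key)
    file_client.upload_data(body.read(), overwrite=True)

    return func.HttpResponse(f"Copied {key} to ADLS", status_code=200)
```

Your ADF pipeline can then invoke it with an Azure Function activity (or a Web activity), passing the bucket and key as parameters for each file or iteration of a ForEach loop.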
If the above answer was helpful, please click "Accept Answer" and "Yes" for "Was this answer helpful?". And if you have any further questions, do let us know.