Thank you for reaching out to us on the Microsoft Q&A forum.
To set up access to Amazon S3 for the "Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline" training module, you'll need to configure your S3 credentials properly. Here's a step-by-step guide to obtaining the necessary credentials and configuring them in Azure Data Factory:
Steps to Obtain Amazon S3 Access Key and Secret Key
1. Sign in to the AWS Management Console using your AWS account credentials.
2. Navigate to the Identity and Access Management console by typing IAM in the search bar at the top and selecting IAM.
3. Click Users in the left sidebar, then click the Add user button. Enter a username (e.g., azure-data-factory-user), select Programmatic access as the access type, and click Next: Permissions.
4. Choose Attach existing policies directly, then search for and select the AmazonS3FullAccess policy. Click Next: Tags, then Next: Review, and finally click Create user.
5. After creating the user, download the .csv file containing the Access key ID and Secret access key, and save it securely. (In newer versions of the IAM console there is no Programmatic access option at creation time; instead, create the user first, then open its Security credentials tab and choose Create access key.) If you prefer to script these steps, see the sketch after this list.
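If you would rather automate the user and key creation, here is a minimal Python sketch using boto3. It assumes boto3 is installed and your default AWS profile has IAM administrator rights; the username is the example one from the steps above.

```python
import boto3

iam = boto3.client("iam")  # uses your default (admin) AWS profile

# Create the dedicated user and grant it full S3 access,
# mirroring steps 3-4 above.
iam.create_user(UserName="azure-data-factory-user")
iam.attach_user_policy(
    UserName="azure-data-factory-user",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

# Create the programmatic credentials (step 5). The secret is returned
# only once, so store it securely right away.
key = iam.create_access_key(UserName="azure-data-factory-user")["AccessKey"]
print("Access key ID:", key["AccessKeyId"])
print("Secret access key:", key["SecretAccessKey"])
```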
Steps to Configure Amazon S3 Linked Service in Azure Data Factory
1. In the Azure portal, navigate to your Data Factory instance and click Author & Monitor to open the Data Factory UI (in newer portal versions this button is labeled Launch Studio).
2. In the Data Factory UI, click the Create pipeline button to open the authoring canvas. In the left pane, click the + button and select Linked Service.
3. In the New Linked Service pane, type Amazon S3 in the search box, select it, and click Continue.
4. Provide a name for your linked service (e.g., AmazonS31). Enter the Access key ID and Secret access key from the downloaded AWS credentials file. Click Test Connection to verify, then click Create once the test succeeds. The same linked service can also be created programmatically, as shown below.
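For reference, here is a minimal sketch of the equivalent linked service created with the azure-mgmt-datafactory Python SDK. The subscription, resource group, and factory names are placeholders you would replace with your own, and the credentials come from the downloaded .csv file.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AmazonS3LinkedService,
    LinkedServiceResource,
    SecureString,
)

# Placeholder names -- substitute your own subscription, resource group,
# and factory, plus the key pair from the downloaded .csv file.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

s3_linked_service = LinkedServiceResource(
    properties=AmazonS3LinkedService(
        access_key_id="<access-key-id>",
        secret_access_key=SecureString(value="<secret-access-key>"),
    )
)
client.linked_services.create_or_update(
    "<resource-group>", "<factory-name>", "AmazonS31", s3_linked_service
)
```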
Steps to Configure the Dataset
1. To create a new dataset, go to the Datasets section of the Data Factory UI and click + New. In the New Dataset pane, search for Amazon S3, choose the desired data format (e.g., DelimitedText for CSV files), and click Continue.
2. Name your dataset (e.g., MoviesS3) and select your linked service (e.g., AmazonS31). Specify the file path in your S3 bucket (e.g., mybucket/moviesDB.csv), configure settings such as headers, and click Finish. A scripted equivalent is sketched below.
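Continuing the SDK sketch from above, a DelimitedText dataset pointing at that example S3 path might look like the following; the resource names are again placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AmazonS3Location,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Points at mybucket/moviesDB.csv through the AmazonS31 linked service,
# treating the first row as column headers.
movies_dataset = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AmazonS31"
        ),
        location=AmazonS3Location(bucket_name="mybucket", file_name="moviesDB.csv"),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
client.datasets.create_or_update(
    "<resource-group>", "<factory-name>", "MoviesS3", movies_dataset
)
```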
Verify the Dataset and Copy Activity
1. Click Preview Data in the Source tab of your Copy Activity to verify that the dataset is configured correctly and the data is accessible.
2. Create and configure a dataset for the destination (e.g., Azure Data Lake Storage Gen2), run the pipeline with the Debug button, and monitor the copy's progress in the Output tab.

This should help you set up the connections needed to access your data in Amazon S3 and use it within Azure Data Factory. If Preview Data or the debug run fails with an authorization error, it is worth confirming outside Data Factory that the new credentials can actually reach the file, as in the sketch below.
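A quick check is boto3's head_object, which fails fast if the credentials or path are wrong. This assumes boto3 is installed locally and reuses the example key pair and path from the steps above; substitute your own values.

```python
import boto3

# Example values from the steps above -- substitute your own key pair and path.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<access-key-id>",
    aws_secret_access_key="<secret-access-key>",
)

# head_object raises a 403/404 error if the credentials or path are wrong.
meta = s3.head_object(Bucket="mybucket", Key="moviesDB.csv")
print(f"moviesDB.csv found, {meta['ContentLength']} bytes")
```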
If you continue to face issues, please let us know in the comments. We are here to help.
If you find this information helpful, please acknowledge by clicking the "Upvote" and "Accept Answer" buttons on the post.