Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline training question

Christina Tedesco 45 Reputation points
2024-06-20T12:36:17.0733333+00:00

Hi - I'm trying to get through the "Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline" training module, and the first exercise has us connecting to an Amazon service to access a moviesDB.csv file. Are there any steps available for setting up access to the Amazon S3 service with credentials for the file? I don't have an access key or secret, and I'm unfamiliar with how to do this.

Thank you

https://learn.microsoft.com/en-us/training/modules/petabyte-scale-ingestion-azure-data-factory/4-use-copy-activity

This question is related to the following Learning Module: Azure Training.

Accepted answer

  AmaranS 6,845 Reputation points Microsoft Vendor
    2024-06-21T01:09:11.4333333+00:00

    Hi Christina Tedesco,

    Thank you for reaching out to us on the Microsoft Q&A forum.

    To set up access to Amazon S3 for the "Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline" training module, you'll need to configure your S3 credentials properly. Here's a step-by-step guide to help you obtain the necessary credentials and configure them in Azure Data Factory:

    Steps to Obtain Amazon S3 Access Key and Secret Key

    1. Sign in to the AWS Management Console using your AWS account credentials.

    2. Navigate to the Identity and Access Management console by typing IAM in the search bar at the top and selecting IAM.

    3. Click Users in the left sidebar, then click the Add user button. Enter a username (e.g., azure-data-factory-user), select Programmatic access for the access type, and click Next: Permissions.

    4. Choose Attach existing policies directly, then search for and select the AmazonS3FullAccess policy. Click Next: Tags, then Next: Review, and finally click Create user.

    5. After creating the user, download the .csv file containing the Access key ID and Secret access key, and ensure you save it securely. (If you prefer to script this, a sketch follows these steps.)
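    For reference, here is a minimal sketch of the same user setup using the boto3 AWS SDK for Python. The user name azure-data-factory-user matches the example above; the sketch assumes you already have admin-level AWS credentials configured locally for boto3 to pick up.

    ```python
    import boto3

    iam = boto3.client("iam")

    # Create the dedicated user for Azure Data Factory access.
    iam.create_user(UserName="azure-data-factory-user")

    # Attach the AWS-managed AmazonS3FullAccess policy.
    iam.attach_user_policy(
        UserName="azure-data-factory-user",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
    )

    # Generate the access key pair. Store the secret securely --
    # it cannot be retrieved again after this call.
    resp = iam.create_access_key(UserName="azure-data-factory-user")
    print("Access key ID:", resp["AccessKey"]["AccessKeyId"])
    print("Secret access key:", resp["AccessKey"]["SecretAccessKey"])
    ```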

    Steps to Configure Amazon S3 Linked Service in Azure Data Factory

    1. In the Azure portal, navigate to your Data Factory instance and click Author & Monitor to open the Data Factory UI.

    2. In the Data Factory UI, click the Create pipeline button to open the authoring canvas. In the left pane, click the + button and select Linked Service.

    3. In the New Linked Service pane, type Amazon S3 in the search box, select it, and click Continue.

    4. Provide a name for your linked service (e.g., AmazonS31). Enter the Access key ID and Secret access key from the downloaded AWS credentials file. Click Test Connection to verify, then click Create once the test succeeds. (The same linked service can also be created programmatically, as sketched below.)
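    If you'd rather define the linked service in code, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. The subscription ID, resource group, and factory name placeholders are assumptions you'd replace with your own values; AmazonS31 matches the example name above.

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AmazonS3LinkedService,
        LinkedServiceResource,
        SecureString,
    )

    # Placeholders -- substitute your own subscription, resource group,
    # and data factory names.
    client = DataFactoryManagementClient(
        DefaultAzureCredential(), "<subscription-id>"
    )

    # Linked service pointing at Amazon S3, using the keys from the
    # credentials .csv downloaded in the previous section.
    linked_service = LinkedServiceResource(
        properties=AmazonS3LinkedService(
            access_key_id="<access-key-id>",
            secret_access_key=SecureString(value="<secret-access-key>"),
        )
    )
    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "AmazonS31", linked_service
    )
    ```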

    Steps to Configure the Dataset

    1. To create a new dataset, go to the Datasets section of the Data Factory UI and click + New. In the New Dataset pane, search for Amazon S3, choose the desired data format (DelimitedText for CSV files), and click Continue.

    2. Name your dataset (e.g., MoviesS3) and select your linked service (e.g., AmazonS31). Specify the path to the file in your S3 bucket (e.g., mybucket/moviesDB.csv). Configure settings such as first row as header, then click Finish. (A programmatic equivalent is sketched below.)
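    As with the linked service, the dataset can be defined through the same SDK. This sketch reuses the client from the previous snippet and the example names MoviesS3, AmazonS31, and mybucket/moviesDB.csv from the steps above.

    ```python
    from azure.mgmt.datafactory.models import (
        AmazonS3Location,
        DatasetResource,
        DelimitedTextDataset,
        LinkedServiceReference,
    )

    # Delimited-text dataset over moviesDB.csv in the S3 bucket,
    # read through the AmazonS31 linked service.
    dataset = DatasetResource(
        properties=DelimitedTextDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference", reference_name="AmazonS31"
            ),
            location=AmazonS3Location(
                bucket_name="mybucket", file_name="moviesDB.csv"
            ),
            first_row_as_header=True,
        )
    )
    client.datasets.create_or_update(
        "<resource-group>", "<factory-name>", "MoviesS3", dataset
    )
    ```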

    Verify the Dataset and Copy Activity

    1. Click Preview Data in the Source tab of your Copy Activity to verify that the dataset is configured correctly and accessible.

    2. Create and configure a dataset for the destination (e.g., Azure Data Lake Storage Gen2), run the pipeline with the Debug button, and monitor the copy progress and results in the Output tab. This should give you everything you need to access your data in Amazon S3 and use it within Azure Data Factory. (A sketch of the copy pipeline itself follows.)
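    To round out the programmatic path, here is a minimal sketch of the copy pipeline using the same SDK. The sink dataset name MoviesADLS and the pipeline name CopyFromS3Pipeline are hypothetical placeholders for whatever destination dataset and pipeline you actually create.

    ```python
    from azure.mgmt.datafactory.models import (
        CopyActivity,
        DatasetReference,
        DelimitedTextSink,
        DelimitedTextSource,
        PipelineResource,
    )

    # Copy activity from the S3 dataset to a (hypothetical) ADLS Gen2
    # destination dataset named MoviesADLS.
    copy_activity = CopyActivity(
        name="CopyMoviesFromS3",
        inputs=[DatasetReference(type="DatasetReference", reference_name="MoviesS3")],
        outputs=[DatasetReference(type="DatasetReference", reference_name="MoviesADLS")],
        source=DelimitedTextSource(),
        sink=DelimitedTextSink(),
    )

    pipeline = PipelineResource(activities=[copy_activity])
    client.pipelines.create_or_update(
        "<resource-group>", "<factory-name>", "CopyFromS3Pipeline", pipeline
    )

    # Trigger a one-off run and note the run ID for monitoring.
    run = client.pipelines.create_run(
        "<resource-group>", "<factory-name>", "CopyFromS3Pipeline"
    )
    print("Run ID:", run.run_id)
    ```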

    If you continue to face issues, please let us know in the comments. We are here to help.

    If you find this information helpful, please acknowledge by clicking the "Upvote" and "Accept Answer" buttons on the post.

    1 person found this answer helpful.
