DP-203 Lab 2 question

Ben Cohn 5 Reputation points
2024-07-14T14:57:39.9033333+00:00

I'm working on DP-203 lab 2 and I got to here:

You'll need an Azure subscription in which you have administrative-level access.

Provision an Azure Synapse Analytics workspace

You'll need an Azure Synapse Analytics workspace with access to data lake storage and an Apache Spark pool that you can use to query and process files in the data lake.

In this exercise, you'll use a combination of a PowerShell script and an ARM template to provision an Azure Synapse Analytics workspace.
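
For context, a script-plus-template deployment like this usually boils down to a couple of Az PowerShell calls. The sketch below is illustrative only: the resource group name, location, and template file are placeholders, not the values the lab's setup.ps1 actually uses.

```
# Illustrative sketch only - names and paths are placeholders, not what
# the lab's setup.ps1 actually uses.
New-AzResourceGroup -Name "dp203-demo-rg" -Location "eastus"

# Deploy an ARM template into the new resource group.
New-AzResourceGroupDeployment `
    -ResourceGroupName "dp203-demo-rg" `
    -TemplateFile "./setup.json"
```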

Sign into the Azure portal at https://portal.azure.com.

Use the [>_] button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a PowerShell environment and creating storage if prompted. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal, as shown here:

Azure portal with a cloud shell pane

    Note: If you have previously created a cloud shell that uses a Bash environment, use the drop-down menu at the top left of the cloud shell pane to change it to PowerShell.

Note that you can resize the cloud shell by dragging the separator bar at the top of the pane, or by using the —, ◻, and X icons at the top right of the pane to minimize, maximize, and close the pane. For more information about using the Azure Cloud Shell, see the Azure Cloud Shell documentation.

In the PowerShell pane, enter the following commands to clone this repo:

```
rm -r dp203 -f

git clone https://github.com/MicrosoftLearning/DP-203-Azure-Data-Engineer dp203
```

(might be that I need to run these commands individually)

After the repo has been cloned, enter the following commands to change to the folder for this lab and run the setup.ps1 script it contains:

```
cd dp203/Allfiles/labs/02

./setup.ps1
```

If prompted, choose which subscription you want to use (this will only happen if you have access to multiple Azure subscriptions).
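
If you prefer to set the subscription explicitly before running the script (an optional step, not part of the lab), the standard Az cmdlets look like this:

```
# Optional: pick the subscription yourself instead of waiting for the prompt.
Get-AzSubscription                                        # list available subscriptions
Set-AzContext -SubscriptionId "<your-subscription-id>"    # placeholder id
```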

When prompted, enter a suitable password to be set for your Azure Synapse SQL pool.

Note: Be sure to remember this password!

Wait for the script to complete - this typically takes around 10 minutes, but in some cases may take longer. While you are waiting, review the Apache Spark in Azure Synapse Analytics article in the Azure Synapse Analytics documentation.
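
While waiting, you can also check on the deployment from the shell. This is just a sketch - it assumes the script created a resource group whose name starts with "dp", which you should verify in the portal:

```
# Sketch: find the resource group the script created and check its
# deployment state (the "dp*" name filter is an assumption).
$rg = Get-AzResourceGroup | Where-Object ResourceGroupName -like "dp*" | Select-Object -First 1
Get-AzResourceGroupDeployment -ResourceGroupName $rg.ResourceGroupName |
    Select-Object DeploymentName, ProvisioningState, Timestamp
```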


The script provisions an Azure Synapse Analytics workspace and an Azure Storage account to host the data lake, then uploads some data files to the data lake.
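
As a quick sanity check (not part of the lab), you could list what was provisioned; the resource group name below follows the lab's naming pattern, so substitute your own suffix:

```
# Sketch: list the resources the script created (substitute your suffix).
Get-AzResource -ResourceGroupName "dp203-xxxxxxx" |
    Select-Object Name, ResourceType
```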

View files in the data lake

After the script has completed, in the Azure portal, go to the dp203-xxxxxxx resource group that it created, and select your Synapse workspace.
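
You can also list the uploaded files from Cloud Shell instead of the portal. This is a hedged sketch: the storage account is looked up from the resource group, and the "files" container name is an assumption you should check against your own storage account.

```
# Sketch: list the data lake files from the shell. The "files" container
# name is an assumption - check the container name in your storage account.
$storage = Get-AzStorageAccount -ResourceGroupName "dp203-xxxxxxx" | Select-Object -First 1
Get-AzDataLakeGen2ChildItem -Context $storage.Context -FileSystem "files" -Recurse |
    Select-Object Path, Length
```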

In the Overview page for your Synapse workspace, in the Open Synapse Studio card, select Open to open Synapse Studio in a new browser tab, signing in if prompted.

On the left side of Synapse Studio, use the ›› icon to expand the menu - this reveals the different pages within Synapse Studio that you'll use to manage resources and perform data analytics tasks.

On the Manage page, select the Apache Spark pools tab and note that a Spark pool with a name similar to sparkxxxxxxx has been provisioned in the workspace. Later you will use this Spark pool to load and analyze data from files in the data lake storage for the workspace.
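
If you want to confirm the same thing from the command line, the Az.Synapse module available in Cloud Shell can list the pool; the workspace name below is a placeholder:

```
# Sketch: confirm the Spark pool exists (workspace name is a placeholder).
Get-AzSynapseSparkPool -WorkspaceName "synapsexxxxxxx" |
    Select-Object Name, NodeSize, NodeCount, SparkVersion
```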



1 answer

  1. pnaroju 2,380 Reputation points Microsoft Vendor
    2024-07-15T14:23:34.83+00:00

    Hi Ben Cohn,

    Thank you for reaching out to us via the Microsoft Q&A forum.

    Based on your query, it appears you are encountering challenges completing the "Exercise - Analyze data in a data lake with Spark."

    In the 'Provision an Azure Synapse Analytics workspace' section, specifically under steps 4 and 5, it is essential to execute the commands sequentially to avoid any potential issues. Once these commands have been successfully executed, you can proceed to the next section, 'Query data in files,' and complete the exercise following the instructions provided in the documentation.
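
    To make the ordering concrete, one approach (a sketch using the lab's own commands, with one added flag noted in the comment) is to run each line on its own and confirm it finishes before entering the next:

    ```
    rm -r dp203 -f -ErrorAction SilentlyContinue   # added flag: ignore "not found" on a fresh shell
    git clone https://github.com/MicrosoftLearning/DP-203-Azure-Data-Engineer dp203
    cd dp203/Allfiles/labs/02
    ./setup.ps1
    ```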

    For detailed guidance on completing the "Exercise - Analyze data in a data lake with Spark," please refer to the following link: Analyze data in a data lake with Spark

    Attached below are screenshots for your reference, demonstrating successful execution of the commands as outlined in the document: sparkimage1, sparkimage2, sparkimage3.

    Should you continue to encounter difficulties, please do not hesitate to inform us in the comments section. We are committed to assisting you further.

    If you find this information helpful, kindly acknowledge by clicking the "Upvote" and "Accept Answer" buttons on the post.
