Quickstart: Create an Apache Spark GPU-enabled Pool in Azure Synapse Analytics using the Azure portal

An Apache Spark pool provides open-source big data compute capabilities where data can be loaded, modeled, processed, and distributed for faster analytic insight. Azure Synapse can now create Apache Spark pools backed by GPUs, so that your Spark workloads run with GPU-accelerated processing.

In this quickstart, you learn how to use the Azure portal to create an Apache Spark GPU-enabled pool in an Azure Synapse Analytics workspace.

Warning

  • The GPU-accelerated preview is limited to the Azure Synapse 3.1 (unsupported) and Apache Spark 3.2 (End of Support announced) runtimes.
  • Azure Synapse Runtime for Apache Spark 3.1 has reached its end of support as of January 26, 2023, with official support discontinued effective January 26, 2024. Support tickets, bug fixes, and security updates are not addressed beyond that date.
  • End of support for Azure Synapse Runtime for Apache Spark 3.2 was announced on July 8, 2023. Runtimes with end of support announced do not receive bug or feature fixes. Security fixes are backported based on risk assessment. This runtime will be retired and disabled as of July 8, 2024.
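
To make the dates in this warning concrete, here is a minimal Python sketch that checks whether a given day falls on or after a runtime's cutoff. The `RUNTIME_CUTOFFS` table and the `is_past_cutoff` helper are hypothetical names introduced for illustration. Only the dates come from the warning, and the sketch does not query Azure.

```python
from datetime import date

# Cutoff dates copied from the warning above; the dictionary is illustrative.
RUNTIME_CUTOFFS = {
    "3.1": date(2024, 1, 26),  # official support discontinued
    "3.2": date(2024, 7, 8),   # runtime retired and disabled
}


def is_past_cutoff(spark_version: str, on: date) -> bool:
    """Return True if `on` is on or after the cutoff date for the runtime."""
    return on >= RUNTIME_CUTOFFS[spark_version]


print(is_past_cutoff("3.1", date(2023, 12, 1)))
print(is_past_cutoff("3.2", date(2024, 7, 8)))
```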

Note

Azure Synapse GPU-enabled pools are currently in Public Preview.

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

Sign in to the Azure portal

Sign in to the Azure portal

  1. Navigate to the Synapse workspace where the Apache Spark pool will be created by typing the service name (or resource name directly) into the search bar.

    Azure portal search bar with Synapse workspaces typed in.

  2. From the list of workspaces, type the name (or part of the name) of the workspace to open. For this example, we'll use a workspace named contosoanalytics.

    Listing of Synapse workspaces filtered to show those containing the name Contoso.

Create new Azure Synapse GPU-enabled pool

  1. In the Synapse workspace where you want to create the Apache Spark pool, select New Apache Spark pool.

    Overview of Synapse workspace with a red box around the command to create a new Apache Spark pool.

  2. Enter the following details in the Basics tab:

    Setting                 Suggested value                   Description
    ----------------------  --------------------------------  ------------------------------------------------------------------
    Apache Spark pool name  A valid pool name                 This is the name that the Apache Spark pool will have.
    Node size family        Hardware Accelerated              Choose Hardware Accelerated from the drop-down menu.
    Node size               Large (16 vCPU / 110 GB / 1 GPU)  Set this to the smallest size to reduce costs for this quickstart.
    Autoscale               Disabled                          We don't need autoscale for this quickstart.
    Number of nodes         3                                 Use a small size to limit costs for this quickstart.

    Apache Spark pool create flow - basics tab.

    Important

    There are specific limitations on the names that Apache Spark pools can use. Names must contain letters or numbers only, must be 15 characters or fewer, must start with a letter, must not contain reserved words, and must be unique in the workspace.

  3. Select Next: additional settings and review the default settings. Do not modify any default settings. Note that GPU pools can only be created with Apache Spark 3.1.

    Screenshot that shows the "Create Apache Spark pool" page with the "Additional settings" tab selected.

  4. Select Next: tags. Don't add any tags.

    Apache Spark pool create flow - additional settings tab.

  5. Select Review + create.

  6. Make sure that the details look correct based on what was previously entered, and select Create.

    Apache Spark pool create flow - review settings tab.

  7. The resource provisioning flow starts and indicates when it's complete.

    Screenshot that shows the "Overview" page with a "Your deployment is complete" message displayed.

  8. After provisioning completes, navigate back to the workspace to see a new entry for the newly created Azure Synapse GPU-enabled pool.

    Apache Spark pool create flow - resource provisioning.

  9. At this point, no resources are running and there are no charges for Spark. You have only created metadata about the Spark instances you want to create.
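
The Basics tab values and the naming rules from step 2 can be checked before you open the portal. The following is a minimal Python sketch, not an Azure API: `PoolPlan` and `name_problems` are hypothetical names, the list of reserved words is not given in this quickstart and must be supplied by the caller, and the case-insensitive, whole-name comparison for reserved words and uniqueness is an assumption.

```python
from dataclasses import dataclass

MAX_NAME_LENGTH = 15  # from the naming rules in step 2


def name_problems(name, existing_names=(), reserved_words=()):
    """Return the naming rules a proposed pool name violates (empty if none)."""
    problems = []
    if not (name.isascii() and name.isalnum()):
        problems.append("must contain letters or numbers only")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("must be 15 characters or fewer")
    if not (name[:1].isascii() and name[:1].isalpha()):
        problems.append("must start with a letter")
    lowered = name.lower()
    # Assumption: reserved words and existing names are matched against the
    # whole name, ignoring case. The quickstart does not list reserved words.
    if lowered in {word.lower() for word in reserved_words}:
        problems.append("must not be a reserved word")
    if lowered in {other.lower() for other in existing_names}:
        problems.append("must be unique in the workspace")
    return problems


@dataclass(frozen=True)
class PoolPlan:
    """Suggested values from the Basics tab table (Large node, 3 nodes)."""
    name: str
    node_count: int = 3
    vcpus_per_node: int = 16
    memory_gb_per_node: int = 110
    gpus_per_node: int = 1

    def totals(self):
        """Aggregate resources across all nodes, with autoscale disabled."""
        return {
            "vcpus": self.node_count * self.vcpus_per_node,
            "memory_gb": self.node_count * self.memory_gb_per_node,
            "gpus": self.node_count * self.gpus_per_node,
        }


plan = PoolPlan(name="contosospark")
print(name_problems(plan.name))
print(plan.totals())
```

With the suggested values, the totals are simply three times the per-node figures from the table, which is a quick way to compare the pool against any quota you are working within.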

Clean up resources

Follow the steps below to delete the Apache Spark pool from the workspace.

Warning

Deleting an Apache Spark pool will remove the analytics engine from the workspace. It will no longer be possible to connect to the pool, and all queries, pipelines, and notebooks that use this Apache Spark pool will no longer work.

If you want to delete the Apache Spark pool, do the following:

  1. Navigate to the Apache Spark pools blade in the workspace.
  2. Select the Apache Spark pool to be deleted (in this case, contosospark).
  3. Select Delete.

    Listing of Apache Spark pools, with the recently created pool selected.

  4. Confirm the deletion, and select the Delete button.

    Confirmation dialog to delete the selected Apache Spark pool.

  5. When the process completes successfully, the Apache Spark pool is no longer listed in the workspace resources.
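
This quickstart deletes the pool through the portal. As an alternative, the same cleanup can be scripted. The sketch below builds an `az synapse spark pool delete` command in Python, assuming the Azure CLI is installed and signed in. The resource group name `contoso-rg` is a hypothetical placeholder that does not appear in this quickstart, and `build_delete_command` and `delete_pool` are illustrative helper names.

```python
import subprocess


def build_delete_command(pool_name, workspace_name, resource_group):
    """Build the Azure CLI argument list that deletes an Apache Spark pool."""
    return [
        "az", "synapse", "spark", "pool", "delete",
        "--name", pool_name,
        "--workspace-name", workspace_name,
        "--resource-group", resource_group,
        "--yes",  # skip the interactive confirmation prompt
    ]


def delete_pool(pool_name, workspace_name, resource_group):
    """Run the delete command; raises CalledProcessError if the CLI fails."""
    command = build_delete_command(pool_name, workspace_name, resource_group)
    subprocess.run(command, check=True)


# Print the command for review. "contoso-rg" is a placeholder resource group.
print(" ".join(build_delete_command("contosospark", "contosoanalytics", "contoso-rg")))
```

Only `build_delete_command` runs here, so you can review the command before anything is deleted. Calling `delete_pool` has the same effect described in the warning above: queries, pipelines, and notebooks that use the pool stop working.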

Next steps