Service principals for Azure Databricks automation

A service principal is an identity created for use with automated tools and systems including scripts, apps, and CI/CD platforms.

As a security best practice, Databricks recommends that you give automated tools and systems access to Azure Databricks resources through an Azure AD service principal and its Azure AD token, rather than through your Azure Databricks workspace user or that user's personal access token. Benefits of this approach include the following:

  • You can grant and restrict access to Azure Databricks resources for an Azure AD service principal independently of a user. For instance, this allows you to prohibit an Azure AD service principal from acting as an admin in your Azure Databricks workspace while still allowing other specific users in your workspace to continue to act as admins.
  • Users can keep their own access tokens out of reach of automated tools and systems.
  • You can temporarily disable or permanently delete an Azure AD service principal without impacting other users. For instance, this allows you to pause or remove access from an Azure AD service principal that you suspect is being used in a malicious way.
  • If a user leaves your organization, you can remove that user without impacting any Azure AD service principal.

To create an Azure AD service principal for use with Azure Databricks, you use these tools and APIs:

  • You create an Azure AD service principal with tools such as the Azure portal or Terraform.
  • After you create an Azure AD service principal, you add it to your Azure Databricks workspace with the SCIM API 2.0 (ServicePrincipals) for workspaces. To call this API, you can use tools such as curl or Postman, or you can use Terraform. You cannot use the Azure Databricks user interface.
  • You create an Azure AD token for an Azure AD service principal with tools such as curl or Postman. You cannot use the Azure Databricks user interface or Terraform.

This article describes how to:

  1. Create an Azure AD service principal in Azure.
  2. Add the Azure AD service principal to your Azure Databricks workspace.
  3. Create an Azure AD token for the Azure AD service principal.

Use curl or Postman

Follow these instructions to use the Azure portal to create an Azure AD service principal in Azure, use curl or Postman to add the Azure AD service principal to your Azure Databricks workspace, and then create an Azure AD token for the Azure AD service principal.

To use Terraform instead of curl or Postman, skip to Use Terraform.

Requirements

If you want to call the Azure Databricks APIs with Postman, you can define Postman variables instead of entering your Azure Databricks workspace instance name (for example, adb-1234567890123456.7.azuredatabricks.net) and your Azure Databricks personal access token for your workspace user in every Postman example in this article.

If you want to call the Azure Databricks APIs with curl, this article’s curl examples use two environment variables: DATABRICKS_HOST, which represents your Azure Databricks per-workspace URL (for example, https://adb-1234567890123456.7.azuredatabricks.net), and DATABRICKS_TOKEN, which represents your Azure Databricks personal access token for your workspace user. To set these environment variables, do the following:

Unix, Linux, and macOS

To set the environment variables for only the current terminal session, run the following commands. To set the environment variables for all terminal sessions, enter the following commands into your shell’s startup file and then restart your terminal. Replace the example values here with your own values.

export DATABRICKS_HOST="https://adb-12345678901234567.8.azuredatabricks.net"
export DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"

Windows

To set the environment variables for only the current Command Prompt session, run the following commands. Replace the example values here with your own values. Do not enclose the values in quotation marks, because set stores the quotation marks as part of the value.

set DATABRICKS_HOST=https://adb-12345678901234567.8.azuredatabricks.net
set DATABRICKS_TOKEN=dapi1234567890b2cd34ef5a67bc8de90fa12b

To set the environment variables for all Command Prompt sessions, run the following commands and then restart your Command Prompt. Replace the example values here with your own values.

setx DATABRICKS_HOST "https://adb-12345678901234567.8.azuredatabricks.net"
setx DATABRICKS_TOKEN "dapi1234567890b2cd34ef5a67bc8de90fa12b"

If you want to call the Azure Databricks APIs with curl, also note the following:

  • This article’s curl examples use shell command formatting for Unix, Linux, and macOS. For the Windows Command shell, replace \ with ^, and replace ${...} with %...%.
  • You can use a tool such as jq to format the JSON-formatted output of curl for easier reading and querying. This article’s curl examples use jq to format the JSON output.
  • If you work with multiple Azure Databricks workspaces, instead of constantly changing the DATABRICKS_HOST and DATABRICKS_TOKEN variables, you can use a .netrc file. If you use a .netrc file, modify this article’s curl examples as follows:
    • Change curl -X to curl --netrc -X
    • Replace ${DATABRICKS_HOST} with your Azure Databricks per-workspace URL, for example https://adb-1234567890123456.7.azuredatabricks.net
    • Remove --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \

Add an Azure AD service principal to an Azure Databricks workspace

Step 1: Create the Azure AD service principal

If you already have an Azure AD service principal available, skip ahead to Step 2.

To create an Azure AD service principal, follow the instructions in Provision a service principal in Azure portal.

After you create the Azure AD service principal, copy the following values for the Azure AD service principal, as you will need them in later steps.

  • Application (client) ID
  • Directory (tenant) ID
  • The client secret’s Value

Step 2: Add the Azure AD service principal to the Azure Databricks workspace

You can use tools such as curl and Postman to add the Azure AD service principal to your Azure Databricks workspace. In the following instructions, replace:

  • <application-client-id> with the Application (client) ID value for the Azure AD service principal.
  • <display-name> with a display name for the Azure AD service principal.
  • The entitlements array with any additional entitlements for the Azure AD service principal. This example grants the Azure AD service principal the ability to create clusters. Workspace access and Databricks SQL access are granted to the Azure AD service principal by default.
  • <group-id> with the group ID for any group in your Azure Databricks workspace that you want the Azure AD service principal to belong to. (It can be easier to set access permissions on groups instead of each Azure AD service principal individually.)
    • To add additional groups, add each group ID to the groups array.
    • To get a group ID, call Get groups.
    • To create a group, see Manage groups for user interface options or call the Create group API.
    • To add access permissions to a group, see Manage groups for user interface options or call the Permissions API 2.0.
    • To not add the Azure AD service principal to any groups, remove the groups array.

Curl

Run the following command. Make sure the add-service-principal.json file is in the same directory where you run this command.

curl -X POST \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Content-type: application/scim+json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data @add-service-principal.json \
| jq .

add-service-principal.json:

{
  "applicationId": "<application-client-id>",
  "displayName": "<display-name>",
  "entitlements": [
    {
      "value": "allow-cluster-create"
    }
  ],
  "groups": [
    {
      "value": "<group-id>"
    }
  ],
  "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
  "active": true
}
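
If you prefer not to edit the JSON file by hand, a small shell function can generate it. The following is a minimal sketch, not part of the documented procedure. The function name sp_payload is hypothetical, and the sketch assumes that none of the values contain double quotes or backslashes, because it does no JSON escaping.

```shell
# Write the SCIM payload for a service principal to standard output.
# $1: Application (client) ID
# $2: display name
# $3: group ID
# Assumption: the values contain no double quotes or backslashes,
# because this sketch does not escape them for JSON.
sp_payload() {
  cat <<EOF
{
  "applicationId": "$1",
  "displayName": "$2",
  "entitlements": [
    {
      "value": "allow-cluster-create"
    }
  ],
  "groups": [
    {
      "value": "$3"
    }
  ],
  "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
  "active": true
}
EOF
}

# Example usage:
# sp_payload "<application-client-id>" "<display-name>" "<group-id>" > add-service-principal.json
```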

Postman
  1. Create a new HTTP request (File > New > HTTP Request).

  2. In the HTTP verb drop-down list, select POST.

  3. For Enter request URL, enter https://<databricks-instance-name>/api/2.0/preview/scim/v2/ServicePrincipals, where <databricks-instance-name> is your Azure Databricks workspace instance name, for example adb-1234567890123456.7.azuredatabricks.net.

  4. On the Authorization tab, in the Type list, select Bearer Token.

  5. For Token, enter your Azure Databricks personal access token for your workspace user.

  6. On the Headers tab, add the Key and Value pair of Content-Type and application/scim+json.

  7. On the Body tab, select raw and JSON.

  8. Enter the following body payload:

    {
      "applicationId": "<application-client-id>",
      "displayName": "<display-name>",
      "entitlements": [
        {
          "value": "allow-cluster-create"
        }
      ],
      "groups": [
        {
          "value": "<group-id>"
        }
      ],
      "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
      "active": true
    }
    
  9. Click Send.

Create an Azure AD token for an Azure AD service principal

To create an Azure AD token for an Azure AD service principal, follow the instructions in Get an Azure AD access token with the Microsoft identity platform REST API.

After you create the Azure AD token, copy the access_token value, as you will need to provide it to your script, app, or system.
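
The linked instructions come down to a single HTTP request. As a hedged sketch of that request with curl: the function names aad_token_url and get_aad_token are hypothetical, and the sketch assumes the Microsoft identity platform v2.0 token endpoint, the client credentials grant, and 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d as the Azure Databricks resource ID used in the scope. Check these against the linked instructions before you rely on them.

```shell
# Azure Databricks resource ID, used to build the token scope (assumed value).
DATABRICKS_RESOURCE_ID="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Build the token endpoint URL for a tenant.
# $1: Directory (tenant) ID
aad_token_url() {
  printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token' "$1"
}

# Request a token with the client credentials grant.
# $1: Directory (tenant) ID
# $2: Application (client) ID
# $3: the client secret's Value
get_aad_token() {
  curl -sS -X POST "$(aad_token_url "$1")" \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --data-urlencode "client_id=$2" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "scope=${DATABRICKS_RESOURCE_ID}/.default" \
    --data-urlencode "client_secret=$3"
}

# Example usage (jq extracts the access_token field from the JSON response):
# get_aad_token "<tenant-id>" "<client-id>" "<client-secret>" | jq -r .access_token
```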

Use Terraform

Follow these instructions to use Terraform to create an Azure AD service principal in Azure, add the Azure AD service principal to your Azure Databricks workspace, and then create an Azure AD token for the Azure AD service principal.

To use curl or Postman instead of Terraform, skip to Use curl or Postman.

Requirements

Step 1: Create the Azure AD service principal

If you already have an Azure AD service principal available, skip ahead to Step 2.

  1. In your terminal, create an empty directory and then switch to it. (Each separate set of Terraform configuration files must be in its own directory.) For example:

    mkdir terraform_azure_service_principal_demo && cd terraform_azure_service_principal_demo
    
  2. In this empty directory, create a file named main.tf. Add the following content to this file, and then save the file.

    variable "azure_service_principal_display_name" {
      description = "A display name for the Azure Active Directory (Azure AD) service principal."
      type        = string
    }
    
    terraform {
      required_providers {
        azuread = {
          source  = "hashicorp/azuread"
        }
      }
    }
    
    # The azuread provider manages the Azure AD resources below.
    provider "azuread" {}
    
    resource "azuread_application" "this" {
      display_name = var.azure_service_principal_display_name
    }
    
    resource "azuread_service_principal" "this" {
      application_id = azuread_application.this.application_id
    }
    
    resource "time_rotating" "month" {
      rotation_days = 30
    }
    
    resource "azuread_service_principal_password" "this" {
      service_principal_id = azuread_service_principal.this.object_id
      rotate_when_changed  = { rotation = time_rotating.month.id }
    }
    
    output "azure_client_id" {
      description = "The Azure AD service principal's application (client) ID."
      value       = azuread_application.this.application_id
    }
    
    output "azure_client_secret" {
      description = "The Azure AD service principal's client secret value."
      value       = azuread_service_principal_password.this.value
      sensitive   = true
    }
    
  3. In the same directory, create a file named terraform.tfvars. Add the following content to this file, replacing the following value, and then save the file:

    • Replace the azure_service_principal_display_name value with a display name for the Azure AD service principal.
    azure_service_principal_display_name = "<A display name for the Azure AD service principal>"
    
  4. Initialize the working directory containing the main.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.

    terraform init
    
  5. Apply the changes required to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.

    terraform apply
    

After you create the Azure AD service principal, copy the azure_client_id and azure_client_secret output values, as you will need them later.

To get the azure_client_secret value, see the value of outputs.azure_client_secret.value in the terraform.tfstate file, which is in the working directory that contains the main.tf file.
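
As an alternative to opening terraform.tfstate, the terraform output command can print a single output value. A minimal sketch follows. The wrapper name tf_output is hypothetical, and the sketch assumes a Terraform CLI version that supports the -raw option and that you run it from the working directory that contains main.tf.

```shell
# Print one Terraform output value from the current working directory.
# $1: output name, for example azure_client_id or azure_client_secret
tf_output() {
  terraform output -raw "$1"
}

# Example usage (this prints the secret in plain text, so avoid shared logs):
# tf_output azure_client_id
# tf_output azure_client_secret
```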

Step 2: Add the Azure AD service principal to the Azure Databricks workspace

  1. In your terminal, create an empty directory and then switch to it. (Each separate set of Terraform configuration files must be in its own directory.) For example:

    mkdir terraform_databricks_service_principal_demo && cd terraform_databricks_service_principal_demo
    
  2. In this empty directory, create a file named main.tf. Add the following content to this file, and then save the file.

    variable "databricks_host" {
      description = "The Azure Databricks workspace URL."
      type        = string
    }
    
    variable "azure_client_id" {
      type        = string
      description = "The client ID of the Azure Active Directory (Azure AD) service principal to link to an Azure Databricks service principal. This client ID will be the application ID of the Azure Databricks service principal."
    }
    
    variable "databricks_service_principal_display_name" {
      type        = string
      description = "A workspace display name for the Azure Databricks service principal."
    }
    
    terraform {
      required_providers {
        databricks = {
          source = "databricks/databricks"
        }
      }
    }
    
    provider "databricks" {
      host = var.databricks_host
    }
    
    resource "databricks_service_principal" "sp" {
      application_id = var.azure_client_id
      display_name   = var.databricks_service_principal_display_name
    }
    
    output "databricks_service_principal_application_id" {
      value       = databricks_service_principal.sp.application_id
      description = "Application ID of the Azure Databricks service principal."
    }
    
    output "databricks_service_principal_display_name" {
      value       = databricks_service_principal.sp.display_name
      description = "Workspace display name of the Azure Databricks service principal."
    }
    
    output "databricks_workspace_service_principal_id" {
      value       = databricks_service_principal.sp.id
      description = "Workspace ID of the Azure Databricks service principal. This ID is generated by Azure Databricks for this workspace."
    }
    

    Note

    To add this service principal to groups, and to add entitlements to this service principal, see databricks_service_principal on the Terraform website.

  3. In the same directory, create a file named terraform.tfvars. Add the following content to this file, replacing the following values, and then save the file:

    • Replace the databricks_host value with the URL of the Azure Databricks workspace.

      Tip

      To use an environment variable instead of the terraform.tfvars file for this value, set an environment variable named TF_VAR_databricks_host to the URL of the Azure Databricks workspace, and omit the databricks_host line from terraform.tfvars. The part of the name after TF_VAR_ must match the variable name in main.tf exactly, including case. Keep the databricks_host variable declaration and the host reference in the databricks provider in main.tf.

    • Replace the azure_client_id value with the azure_client_id value from Step 1.

      Tip

      To use an environment variable instead of the terraform.tfvars file for this value, set an environment variable named TF_VAR_azure_client_id to the Application (client) ID value from Step 1, and omit the azure_client_id line from terraform.tfvars. Keep the azure_client_id variable declaration and the application_id argument of the databricks_service_principal resource in main.tf.

    • Replace the databricks_service_principal_display_name value with a workspace display name for the Azure Databricks service principal.

    databricks_host                           = "<The Azure Databricks workspace URL, starting with https://>"
    azure_client_id                           = "<The Azure client ID of the Azure AD service principal>"
    databricks_service_principal_display_name = "<A workspace display name for the Azure Databricks service principal>"
    
  4. Initialize the working directory containing the main.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.

    terraform init
    
  5. Apply the changes required to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.

    terraform apply
    

After you add the Azure AD service principal to the workspace, copy the databricks_service_principal_application_id output value, as you will need it to create an Azure AD token for the Azure AD service principal.

Step 3: Create an access token for an Azure AD service principal

To create an Azure AD token for an Azure AD service principal, gather the following information, and then follow the instructions in Get an Azure AD access token with the Microsoft identity platform REST API:

  • The tenant ID for your Azure AD service principal, which you will use as the Tenant ID / Directory (tenant) ID / <tenant-id> in the instructions. To get the tenant ID, see Provision a service principal in Azure portal.
  • The databricks_service_principal_application_id value from Step 2, which you will use as the Client ID / Application (client) ID / <client-id> in the instructions.
  • The azure_client_secret value from Step 1, which you will use as the Client secret / Value / <client-secret> in the instructions.

After you create the Azure AD token, copy the access_token value, as you will need to provide it to your script, app, or system.
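
For example, a script can pass the access_token value as a bearer token when it calls the Azure Databricks REST API. This is a sketch only. The function name databricks_get and the AAD_TOKEN variable are hypothetical, DATABRICKS_HOST is assumed to hold the per-workspace URL, and the clusters list endpoint is used purely as an illustration.

```shell
# Call an Azure Databricks REST API path with an Azure AD token.
# $1: API path, for example /api/2.0/clusters/list
# Assumes DATABRICKS_HOST holds the per-workspace URL and AAD_TOKEN holds
# the access_token value for the service principal (hypothetical variable).
databricks_get() {
  curl -sS -X GET "${DATABRICKS_HOST}$1" \
    --header "Authorization: Bearer ${AAD_TOKEN}"
}

# Example usage:
# databricks_get /api/2.0/clusters/list | jq .
```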

Use the Azure CLI

Step 1: Create an Azure AD service principal

See Create an Azure service principal with the Azure CLI.

Step 2: Add the Azure AD service principal to the Azure Databricks workspace

See Add a service principal to a workspace to use the Azure Databricks account or admin console to complete this step. You cannot use the Azure CLI to add an Azure AD service principal to an Azure Databricks workspace.

Step 3: Create an access token for an Azure AD service principal

See Get an Azure AD access token with the Azure CLI.
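
As an illustration of Step 3, the following sketch wraps the Azure CLI call that returns an Azure AD token for Azure Databricks. The function name az_databricks_token is hypothetical. The sketch assumes that you have already signed in to the Azure CLI as the service principal (for example, with az login --service-principal) and that 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the Azure Databricks resource ID. Confirm both against the linked instructions.

```shell
# Print an Azure AD access token for Azure Databricks by using the Azure CLI.
# Assumes an existing Azure CLI sign-in for the identity that needs the token.
az_databricks_token() {
  az account get-access-token \
    --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
    --query accessToken \
    --output tsv
}

# Example usage:
# AAD_TOKEN="$(az_databricks_token)"
```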