Databricks Terraform provider

HashiCorp Terraform is a popular open source tool for creating safe and predictable cloud infrastructure across several cloud providers. You can use the Databricks Terraform provider to manage your Azure Databricks workspaces and the associated cloud infrastructure using a flexible, powerful tool. The goal of the Databricks Terraform provider is to support all Databricks REST APIs, automating the most complicated aspects of deploying and managing your data platforms. Databricks customers use the Databricks Terraform provider to deploy and manage clusters and jobs and to configure data access. You use the Azure Provider to provision Azure Databricks workspaces.
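
For example, a minimal sketch of provisioning a workspace with the Azure Provider might look like the following (the resource names, location, and SKU here are only illustrative):

    resource "azurerm_resource_group" "this" {
      name     = "example-rg"
      location = "eastus"
    }

    # The Premium SKU is assumed here; choose the SKU that fits your scenario.
    resource "azurerm_databricks_workspace" "this" {
      name                = "example-workspace"
      resource_group_name = azurerm_resource_group.this.name
      location            = azurerm_resource_group.this.location
      sku                 = "premium"
    }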

Getting started

In this section, you install and configure the requirements to use Terraform and the Databricks Terraform provider, and you then configure Terraform authentication. Following this section, this article provides a sample configuration that you can experiment with to provision an Azure Databricks notebook, a cluster, and a job to run the notebook on the cluster, all in an existing Azure Databricks workspace.

Requirements

To use Terraform to create resources at the Azure account level, and to use the Databricks Terraform provider to create resources at the Azure Databricks account level, you must have the following:

  * The Terraform CLI installed locally. See Download Terraform on the Terraform website.
  * The Azure CLI installed locally and signed in to your Azure account, for example by running the az login command.

To use the Databricks Terraform provider to also create resources at the Azure Databricks workspace level, you must have the following:

  * An existing Azure Databricks workspace and its workspace URL (for example, https://<workspace-instance-name>).
  * To authenticate with a Databricks CLI configuration profile, the Databricks CLI installed locally and a configuration profile for that workspace.
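
Before you continue, you might want to confirm that these command-line tools are installed and on your PATH, for example by printing their versions (a quick check, not required by the steps that follow):

    # Confirm that the Terraform and Azure CLIs are available.
    terraform -version
    az --version

    # The Databricks CLI is needed only if you authenticate with a configuration profile.
    databricks --version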

Configure Terraform authentication

In your Terraform project, you must create a configuration to authenticate Terraform with your Azure account, and to authenticate the Databricks Terraform provider with your Azure Databricks account and your Azure Databricks workspace, as follows:

  1. In your terminal, create an empty directory and then switch to it. (Each separate set of Terraform configuration files must be in its own directory, which is called a Terraform project.) For example:

    mkdir terraform_demo && cd terraform_demo
    
  2. In this empty directory, create a file named auth.tf. Add the following content to this file, depending on your authentication method, and then save the file.

    Tip

    If you use Visual Studio Code, the HashiCorp Terraform extension for Visual Studio Code adds editing features for Terraform files such as syntax highlighting, IntelliSense, code navigation, code formatting, a module explorer, and much more.

    To use the Azure CLI to authenticate at the Azure account level and at the Azure Databricks account level, and to use a Databricks CLI configuration profile to authenticate at the Azure Databricks workspace level, add the following content:

    variable "databricks_connection_profile" {}
    
    terraform {
      required_providers {
        azurerm = {
          source = "hashicorp/azurerm"
        }
        databricks = {
          source = "databricks/databricks"
        }
      }
    }
    
    provider "azurerm" {
      features {}
    }
    
    # Use Databricks CLI authentication.
    provider "databricks" {
      profile = var.databricks_connection_profile
    }
    

    To use the Azure CLI to authenticate at the Azure account level, the Azure Databricks account level, and the Azure Databricks workspace level, add the following content instead:

    variable "databricks_host" {}
    
    terraform {
      required_providers {
        azurerm = {
          source = "hashicorp/azurerm"
        }
        databricks = {
          source = "databricks/databricks"
        }
      }
    }
    
    provider "azurerm" {
      features {}
    }
    
    # Use Azure CLI authentication.
    provider "databricks" {
      host = var.databricks_host
    }
    

    To use the Azure CLI to authenticate at the Azure account level and at the Azure Databricks account level, and to use environment variables to authenticate at the Azure Databricks workspace level, add the following content instead:

    terraform {
      required_providers {
        azurerm = {
          source = "hashicorp/azurerm"
        }
        databricks = {
          source = "databricks/databricks"
        }
      }
    }
    
    provider "azurerm" {
      features {}
    }
    
    # Use environment variables for authentication.
    provider "databricks" {}
    

    Tip

    If you want to create resources only at the Databricks workspace level, you can remove the azurerm block from any of the preceding required_providers declarations along with the provider "azurerm" declaration.

  3. If you use a Databricks CLI configuration profile or the Azure CLI to authenticate at the Azure Databricks workspace level, create another file named auth.auto.tfvars, add the following content to the file, and change the value as needed.

    Tip

    *.auto.tfvars files enable you to specify variable values separately from your code. This makes your .tf files more modular and reusable across different usage scenarios.

    If you use a Databricks CLI configuration profile to authenticate at the Azure Databricks workspace level, add the following content:

    databricks_connection_profile = "DEFAULT"
    

    If you use the Azure CLI to authenticate at the Azure Databricks workspace level, add the following content instead:

    databricks_host = "https://<workspace-instance-name>"
    
  4. Initialize the working directory containing the auth.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.

    terraform init
    

    Terraform downloads the specified providers and installs them in a hidden subdirectory of your current working directory, named .terraform. The terraform init command prints the versions of the providers that it installed. Terraform also creates a lock file named .terraform.lock.hcl, which specifies the exact provider versions used, so that you can control when you want to update the providers used for your project.

  5. Check whether your project was configured correctly by running the terraform plan command. If there are any errors, fix them, and run the command again. For more information, see Command: plan on the Terraform website.

    terraform plan
    
  6. Apply the changes required to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.

    terraform apply
    

    Because no resources have yet been specified in the auth.tf file, the output is Apply complete! Resources: 0 added, 0 changed, 0 destroyed. Also, Terraform writes data into a file called terraform.tfstate. Terraform stores the IDs and properties of the resources it manages in this file, so that it can update or destroy those resources going forward. To create resources, continue with Sample configuration, Next steps, or both to specify the resources that you want to create, and then run the terraform apply command again.
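
    Although terraform.tfstate is a plain JSON file, you typically inspect it through the Terraform CLI rather than editing it directly. For example, after you apply a configuration that creates resources, the following standard Terraform commands list and describe everything recorded in the state:

    # List every resource and data source recorded in the Terraform state.
    terraform state list

    # Show the attributes of everything that Terraform currently manages.
    terraform show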

Sample configuration

This section provides a sample configuration that you can experiment with to provision an Azure Databricks notebook, a cluster, and a job to run the notebook on the cluster, in an existing Azure Databricks workspace. It assumes that you have already set up the requirements, created a Terraform project, and configured the project with Terraform authentication as described in the previous section.

  1. Create another file named me.tf in the same directory that you created in Configure Terraform authentication, and add the following code. This file gets information about the current user (you):

    # Retrieve information about the current user.
    data "databricks_current_user" "me" {}
    
  2. Create another file named notebook.tf, and add the following code. This file represents the notebook.

    variable "notebook_subdirectory" {
      description = "A name for the subdirectory to store the notebook."
      type        = string
      default     = "Terraform"
    }
    
    variable "notebook_filename" {
      description = "The notebook's filename."
      type        = string
    }
    
    variable "notebook_language" {
      description = "The language of the notebook."
      type        = string
    }
    
    resource "databricks_notebook" "this" {
      path     = "${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}"
      language = var.notebook_language
      source   = "./${var.notebook_filename}"
    }
    
    output "notebook_url" {
     value = databricks_notebook.this.url
    }
    
  3. Create another file named notebook.auto.tfvars, and add the following code. This file specifies the notebook’s properties.

    notebook_subdirectory = "Terraform"
    notebook_filename     = "notebook-getting-started.py"
    notebook_language     = "PYTHON"
    
  4. Create another file named notebook-getting-started.py, and add the following code. This file represents the notebook’s contents.

    display(spark.range(10))
    
  5. Create another file named cluster.tf, and add the following code. This file represents the cluster.

    variable "cluster_name" {
      description = "A name for the cluster."
      type        = string
      default     = "My Cluster"
    }
    
    variable "cluster_autotermination_minutes" {
      description = "How many minutes before automatically terminating due to inactivity."
      type        = number
      default     = 60
    }
    
    variable "cluster_num_workers" {
      description = "The number of workers."
      type        = number
      default     = 1
    }
    
    # Create the cluster with the "smallest" amount
    # of resources allowed.
    data "databricks_node_type" "smallest" {
      local_disk = true
    }
    
    # Use the latest Databricks Runtime
    # Long Term Support (LTS) version.
    data "databricks_spark_version" "latest_lts" {
      long_term_support = true
    }
    
    resource "databricks_cluster" "this" {
      cluster_name            = var.cluster_name
      node_type_id            = data.databricks_node_type.smallest.id
      spark_version           = data.databricks_spark_version.latest_lts.id
      autotermination_minutes = var.cluster_autotermination_minutes
      num_workers             = var.cluster_num_workers
    }
    
    output "cluster_url" {
     value = databricks_cluster.this.url
    }
    
  6. Create another file named cluster.auto.tfvars, and add the following code. This file specifies the cluster’s properties.

    cluster_name                    = "My Cluster"
    cluster_autotermination_minutes = 60
    cluster_num_workers             = 1
    
  7. Create another file named job.tf, and add the following code. This file represents the job that runs the notebook on the cluster.

    variable "job_name" {
      description = "A name for the job."
      type        = string
      default     = "My Job"
    }
    
    resource "databricks_job" "this" {
      name = var.job_name
      existing_cluster_id = databricks_cluster.this.cluster_id
      notebook_task {
        notebook_path = databricks_notebook.this.path
      }
      email_notifications {
        on_success = [ data.databricks_current_user.me.user_name ]
        on_failure = [ data.databricks_current_user.me.user_name ]
      }
    }
    
    output "job_url" {
      value = databricks_job.this.url
    }
    
  8. Create another file named job.auto.tfvars, and add the following code. This file specifies the job's properties.

    job_name = "My Job"
    
  9. Run terraform plan. If there are any errors, fix them, and then run the command again.

  10. Run terraform apply.

  11. Verify that the notebook, cluster, and job were created: in the output of the terraform apply command, find the URLs for notebook_url, cluster_url, and job_url, and go to them.

  12. Run the job: on the Jobs page, click Run Now. After the job finishes, check your email inbox.

  13. When you are done with this sample, delete the notebook, cluster, and job from the Azure Databricks workspace by running terraform destroy.

  14. Verify that the notebook, cluster, and job were deleted: refresh the notebook, cluster, and Jobs pages to each display a message that the resource cannot be found.
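
Throughout this sample, the *.auto.tfvars files supply the variable values. If you want to experiment with different values without editing those files, you can also pass variables on the command line and read individual outputs after an apply. For example (a sketch; the overridden names here are only illustrative):

    # Preview a plan that overrides the default cluster and job names.
    terraform plan -var="cluster_name=My Experimental Cluster" -var="job_name=My Experimental Job"

    # After terraform apply, print a single output value, such as the job URL.
    terraform output job_url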

Next steps

  1. Create an Azure Databricks workspace.
  2. Manage workspace resources for an Azure Databricks workspace.

Troubleshooting

Note

For Terraform-specific support, see the Latest Terraform topics on the HashiCorp Discuss website. For issues specific to the Databricks Terraform provider, see Issues in the databricks/terraform-provider-databricks GitHub repository.

Error: Failed to install provider

Issue: If you did not check in a .terraform.lock.hcl file to your version control system, and you run the terraform init command, the following message appears: Failed to install provider. Additional output may include a message similar to the following:

Error while installing databrickslabs/databricks: v1.0.0: checksum list has no SHA-256 hash for "https://github.com/databricks/terraform-provider-databricks/releases/download/v1.0.0/terraform-provider-databricks_1.0.0_darwin_amd64.zip"

Cause: Your Terraform configurations reference outdated Databricks Terraform providers.

Solution:

  1. Replace databrickslabs/databricks with databricks/databricks in all of your .tf files.

    To automate these replacements, run the following Python command from the parent folder that contains the .tf files to update:

    python3 -c "$(curl -Ls https://dbricks.co/updtfns)"
    
  2. Run the following Terraform command and then approve the changes when prompted:

    terraform state replace-provider databrickslabs/databricks databricks/databricks
    

    For information about this command, see Command: state replace-provider in the Terraform documentation.

  3. Verify the changes by running the following Terraform command:

    terraform init
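
    After these steps, the databricks entry in the required_providers block of your configuration should reference the new source address. As a sketch (the version constraint shown here is only illustrative; pin the version you have verified):

    terraform {
      required_providers {
        databricks = {
          source  = "databricks/databricks"
          version = ">= 1.0.0" # Illustrative constraint; adjust as needed.
        }
      }
    }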
    

Error: Failed to query available provider packages

Issue: If you did not check in a .terraform.lock.hcl file to your version control system, and you run the terraform init command, the following message appears: Failed to query available provider packages.

Cause: Your Terraform configurations reference outdated Databricks Terraform providers.

Solution: Follow the solution instructions in Error: Failed to install provider.

Additional examples

Additional resources