Databricks Terraform provider

HashiCorp Terraform is a popular open source tool for creating safe and predictable cloud infrastructure across several cloud providers. You can use the Databricks Terraform provider to manage your Azure Databricks workspaces and the associated cloud infrastructure with a flexible, powerful tool. The goal of the Databricks Terraform provider is to support all Databricks REST APIs, so that you can automate the most complicated aspects of deploying and managing your data platforms. Databricks customers use the Databricks Terraform provider to deploy and manage clusters and jobs and to configure data access. You use the Azure provider to provision Azure Databricks workspaces.

Getting started

In this section, you install and configure the requirements for using Terraform and the Databricks Terraform provider on your local development machine. You then configure Terraform authentication. Following this section, this article provides a sample configuration that you can experiment with to provision an Azure Databricks notebook, a cluster, and a job that runs the notebook on the cluster in an existing Azure Databricks workspace.

Requirements

You must have the Terraform CLI. See Download Terraform on the Terraform website.
You must have a Terraform project. In your terminal, create an empty directory and then switch to it. Each separate set of Terraform configuration files must be in its own directory, which is called a Terraform project. For example:
```
mkdir terraform_demo && cd terraform_demo
```
Include the Terraform configurations for your project in one or more configuration files in your Terraform project. For information about the configuration file syntax, see the Terraform Language Documentation on the Terraform website.
You must add a dependency for the Databricks Terraform provider to your Terraform project. Add the following to one of the configuration files in your Terraform project:
```
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}
```
You must configure authentication for your Terraform project. See Authentication in the Databricks Terraform provider documentation.
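As a rough illustration of the last requirement, a provider block might look like the following. This is a minimal sketch, not the only way to authenticate: the variable name `databricks_host` is a hypothetical name chosen for this example, and the sketch assumes that credentials come from outside the configuration (for example, the `DATABRICKS_TOKEN` environment variable or an Azure CLI login) rather than being hardcoded.

```hcl
# Hypothetical input variable; the name is an assumption made for this sketch.
variable "databricks_host" {
  description = "The URL of the existing Azure Databricks workspace."
  type        = string
}

provider "databricks" {
  # The workspace to manage. No secret is stored in the configuration;
  # the provider resolves credentials through its supported
  # authentication methods (environment variables, Azure CLI, and so on).
  host = var.databricks_host
}
```

Keeping credentials out of the configuration files means the project can be committed to version control without exposing secrets. The Authentication documentation remains the reference for which methods apply to your workspace.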

Sample configuration

This section provides a sample configuration that you can experiment with to provision an Azure Databricks notebook, a cluster, and a job that runs the notebook on the cluster, in an existing Azure Databricks workspace. It assumes that you have already set up the requirements, created a Terraform project, and configured the project with Terraform authentication, as described in the previous section.

Create a file named me.tf in your Terraform project and add the following code. This file gets information about the current user (you):
```
# Retrieve information about the current user.
data "databricks_current_user" "me" {}
```

Create another file named notebook.tf and add the following code. This file represents the notebook.

```
variable "notebook_subdirectory" {
  description = "A name for the subdirectory to store the notebook."
  type        = string
  default     = "Terraform"
}

variable "notebook_filename" {
  description = "The notebook's filename."
  type        = string
}

variable "notebook_language" {
  description = "The language of the notebook."
  type        = string
}

resource "databricks_notebook" "this" {
  path     = "${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}"
  language = var.notebook_language
  source   = "./${var.notebook_filename}"
}

output "notebook_url" {
  value = databricks_notebook.this.url
}
```

Create another file named notebook.auto.tfvars and add the following code. This file specifies the notebook's properties.
```
notebook_subdirectory = "Terraform"
notebook_filename     = "notebook-getting-started.py"
notebook_language     = "PYTHON"
```
Create another file named notebook-getting-started.py and add the following code. This file represents the notebook's contents.
```
display(spark.range(10))
```
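If you would rather not keep a separate source file, the notebook's contents can also be embedded in the configuration itself. The following is a minimal sketch of that alternative, assuming the provider's `content_base64` argument is used in place of `source`. The resource name `inline` and the notebook name `notebook-inline` are hypothetical, and this resource is not part of the sample configuration.

```hcl
# Hypothetical alternative to notebook.tf: the notebook body is embedded
# in the configuration instead of being read from a local file.
resource "databricks_notebook" "inline" {
  path     = "${data.databricks_current_user.me.home}/Terraform/notebook-inline"
  language = "PYTHON"

  # content_base64 takes the place of the source argument.
  content_base64 = base64encode(<<-EOT
    display(spark.range(10))
    EOT
  )
}
```

A separate source file, as used in this article, is usually easier to edit and review. The inline form is mainly convenient for very short notebooks.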

Create another file named cluster.tf and add the following code. This file represents the cluster.

```
variable "cluster_name" {
  description = "A name for the cluster."
  type        = string
  default     = "My Cluster"
}

variable "cluster_autotermination_minutes" {
  description = "How many minutes before automatically terminating due to inactivity."
  type        = number
  default     = 60
}

variable "cluster_num_workers" {
  description = "The number of workers."
  type        = number
  default     = 1
}

# Create the cluster with the "smallest" amount
# of resources allowed.
data "databricks_node_type" "smallest" {
  local_disk = true
}

# Use the latest Databricks Runtime
# Long Term Support (LTS) version.
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}

resource "databricks_cluster" "this" {
  cluster_name            = var.cluster_name
  node_type_id            = data.databricks_node_type.smallest.id
  spark_version           = data.databricks_spark_version.latest_lts.id
  autotermination_minutes = var.cluster_autotermination_minutes
  num_workers             = var.cluster_num_workers
}

output "cluster_url" {
  value = databricks_cluster.this.url
}
```

Create another file named cluster.auto.tfvars and add the following code. This file specifies the cluster's properties.
```
cluster_name                    = "My Cluster"
cluster_autotermination_minutes = 60
cluster_num_workers             = 1
```

Create another file named job.tf and add the following code. This file represents the job that runs the notebook on the cluster.

```
variable "job_name" {
  description = "A name for the job."
  type        = string
  default     = "My Job"
}

variable "task_key" {
  description = "A name for the task."
  type        = string
  default     = "my_task"
}

resource "databricks_job" "this" {
  name = var.job_name
  task {
    task_key            = var.task_key
    existing_cluster_id = databricks_cluster.this.cluster_id
    notebook_task {
      notebook_path = databricks_notebook.this.path
    }
  }
  email_notifications {
    on_success = [data.databricks_current_user.me.user_name]
    on_failure = [data.databricks_current_user.me.user_name]
  }
}

output "job_url" {
  value = databricks_job.this.url
}
```

Create another file named job.auto.tfvars and add the following code. This file specifies the job's properties.
```
job_name = "My Job"
task_key = "my_task"
```
Run terraform plan. If you have not yet initialized the project, run terraform init first. If there are any errors, fix them and then run the command again.
Run terraform apply.
Verify that the notebook, cluster, and job were created: in the output of the terraform apply command, find the URLs for notebook_url, cluster_url, and job_url, and go to them.
Run the job: on the Jobs page, click Run now. After the job finishes, check your email inbox.
When you are done with this sample, delete the notebook, cluster, and job from the Azure Databricks workspace by running terraform destroy.

Note

For more information about the terraform plan, terraform apply, and terraform destroy commands, see the Terraform CLI Documentation in the Terraform documentation.
Verify that the notebook, cluster, and job were deleted: refresh the notebook, cluster, and Jobs pages. Each page should display a message that the resource cannot be found.

Testing

Test your Terraform configurations before or after you deploy them. You can run tests analogous to unit tests before deploying resources. You can also run tests analogous to integration tests after resources are deployed. See Tests in the Terraform documentation.

To run tests analogous to integration tests against this article's sample configuration, follow this process:

Create a file named cluster.tftest.hcl and add the following code. This file tests whether the deployed cluster has the expected cluster name.
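The terraform test command is available in Terraform 1.6 and later, and provider mocks require Terraform 1.7 or later. One way to make that requirement explicit is to extend the terraform block from the Requirements section with a version constraint. This is a sketch: the constraint shown is an assumption that you want to use both tests and mocks.

```hcl
terraform {
  # Assumption for this sketch: tests and mocks are both used,
  # so Terraform 1.7 or later is required.
  required_version = ">= 1.7.0"

  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}
```

With this constraint in place, an older Terraform CLI stops with a version error instead of failing later on unrecognized test syntax.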

```
# Filename: cluster.tftest.hcl

run "cluster_name_test" {
  command = apply

  assert {
    condition     = databricks_cluster.this.cluster_name == var.cluster_name
    error_message = "Cluster name did not match expected name"
  }
}
```

Create a file named job.tftest.hcl and add the following code. This file tests whether the deployed job has the expected job name.

```
# Filename: job.tftest.hcl

run "job_name_test" {
  command = apply

  assert {
    condition     = databricks_job.this.name == var.job_name
    error_message = "Job name did not match expected name"
  }
}
```

Create a file named notebook.tftest.hcl and add the following code. This file tests whether the deployed notebook has the expected workspace path.

```
# Filename: notebook.tftest.hcl

run "notebook_path_test" {
  command = apply

  assert {
    condition     = databricks_notebook.this.path == "${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}"
    error_message = "Notebook path did not match expected path"
  }
}
```

Run terraform test. Terraform deploys each resource to the Azure Databricks workspace, runs each related test and reports its result, and then tears down the deployed resource.

To run tests analogous to unit tests against this article's sample configuration, use one of the following approaches:

Change the line command = apply in each of the preceding tests to command = plan, and then run terraform test. Terraform runs each related test and reports its result but does not deploy any resources.
Mock the Databricks Terraform provider, which lets you run terraform test without deploying resources and without requiring authentication credentials. See Mocks in the Terraform documentation. One approach to running mock tests is to add the line mock_provider "databricks" {} to your tests and to remove the line command = apply or command = plan, for example:
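A test file can also set its own input values, so the assertion does not depend on whatever the .auto.tfvars files contain. The following is a minimal sketch using a file-level variables block. The run name and the cluster name "My Test Cluster" are hypothetical values chosen for this example.

```hcl
# Filename: cluster.tftest.hcl

# Values set here apply to every run block in this file and take
# precedence over values from .auto.tfvars files.
variables {
  cluster_name = "My Test Cluster"
}

run "cluster_name_override_test" {
  command = apply

  assert {
    # Compare against the literal value rather than var.cluster_name,
    # so the test fails if the override is not applied.
    condition     = databricks_cluster.this.cluster_name == "My Test Cluster"
    error_message = "Cluster name did not match the overridden name"
  }
}
```

Pinning test inputs this way keeps a test stable when someone later edits the project's default values.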

```
# Filename: cluster.tftest.hcl

mock_provider "databricks" {}

run "cluster_mock_name_test" {
  assert {
    condition     = databricks_cluster.this.cluster_name == var.cluster_name
    error_message = "Cluster name did not match expected name"
  }
}
```

```
# Filename: job.tftest.hcl

mock_provider "databricks" {}

run "job_mock_name_test" {
  assert {
    condition     = databricks_job.this.name == var.job_name
    error_message = "Job name did not match expected name"
  }
}
```

```
# Filename: notebook.tftest.hcl

mock_provider "databricks" {}

run "notebook_mock_path_test" {
  assert {
    condition     = databricks_notebook.this.path == "${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}"
    error_message = "Notebook path did not match expected path"
  }
}
```
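With a bare mock_provider block, Terraform generates placeholder values for computed attributes, so the notebook test above compares the path against an expression built from the same generated home directory. To assert against a fixed path instead, the mock can supply a default for the data source. The following is a sketch under that assumption. The home directory /Users/someone@example.com and the run name are hypothetical values chosen for this example.

```hcl
# Filename: notebook.tftest.hcl

mock_provider "databricks" {
  # Hypothetical fixed value returned for the mocked data source.
  mock_data "databricks_current_user" {
    defaults = {
      home = "/Users/someone@example.com"
    }
  }
}

# Pin the inputs so the expected path below is fully determined.
variables {
  notebook_subdirectory = "Terraform"
  notebook_filename     = "notebook-getting-started.py"
  notebook_language     = "PYTHON"
}

run "notebook_mock_fixed_path_test" {
  assert {
    condition     = databricks_notebook.this.path == "/Users/someone@example.com/Terraform/notebook-getting-started.py"
    error_message = "Notebook path did not match expected path"
  }
}
```

Asserting against a literal path checks how the path is assembled, which the self-referential comparison cannot do.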

Next steps

Create an Azure Databricks workspace.

Additional resources

Databricks provider documentation on the Terraform Registry website
Terraform documentation on the Terraform website
Databricks Terraform examples GitHub repository

Last updated on 2026-02-25