Run Python tests using the Databricks extension for Visual Studio Code

This page describes how to run Python tests using the Databricks extension for Visual Studio Code. See What is the Databricks extension for Visual Studio Code?.

Run tests with pytest

You can run pytest on local code that does not need a connection to a cluster in a remote Azure Databricks workspace. For example, you might use pytest to test functions that accept and return PySpark DataFrames in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.

To run pytest on code in a remote Azure Databricks workspace, do the following in your Visual Studio Code project:

Step 1: Create the tests

Add a Python file with the following code, which contains the tests to run. This example assumes that the file is named spark_test.py and is at the root of your Visual Studio Code project. The file contains a pytest fixture, which makes the cluster's SparkSession (the entry point to Spark functionality on the cluster) available to the tests. It also contains a single test that checks whether the specified cell in the table contains the specified value. You can add your own tests to this file as needed.

```python
from pyspark.sql import SparkSession
import pytest

@pytest.fixture
def spark() -> SparkSession:
  # Create a SparkSession (the entry point to Spark functionality) on
  # the cluster in the remote Databricks workspace. Unit tests do not
  # have access to this SparkSession by default.
  return SparkSession.builder.getOrCreate()

# Now add your unit tests.

# For example, here is a unit test that must be run on the
# cluster in the remote Databricks workspace.
# This example determines whether the specified cell in the
# specified table contains the specified value. For example,
# the third column in the first row should contain the word "Ideal":
#
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# |_c0 | carat | cut   | color | clarity | depth | table | price | x    | y     | z    |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# | 1  | 0.23  | Ideal | E     | SI2     | 61.5  | 55    | 326   | 3.95 | 3.98  | 2.43 |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# ...
#
def test_spark(spark):
  spark.sql('USE default')
  data = spark.sql('SELECT * FROM diamonds')
  assert data.collect()[0][2] == 'Ideal'
```
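The assertion indexes the collected rows by position: [0] selects the first row and [2] the third column, because both indexes are zero-based. The following stdlib-only sketch checks that arithmetic against the header and first row shown in the test's comment (the values are written as plain Python literals for illustration; the helper name cell is hypothetical). PySpark Row objects also support lookup by name, for example row["cut"], which avoids depending on column order.

```python
# Column header of the diamonds table, as shown in the comment in spark_test.py.
DIAMONDS_COLUMNS = ["_c0", "carat", "cut", "color", "clarity", "depth",
                    "table", "price", "x", "y", "z"]

# First data row from the same comment, written as plain Python values
# (illustrative only; the real table's column types may differ).
FIRST_ROW = (1, 0.23, "Ideal", "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43)


def cell(row: tuple, column: str):
    """Look up a value by column name instead of by position (hypothetical helper)."""
    return row[DIAMONDS_COLUMNS.index(column)]
```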

Step 2: Create the pytest runner

Add a Python file with the following code, which instructs pytest to run the tests from the previous step. This example assumes that the file is named pytest_databricks.py and is at the root of your Visual Studio Code project.

```python
import pytest
import os
import sys

# Run all tests in the connected directory in the remote Databricks workspace.
# By default, pytest collects tests from files whose names match
# "test_*.py" or "*_test.py". Within each of these files, pytest runs each
# function whose name begins with "test".

# Get the path to the directory for this file in the workspace.
dir_root = os.path.dirname(os.path.realpath(__file__))
# Switch to the root directory.
os.chdir(dir_root)

# Skip writing .pyc files to the bytecode cache on the cluster.
sys.dont_write_bytecode = True

# Now run pytest from the root directory, using the
# arguments that are supplied by your custom run configuration in
# your Visual Studio Code project. In this case, the custom run
# configuration JSON must contain these "program" and
# "args" entries:
#
# ...
# {
#   ...
#   "program": "${workspaceFolder}/path/to/this/file/in/workspace",
#   "args": ["/path/to/_test.py-files"]
# }
# ...
#
retcode = pytest.main(sys.argv[1:])
```
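The runner relies on pytest's default discovery rules. The following is a simplified, stdlib-only sketch of the file-name and function-name parts of those rules, assuming the python_files and python_functions settings keep their defaults; it is not pytest's actual collection code. It also shows why pytest_databricks.py is not itself collected as a test file.

```python
import fnmatch
import os

# Default patterns of pytest's python_files setting (assumed unchanged here).
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def looks_like_test_file(path: str, patterns=DEFAULT_PYTHON_FILES) -> bool:
    """Return True if the file's base name matches one of the test-file patterns."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def looks_like_test_function(name: str) -> bool:
    """Return True for function names pytest collects by default ("test" prefix)."""
    return name.startswith("test")
```

For example, spark_test.py and main_test.py match the *_test.py pattern, while pytest_databricks.py matches neither pattern.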

Step 3: Create a custom run configuration

To instruct pytest to run your tests, you must create a custom run configuration. Use the existing Databricks cluster-based run configuration to create your own custom run configuration, as follows:

On the main menu, click Run > Add Configuration.
In the Command Palette, select Databricks.

Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.
Change the starter run configuration as follows, and then save the file:
- Change the name of this run configuration from Run on Databricks to a unique display name for this configuration, in this example Unit Tests (on Databricks).
- Change program from ${file} to the path in the project that contains the test runner, in this example ${workspaceFolder}/pytest_databricks.py.
- Change args from [] to the path in the project that contains the files with your tests, in this example ["."].
Your launch.json file should look similar to the following:
```
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks",
      "request": "launch",
      "name": "Unit Tests (on Databricks)",
      "program": "${workspaceFolder}/pytest_databricks.py",
      "args": ["."],
      "env": {}
    }
  ]
}
```
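If you want to check these settings from a script, the following is a hypothetical stdlib helper, not part of the extension, that reads the file back and looks up a configuration by its display name. It assumes comments occupy whole lines, as in the example; VS Code itself accepts a broader JSON-with-comments syntax (block comments, trailing commas) that this sketch does not handle.

```python
import json


def load_launch_config(text: str) -> dict:
    """Parse launch.json text after dropping whole-line // comments (simplified)."""
    lines = [line for line in text.splitlines()
             if not line.lstrip().startswith("//")]
    return json.loads("\n".join(lines))


def find_configuration(config: dict, name: str) -> dict:
    """Return the run configuration with the given display name."""
    for entry in config.get("configurations", []):
        if entry.get("name") == name:
            return entry
    raise KeyError(f"no launch configuration named {name!r}")
```

For example, find_configuration(load_launch_config(text), "Unit Tests (on Databricks)")["program"] should return "${workspaceFolder}/pytest_databricks.py" for the file above.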

Step 4: Run the tests

Make sure that pytest is already installed on the cluster. For example, with the cluster's settings page open in your Azure Databricks workspace, do the following:

On the Libraries tab, if pytest is visible, then pytest is already installed. If pytest is not visible, click Install new.
For Library Source, click PyPI.
For Package, enter pytest.
Click Install.
Wait until the status changes from Pending to Installed.
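If pytest is missing on the cluster, the runner fails at `import pytest` with a generic ModuleNotFoundError. One optional variation, sketched here with a hypothetical helper name, is to check for the package first with importlib.util.find_spec and raise a message that points back to the installation steps above.

```python
import importlib.util


def require_module(name: str) -> None:
    """Raise a descriptive error if a top-level module cannot be imported.

    Hypothetical helper; call it before `import pytest` in the runner.
    """
    if importlib.util.find_spec(name) is None:
        raise ModuleNotFoundError(
            f"{name} is not installed on this cluster; install it from PyPI "
            "on the cluster's Libraries tab and run the tests again.",
            name=name,
        )


# Example use at the top of pytest_databricks.py, before `import pytest`:
# require_module("pytest")
```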

To run the tests, do the following from your Visual Studio Code project:

On the main menu, click View > Run.
In the Run and Debug list, click Unit Tests (on Databricks), if it is not already selected.
Click the green arrow (Start Debugging) icon.

The pytest results are displayed in the Debug Console (View > Debug Console on the main menu). For example, the following results show that at least one test was found in the spark_test.py file, and the dot (.) means that a single test was found and passed. (A failing test would show an F.)

```
<date>, <time> - Creating execution context on cluster <cluster-id> ...
<date>, <time> - Synchronizing code to /Workspace/path/to/directory ...
<date>, <time> - Running /pytest_databricks.py ...
============================= test session starts ==============================
platform linux -- Python <version>, pytest-<version>, pluggy-<version>
rootdir: /Workspace/path/to/directory
collected 1 item

spark_test.py .                                                          [100%]

============================== 1 passed in 3.25s ===============================
<date>, <time> - Done (took 10818ms)
```
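In the runner, retcode holds the value returned by pytest.main, which is one of pytest's documented exit codes (also exposed as the pytest.ExitCode enum in recent pytest versions). A stdlib-only sketch that turns it into a readable summary, for example by adding print(describe_exit_code(retcode)) at the end of pytest_databricks.py (the helper name is hypothetical):

```python
# pytest's documented exit codes (also available as pytest.ExitCode).
PYTEST_EXIT_CODES = {
    0: "all tests were collected and passed",
    1: "tests were collected and run, but some failed",
    2: "test execution was interrupted by the user",
    3: "an internal error occurred while running tests",
    4: "pytest command-line usage error",
    5: "no tests were collected",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable description of a pytest exit code."""
    return PYTEST_EXIT_CODES.get(int(code), f"unknown exit code {code}")
```

Exit code 5 is worth watching for here: it usually means the args path in the run configuration does not point at the directory that contains the *_test.py files.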

Run tests with Databricks Connect

To run tests locally that use Spark APIs, use Databricks Connect.

Step 1: Configure Databricks Connect

Follow the steps to configure Databricks Connect for the extension. See Debug code by using Databricks Connect for the Databricks extension for Visual Studio Code.

Step 2: Create a unit test

Add a Python file with the following code, which contains the test to run. This example assumes that the file is named main_test.py.

```python
from my_project import main


def test_find_all_taxis():
    taxis = main.find_all_taxis()
    assert taxis.count() > 5
```
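In main_test.py, main.find_all_taxis() is called without a session argument, so the project code obtains its Spark session on its own. If other tests need a session directly, one option is a shared fixture in a conftest.py file. This sketch assumes the databricks-connect package is installed in the local Python environment as part of Step 1; DatabricksSession.builder.getOrCreate() is the Databricks Connect entry point, while the fixture name and session scope are illustrative choices.

```python
# conftest.py (hypothetical file in the project root)
import pytest


@pytest.fixture(scope="session")
def spark():
    """Provide one Databricks Connect session for the whole test run."""
    # Imported lazily so that test collection does not fail when
    # databricks-connect is missing; only tests that request `spark` need it.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.getOrCreate()
```

A test then declares the fixture as a parameter, for example def test_something(spark):, and, as with main_test.py, should be started with the debug action so that the Databricks Connect configuration is applied.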

Step 3: Add or update the debugpy launch configuration

Next, create a debugpy launch configuration that enables Databricks Connect.

On the Visual Studio Code main menu, click Run > Add Configuration.
In the Command Palette, select Python Debugger.

Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.
Add the "databricks": true field. This enables Databricks Connect.

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Unit Tests (on Databricks)",
      "type": "debugpy",
      "databricks": true,
      "request": "launch",
      "program": "${file}",
      "args": ["."],
      "env": {},
      "console": "integratedTerminal"
    }
  ]
}
```

Step 4: Run the tests

To run the tests, do the following from your Visual Studio Code project:

On the main menu, click View > Testing to open the testing panel.
In the testing panel, run your test by clicking the debug icon associated with main_test.py. Note that simply running the test, rather than debugging it, does not use the modified debug configuration, so the code will not have access to Databricks Connect.


Last updated on 2026-01-16
