Unit testing for notebooks

You can use unit testing to help improve the quality and consistency of your notebooks' code. Unit testing is an approach to testing self-contained units of code, such as functions, early and often. This helps you find problems with your code faster, uncover mistaken assumptions about your code sooner, and streamline your overall coding efforts.

This article is an introduction to basic unit testing with functions. Advanced concepts, such as unit testing classes and interfaces and the use of stubs, mocks, and test harnesses, are supported when unit testing notebooks but are outside the scope of this article. This article also does not cover other kinds of testing, such as integration testing, system testing, acceptance testing, or non-functional testing methods such as performance testing or usability testing.

This article demonstrates the following:

How to organize functions and their unit tests.
How to write functions in Python, R, and Scala, as well as user-defined functions in SQL, that are well designed to be unit tested.
How to call these functions from Python, R, Scala, and SQL notebooks.
How to write unit tests in Python, R, and Scala by using the popular test frameworks pytest for Python, testthat for R, and ScalaTest for Scala. Also, how to write SQL that unit tests SQL user-defined functions (SQL UDFs).
How to run these unit tests from Python, R, Scala, and SQL notebooks.

Note

Azure Databricks recommends writing and running your unit tests in a notebook. While you can run some commands in the web terminal, the web terminal has more limitations, such as a lack of support for Spark. See Run shell commands in Azure Databricks web terminal.

Organize functions and unit tests

There are a few common approaches for organizing your functions and their unit tests with notebooks. Each approach has its benefits and challenges.

For Python, R, and Scala notebooks, common approaches include the following:

Store functions and their unit tests outside of notebooks.
- Benefits: You can call these functions from within and outside of notebooks. Test frameworks are better designed to run tests outside of notebooks.
- Challenges: This approach is not supported for Scala notebooks. This approach also increases the number of files to track and maintain.
Store functions in one notebook and their unit tests in a separate notebook.
- Benefits: These functions are easier to reuse across notebooks.
- Challenges: The number of notebooks to track and maintain increases. These functions cannot be used outside of notebooks. These functions can also be more difficult to test outside of notebooks.
Store functions and their unit tests within the same notebook.
- Benefits: Functions and their unit tests are stored within a single notebook for easier tracking and maintenance.
- Challenges: These functions can be more difficult to reuse across notebooks. These functions cannot be used outside of notebooks. These functions can also be more difficult to test outside of notebooks.

For Python and R notebooks, Databricks recommends storing functions and their unit tests outside of notebooks. For Scala notebooks, Databricks recommends including functions in one notebook and their unit tests in a separate notebook.

For SQL notebooks, Databricks recommends that you store functions as SQL user-defined functions (SQL UDFs) in your schemas (also known as databases). You can then call these SQL UDFs and their unit tests from SQL notebooks.

Write functions

This section describes a simple set of example functions that determine the following:

Whether a table exists in a database.
Whether a column exists in a table.
How many rows exist in a column for a value within that column.

These functions are intended to be simple, so that you can focus on the unit testing details in this article rather than on the functions themselves.

To get the best unit testing results, a function should return a single predictable outcome and be of a single data type. For example, to check whether something exists, the function should return a boolean value of true or false. To return the number of rows that exist, the function should return a non-negative whole number. In the first example, it should not return false if something does not exist and the thing itself if it does exist. Likewise, in the second example, it should not return the number of rows if rows exist and false if no rows exist.
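
As a minimal sketch of this guideline, the following plain-Python example contrasts functions that always return a single data type with one that mixes types. The catalog dictionary and all three function names are hypothetical stand-ins for illustration only; they are not the functions defined later in this article.

```python
# Hypothetical in-memory stand-in for a catalog: table name -> list of rows.
catalog = {"diamonds": [{"clarity": "SI2"}, {"clarity": "SI1"}]}

# Easy to test: always returns a bool.
def table_exists(table_name):
  return table_name in catalog

# Easy to test: always returns a non-negative int, even if the table is missing.
def num_rows(table_name):
  return len(catalog.get(table_name, []))

# Harder to test: returns either the table itself or the string "false",
# so every caller and every test has to handle two unrelated types.
def find_table_mixed(table_name):
  return catalog[table_name] if table_name in catalog else "false"

print(table_exists("diamonds"), table_exists("missing"))
print(num_rows("diamonds"), num_rows("missing"))
print(type(find_table_mixed("diamonds")).__name__,
      type(find_table_mixed("missing")).__name__)
```

With the first two functions, a test needs only one kind of assertion per function. With the mixed-type function, a test must first work out which type came back before it can check the value.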

You can add these functions to an existing Azure Databricks workspace as follows, in Python, R, Scala, or SQL.

Python

The following code assumes that you have set up Databricks Git folders, added a repo, and have the repo open in your Azure Databricks workspace.

Create a file named myfunctions.py within the repo, and add the following contents to the file. Other examples in this article expect this file to be named myfunctions.py. You can use different names for your own files.

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Because this file is not a Databricks notebook, you
# must create a Spark session. Databricks notebooks
# create a Spark session for you by default.
spark = SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

# Does the specified table exist in the specified database?
def tableExists(tableName, dbName):
  return spark.catalog.tableExists(f"{dbName}.{tableName}")

# Does the specified column exist in the given DataFrame?
def columnExists(dataFrame, columnName):
  return columnName in dataFrame.columns

# How many rows are there for the specified value in the specified column
# in the given DataFrame?
def numRowsInColumnForValue(dataFrame, columnName, columnValue):
  df = dataFrame.filter(col(columnName) == columnValue)

  return df.count()

R

The following code assumes that you have set up Databricks Git folders, added a repo, and have the repo open in your Azure Databricks workspace.

Create a file named myfunctions.r within the repo, and add the following contents to the file. Other examples in this article expect this file to be named myfunctions.r. You can use different names for your own files.

library(SparkR)

# Does the specified table exist in the specified database?
table_exists <- function(table_name, db_name) {
  tableExists(paste(db_name, ".", table_name, sep = ""))
}

# Does the specified column exist in the given DataFrame?
column_exists <- function(dataframe, column_name) {
  column_name %in% colnames(dataframe)
}

# How many rows are there for the specified value in the specified column
# in the given DataFrame?
num_rows_in_column_for_value <- function(dataframe, column_name, column_value) {
  df = filter(dataframe, dataframe[[column_name]] == column_value)

  count(df)
}

Scala

Create a Scala notebook named myfunctions with the following contents. Other examples in this article expect this notebook to be named myfunctions. You can use different names for your own notebooks.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Does the specified table exist in the specified database?
def tableExists(tableName: String, dbName: String) : Boolean = {
  return spark.catalog.tableExists(dbName + "." + tableName)
}

// Does the specified column exist in the given DataFrame?
def columnExists(dataFrame: DataFrame, columnName: String) : Boolean = {
  dataFrame.columns.contains(columnName)
}

// How many rows are there for the specified value in the specified column
// in the given DataFrame?
def numRowsInColumnForValue(dataFrame: DataFrame, columnName: String, columnValue: String) : Long = {
  val df = dataFrame.filter(col(columnName) === columnValue)

  return df.count()
}

SQL

The following code assumes that you have the third-party sample dataset diamonds within a schema named default within a catalog named main that is accessible from your Azure Databricks workspace. If the catalog or schema that you want to use has a different name, change one or both of the following USE statements to match.

Create a SQL notebook and add the following contents to this new notebook. Then attach the notebook to a cluster and run the notebook to add the following SQL UDFs to the specified catalog and schema.

Note

The SQL UDFs table_exists and column_exists work only with Unity Catalog. SQL UDF support for Unity Catalog is in Public Preview.

USE CATALOG main;
USE SCHEMA default;

CREATE OR REPLACE FUNCTION table_exists(catalog_name STRING,
                                        db_name      STRING,
                                        table_name   STRING)
  RETURNS BOOLEAN
  RETURN if(
    (SELECT count(*) FROM system.information_schema.tables
     WHERE table_catalog = table_exists.catalog_name
       AND table_schema  = table_exists.db_name
       AND table_name    = table_exists.table_name) > 0,
    true,
    false
  );

CREATE OR REPLACE FUNCTION column_exists(catalog_name STRING,
                                         db_name      STRING,
                                         table_name   STRING,
                                         column_name  STRING)
  RETURNS BOOLEAN
  RETURN if(
    (SELECT count(*) FROM system.information_schema.columns
     WHERE table_catalog = column_exists.catalog_name
       AND table_schema  = column_exists.db_name
       AND table_name    = column_exists.table_name
       AND column_name   = column_exists.column_name) > 0,
    true,
    false
  );

CREATE OR REPLACE FUNCTION num_rows_for_clarity_in_diamonds(clarity_value STRING)
  RETURNS BIGINT
  RETURN SELECT count(*)
         FROM main.default.diamonds
         WHERE clarity = clarity_value

Call functions

This section describes code that calls the preceding functions. You could use these functions, for example, to count the number of rows in a table where a specified value exists within a specified column. However, before you proceed, you should check whether the table actually exists and whether the column actually exists in that table. The following code checks for these conditions.

If you added the functions from the preceding section to your Azure Databricks workspace, you can call these functions from your workspace as follows.

Python

Create a Python notebook in the same folder as the preceding myfunctions.py file in your repo, and add the following contents to the notebook. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.

from myfunctions import *

tableName   = "diamonds"
dbName      = "default"
columnName  = "clarity"
columnValue = "VVS2"

# If the table exists in the specified database...
if tableExists(tableName, dbName):

  df = spark.sql(f"SELECT * FROM {dbName}.{tableName}")

  # And the specified column exists in that table...
  if columnExists(df, columnName):
    # Then report the number of rows for the specified value in that column.
    numRows = numRowsInColumnForValue(df, columnName, columnValue)

    print(f"There are {numRows} rows in '{tableName}' where '{columnName}' equals '{columnValue}'.")
  else:
    print(f"Column '{columnName}' does not exist in table '{tableName}' in schema (database) '{dbName}'.")
else:
  print(f"Table '{tableName}' does not exist in schema (database) '{dbName}'.")

R

Create an R notebook in the same folder as the preceding myfunctions.r file in your repo, and add the following contents to the notebook. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.

library(SparkR)
source("myfunctions.r")

table_name   <- "diamonds"
db_name      <- "default"
column_name  <- "clarity"
column_value <- "VVS2"

# If the table exists in the specified database...
if (table_exists(table_name, db_name)) {

  df = sql(paste("SELECT * FROM ", db_name, ".", table_name, sep = ""))

  # And the specified column exists in that table...
  if (column_exists(df, column_name)) {
    # Then report the number of rows for the specified value in that column.
    num_rows = num_rows_in_column_for_value(df, column_name, column_value)

    print(paste("There are ", num_rows, " rows in table '", table_name, "' where '", column_name, "' equals '", column_value, "'.", sep = "")) 
  } else {
    print(paste("Column '", column_name, "' does not exist in table '", table_name, "' in schema (database) '", db_name, "'.", sep = ""))
  }

} else {
  print(paste("Table '", table_name, "' does not exist in schema (database) '", db_name, "'.", sep = ""))
}

Scala

Create another Scala notebook in the same folder as the preceding myfunctions Scala notebook, and add the following contents to this new notebook.

In the first cell of this new notebook, add the following code, which calls the %run magic. This magic makes the contents of the myfunctions notebook available to your new notebook.

%run ./myfunctions

In the second cell of this new notebook, add the following code. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.

val tableName   = "diamonds"
val dbName      = "default"
val columnName  = "clarity"
val columnValue = "VVS2"

// If the table exists in the specified database...
if (tableExists(tableName, dbName)) {

  val df = spark.sql("SELECT * FROM " + dbName + "." + tableName)

  // And the specified column exists in that table...
  if (columnExists(df, columnName)) {
    // Then report the number of rows for the specified value in that column.
    val numRows = numRowsInColumnForValue(df, columnName, columnValue)

    println("There are " + numRows + " rows in '" + tableName + "' where '" + columnName + "' equals '" + columnValue + "'.")
  } else {
    println("Column '" + columnName + "' does not exist in table '" + tableName + "' in database '" + dbName + "'.")
  }

} else {
  println("Table '" + tableName + "' does not exist in database '" + dbName + "'.")
}

SQL

Add the following code to a new cell in the preceding notebook or to a cell in a separate notebook. Change the schema or catalog names if necessary to match yours, and then run this cell to see the results.

SELECT CASE
-- If the table exists in the specified catalog and schema...
WHEN
  table_exists("main", "default", "diamonds")
THEN
  -- And the specified column exists in that table...
  (SELECT CASE
   WHEN
     column_exists("main", "default", "diamonds", "clarity")
   THEN
     -- Then report the number of rows for the specified value in that column.
     printf("There are %d rows in table 'main.default.diamonds' where 'clarity' equals 'VVS2'.",
            num_rows_for_clarity_in_diamonds("VVS2"))
   ELSE
     printf("Column 'clarity' does not exist in table 'main.default.diamonds'.")
   END)
ELSE
  printf("Table 'main.default.diamonds' does not exist.")
END

Write unit tests

This section describes code that tests each of the functions described toward the beginning of this article. If you make any changes to the functions in the future, you can use unit tests to determine whether those functions still work as you expect them to.

If you added the functions toward the beginning of this article to your Azure Databricks workspace, you can add unit tests for these functions to your workspace as follows.

Python

Create another file named test_myfunctions.py in the same folder as the preceding myfunctions.py file in your repo, and add the following contents to the file. By default, pytest looks for .py files whose names start with test_ (or end with _test) to test. Similarly, by default, pytest looks inside these files for functions whose names start with test_ to test.
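
The naming rules can be illustrated with a small plain-Python sketch. It only mimics the file-name and function-name patterns as described above, using the standard library's fnmatch module; it does not call pytest. The helper names looks_like_test_file and looks_like_test_function are hypothetical.

```python
from fnmatch import fnmatch

# Mimics the default file-name patterns: test_*.py or *_test.py.
def looks_like_test_file(file_name):
  return fnmatch(file_name, "test_*.py") or fnmatch(file_name, "*_test.py")

# Mimics the function-name rule as described above: names that start with test_.
def looks_like_test_function(function_name):
  return function_name.startswith("test_")

for file_name in ["test_myfunctions.py", "myfunctions_test.py", "myfunctions.py"]:
  print(file_name, looks_like_test_file(file_name))

for function_name in ["test_tableExists", "tableExists"]:
  print(function_name, looks_like_test_function(function_name))
```

Under these rules, myfunctions.py is not collected as a test file, which is why the tests live in a separate test_myfunctions.py file.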

In general, it is a best practice not to run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, run unit tests against non-production data. One common approach is to create fake data that is as close as possible to the production data. The following code example creates fake data for the unit tests to run against.

import pytest
import pyspark
from myfunctions import *
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType, StringType

tableName    = "diamonds"
dbName       = "default"
columnName   = "clarity"
columnValue  = "SI2"

# Because this file is not a Databricks notebook, you
# must create a Spark session. Databricks notebooks
# create a Spark session for you by default.
spark = SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

# Create fake data for the unit tests to run against.
# In general, it is a best practice to not run unit tests
# against functions that work with data in production.
schema = StructType([ \
  StructField("_c0",     IntegerType(), True), \
  StructField("carat",   FloatType(),   True), \
  StructField("cut",     StringType(),  True), \
  StructField("color",   StringType(),  True), \
  StructField("clarity", StringType(),  True), \
  StructField("depth",   FloatType(),   True), \
  StructField("table",   IntegerType(), True), \
  StructField("price",   IntegerType(), True), \
  StructField("x",       FloatType(),   True), \
  StructField("y",       FloatType(),   True), \
  StructField("z",       FloatType(),   True), \
])

data = [ (1, 0.23, "Ideal",   "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43 ), \
         (2, 0.21, "Premium", "E", "SI1", 59.8, 61, 326, 3.89, 3.84, 2.31 ) ]

df = spark.createDataFrame(data, schema)

# Does the table exist?
def test_tableExists():
  assert tableExists(tableName, dbName) is True

# Does the column exist?
def test_columnExists():
  assert columnExists(df, columnName) is True

# Is there at least one row for the value in the specified column?
def test_numRowsInColumnForValue():
  assert numRowsInColumnForValue(df, columnName, columnValue) > 0
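
The tests above exercise only the positive cases. As a hedged sketch, a negative case could be covered along the following lines. Because columnExists reads only the columns attribute of its argument, this sketch repeats the logic of that function and passes it a hypothetical stand-in object (fake_df, built with types.SimpleNamespace) instead of a Spark DataFrame, so that it can run without a Spark session. The test names are assumptions and are not part of the test file above.

```python
from types import SimpleNamespace

# Same logic as columnExists in myfunctions.py, repeated here so that
# this sketch is self-contained.
def columnExists(dataFrame, columnName):
  return columnName in dataFrame.columns

# Hypothetical stand-in that exposes only the `columns` attribute
# that columnExists reads. It is not a Spark DataFrame.
fake_df = SimpleNamespace(columns=["_c0", "carat", "cut", "color", "clarity"])

# Positive case: the column is present.
def test_columnExists_for_present_column():
  assert columnExists(fake_df, "clarity") is True

# Negative case: the column is absent, so the result must be False.
def test_columnExists_for_missing_column():
  assert columnExists(fake_df, "does_not_exist") is False

# pytest would discover and run these functions by name.
# Calling them directly also works and raises AssertionError on failure.
test_columnExists_for_present_column()
test_columnExists_for_missing_column()
```

In a real test file you would more likely pass the df created from the fake data above. The stand-in is used here only to keep the sketch runnable on its own.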

R

Create another file named test_myfunctions.r in the same folder as the preceding myfunctions.r file in your repo, and add the following contents to the file. By default, testthat looks for .r files whose names start with test to test.

library(testthat)
source("myfunctions.r")

table_name   <- "diamonds"
db_name      <- "default"
column_name  <- "clarity"
column_value <- "SI2"

# Create fake data for the unit tests to run against.
# In general, it is a best practice to not run unit tests
# against functions that work with data in production.
schema <- structType(
  structField("_c0",     "integer"),
  structField("carat",   "float"),
  structField("cut",     "string"),
  structField("color",   "string"),
  structField("clarity", "string"),
  structField("depth",   "float"),
  structField("table",   "integer"),
  structField("price",   "integer"),
  structField("x",       "float"),
  structField("y",       "float"),
  structField("z",       "float"))

data <- list(list(as.integer(1), 0.23, "Ideal",   "E", "SI2", 61.5, as.integer(55), as.integer(326), 3.95, 3.98, 2.43),
             list(as.integer(2), 0.21, "Premium", "E", "SI1", 59.8, as.integer(61), as.integer(326), 3.89, 3.84, 2.31))

df <- createDataFrame(data, schema)

# Does the table exist?
test_that ("The table exists.", {
  expect_true(table_exists(table_name, db_name))
})

# Does the column exist?
test_that ("The column exists in the table.", {
  expect_true(column_exists(df, column_name))
})

# Is there at least one row for the value in the specified column?
test_that ("There is at least one row in the query result.", {
  expect_true(num_rows_in_column_for_value(df, column_name, column_value) > 0)
})

Scala

Create another Scala notebook in the same folder as the preceding myfunctions Scala notebook, and add the following contents to this new notebook.

In the first cell of this new notebook, add the following code, which calls the %run magic. This magic makes the contents of the myfunctions notebook available to your new notebook.

%run ./myfunctions

In the second cell, add the following code. This code defines your unit tests and specifies how to run them.

import org.scalatest._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, FloatType, StringType}
import scala.collection.JavaConverters._

class DataTests extends AsyncFunSuite {

  val tableName   = "diamonds"
  val dbName      = "default"
  val columnName  = "clarity"
  val columnValue = "SI2"

  // Create fake data for the unit tests to run against.
  // In general, it is a best practice to not run unit tests
  // against functions that work with data in production.
  val schema = StructType(Array(
                 StructField("_c0",     IntegerType),
                 StructField("carat",   FloatType),
                 StructField("cut",     StringType),
                 StructField("color",   StringType),
                 StructField("clarity", StringType),
                 StructField("depth",   FloatType),
                 StructField("table",   IntegerType),
                 StructField("price",   IntegerType),
                 StructField("x",       FloatType),
                 StructField("y",       FloatType),
                 StructField("z",       FloatType)
               ))

  val data = Seq(
                  Row(1, 0.23, "Ideal",   "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43),
                  Row(2, 0.21, "Premium", "E", "SI1", 59.8, 61, 326, 3.89, 3.84, 2.31)
                ).asJava

  val df = spark.createDataFrame(data, schema)

  // Does the table exist?
  test("The table exists") {
    assert(tableExists(tableName, dbName) == true)
  }

  // Does the column exist?
  test("The column exists") {
    assert(columnExists(df, columnName) == true)
  }

  // Is there at least one row for the value in the specified column?
  test("There is at least one matching row") {
    assert(numRowsInColumnForValue(df, columnName, columnValue) > 0)
  }
}

nocolor.nodurations.nostacks.stats.run(new DataTests)

Note

This code example uses the FunSuite style of testing in ScalaTest. For other available testing styles, see Selecting testing styles for your project.

SQL

Before you add unit tests, be aware that, in general, it is a best practice not to run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, run unit tests against non-production data. One common approach is to run unit tests against views instead of tables.

To create a view, you can call the CREATE VIEW command from a new cell in either the preceding notebook or a separate notebook. The following example assumes that you have an existing table named diamonds within a schema (database) named default within a catalog named main. Change these names to match your own as needed, and then run only that cell.

USE CATALOG main;
USE SCHEMA default;

CREATE VIEW view_diamonds AS
SELECT * FROM diamonds;

After you create the view, add each of the following SELECT statements to its own new cell in the preceding notebook or to its own new cell in a separate notebook. Change the names to match your own as needed.

SELECT if(table_exists("main", "default", "view_diamonds"),
          printf("PASS: The table 'main.default.view_diamonds' exists."),
          printf("FAIL: The table 'main.default.view_diamonds' does not exist."));

SELECT if(column_exists("main", "default", "view_diamonds", "clarity"),
          printf("PASS: The column 'clarity' exists in the table 'main.default.view_diamonds'."),
          printf("FAIL: The column 'clarity' does not exist in the table 'main.default.view_diamonds'."));

SELECT if(num_rows_for_clarity_in_diamonds("VVS2") > 0,
          printf("PASS: The table 'main.default.view_diamonds' has at least one row where the column 'clarity' equals 'VVS2'."),
          printf("FAIL: The table 'main.default.view_diamonds' does not have at least one row where the column 'clarity' equals 'VVS2'."));

Run unit tests

This section describes how to run the unit tests that you coded in the preceding section. When you run the unit tests, you get results showing which unit tests passed and failed.

If you added the unit tests from the preceding section to your Azure Databricks workspace, you can run these unit tests from your workspace. You can run these unit tests either manually or on a schedule.

Python

Create a Python notebook in the same folder as the preceding test_myfunctions.py file in your repo, and add the following contents.

In the first cell of the new notebook, add the following code, and then run the cell, which calls the %pip magic. This magic installs pytest.

%pip install pytest

In the second cell, add the following code, and then run the cell. The results show which unit tests passed and failed.

import pytest
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])

# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
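
pytest.main returns an exit code, and the assertion above treats any value other than 0 as a failure. As a rough sketch, the exit codes documented by pytest can be mapped to more specific messages. The ExitCode enum below is a local copy written for illustration, so that the sketch runs without pytest installed (pytest exposes its own pytest.ExitCode). The describe_exit_code helper is hypothetical.

```python
from enum import IntEnum

# Local copy of pytest's documented exit codes, for illustration only.
class ExitCode(IntEnum):
  OK = 0
  TESTS_FAILED = 1
  INTERRUPTED = 2
  INTERNAL_ERROR = 3
  USAGE_ERROR = 4
  NO_TESTS_COLLECTED = 5

# Hypothetical helper: turn a return code into a readable message.
def describe_exit_code(retcode):
  messages = {
    ExitCode.OK: "All collected tests passed.",
    ExitCode.TESTS_FAILED: "Some tests failed.",
    ExitCode.INTERRUPTED: "The test run was interrupted.",
    ExitCode.INTERNAL_ERROR: "An internal error occurred in pytest.",
    ExitCode.USAGE_ERROR: "pytest was invoked incorrectly.",
    ExitCode.NO_TESTS_COLLECTED: "No tests were collected.",
  }
  return messages.get(retcode, f"Unknown exit code: {retcode}")

for code in [0, 1, 5]:
  print(code, describe_exit_code(code))
```

Exit code 5 is worth noting: the assertion above also fails when pytest finds no tests at all, for example when the notebook is not in the same folder as the test file.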

R

Create an R notebook in the same folder as the preceding test_myfunctions.r file in your repo, and add the following contents.

In the first cell, add the following code, and then run the cell, which calls the install.packages function. This function installs testthat.

install.packages("testthat")

In the second cell, add the following code, and then run the cell. The results show which unit tests passed and failed.

library(testthat)
source("myfunctions.r")

test_dir(".", reporter = "tap")

Scala

Run the first and then the second cell in the notebook from the preceding section. The results show which unit tests passed and failed.

SQL

Run each of the three cells in the notebook from the preceding section. The results show whether each unit test passed or failed.

If you no longer need the view after you run your unit tests, you can delete it. To delete this view, add the following code to a new cell within one of the preceding notebooks, and then run only that cell.

DROP VIEW view_diamonds;

Tip

You can view the results of your notebook runs (including unit test results) in your cluster's driver logs. You can also specify a location for your cluster's log delivery.

You can set up a continuous integration and continuous delivery or deployment (CI/CD) system, such as GitHub Actions, to automatically run your unit tests whenever your code changes. For an example, see the coverage of GitHub Actions in Software engineering best practices for notebooks.
