Run tests with pytest using the Databricks extension for Visual Studio Code

03/01/2024

This article describes how to run tests with pytest using the Databricks extension for Visual Studio Code. See What is the Databricks extension for Visual Studio Code?.

This information assumes that you have already installed and set up the Databricks extension for Visual Studio Code. See Install the Databricks extension for Visual Studio Code.

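You can run pytest on local code that does not need a connection to a cluster in a remote Azure Databricks workspace. For example, you can use pytest to test functions that accept and return PySpark DataFrames in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.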

To run pytest on code in a remote Azure Databricks workspace, do the following in your Visual Studio Code project:

Step 1: Create the tests

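Add a Python file with the following code, which contains the tests to run. This example assumes that the file is named spark_test.py and is at the root of your Visual Studio Code project. The file contains a pytest fixture, which makes the cluster's SparkSession (the entry point to Spark functionality on the cluster) available to the tests. It also contains a single test that checks whether a specified cell in a table contains a specified value. You can add your own tests to this file as needed.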

from pyspark.sql import SparkSession
import pytest

@pytest.fixture
def spark() -> SparkSession:
  # Create a SparkSession (the entry point to Spark functionality) on
  # the cluster in the remote Databricks workspace. Unit tests do not
  # have access to this SparkSession by default.
  return SparkSession.builder.getOrCreate()

# Now add your unit tests.

# For example, here is a unit test that must be run on the
# cluster in the remote Databricks workspace.
# This example determines whether the specified cell in the
# specified table contains the specified value. For example,
# the third column in the first row should contain the word "Ideal":
#
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# |_c0 | carat | cut   | color | clarity | depth | table | price | x    | y     | z    |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# | 1  | 0.23  | Ideal | E     | SI2     | 61.5  | 55    | 326   | 3.95 | 3.98  | 2.43 |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# ...
#
def test_spark(spark):
  spark.sql('USE default')
  data = spark.sql('SELECT * FROM diamonds')
  assert data.collect()[0][2] == 'Ideal'
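
The indexing in the assertion can be reasoned about without a cluster. The following is a minimal sketch that is not part of the original example: it assumes a hypothetical helper named cell_value and uses a plain tuple as a stand-in for the rows that data.collect() returns, only to illustrate the zero-based row and column positions that the test relies on.

```python
from typing import Any, Sequence


def cell_value(rows: Sequence[Sequence[Any]], row: int, column: int) -> Any:
    """Return the value at a zero-based row and column position."""
    return rows[row][column]


# Stand-in for data.collect(): the first row of the diamonds table
# shown in the comments of spark_test.py.
sample_rows = [
    (1, 0.23, "Ideal", "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43),
]

# Row index 0 is the first row; column index 2 is the third column (cut).
assert cell_value(sample_rows, 0, 2) == "Ideal"
```

Checking the index arithmetic on local data first can help separate mistakes in the test logic from problems with the cluster connection or the table itself.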

Step 2: Create the pytest runner

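Add a Python file with the following code, which instructs pytest to run your tests from the previous step. This example assumes that the file is named pytest_databricks.py and is at the root of your Visual Studio Code project.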

import pytest
import os
import sys

# Run all tests in the connected directory in the remote Databricks workspace.
# By default, pytest searches through all files with filenames beginning with
# "test_" or ending with "_test.py" for tests. Within each of these files,
# pytest runs each function with a function name beginning with "test_".

# Get the path to the directory for this file in the workspace.
dir_root = os.path.dirname(os.path.realpath(__file__))
# Switch to the root directory.
os.chdir(dir_root)

# Skip writing .pyc files to the bytecode cache on the cluster.
sys.dont_write_bytecode = True

# Now run pytest from the root directory, using the
# arguments that are supplied by your custom run configuration in
# your Visual Studio Code project. In this case, the custom run
# configuration JSON must contain these unique "program" and
# "args" objects:
#
# ...
# {
#   ...
#   "program": "${workspaceFolder}/path/to/this/file/in/workspace",
#   "args": ["/path/to/_test.py-files"]
# }
# ...
#
retcode = pytest.main(sys.argv[1:])
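
As a rough illustration of the discovery rule described in the runner's comments, the following sketch checks filenames against pytest's default file patterns, test_*.py and *_test.py. The helper name looks_like_test_file is hypothetical, and this is not how pytest implements collection internally. It only shows why spark_test.py is picked up while the runner file pytest_databricks.py is not.

```python
from fnmatch import fnmatchcase

# pytest's default filename patterns for test modules.
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename: str) -> bool:
    """Return True if the filename matches a default pytest file pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)


for name in ("spark_test.py", "pytest_databricks.py"):
    print(name, looks_like_test_file(name))
```

Because the runner's filename matches neither pattern, pytest does not try to collect tests from the runner itself when it scans the project root.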

Step 3: Create a custom run configuration

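To instruct pytest to run your tests, you must create a custom run configuration. Use the existing Databricks cluster-based run configuration to create your own custom run configuration, as follows: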

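On the main menu, click [Run > Add configuration].
In the Command Palette, select [Databricks].

Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.
Change the starter run configuration as follows, and then save the file:
- Change this run configuration's name from Run on Databricks to some unique display name for this configuration, in this example Unit Tests (on Databricks).
- Change program from ${file} to the path in the project that contains the test runner, in this example ${workspaceFolder}/pytest_databricks.py.
- Change args from [] to the path in the project that contains the files with your tests, in this example ["."].
Your launch.json file should look like this: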
```
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks",
      "request": "launch",
      "name": "Unit Tests (on Databricks)",
      "program": "${workspaceFolder}/pytest_databricks.py",
      "args": ["."],
      "env": {}
    }
  ]
}
```
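
If you want to sanity-check the edited configuration, a small script can read it back. The sketch below is an illustration under stated assumptions, not part of the extension: the helper name parse_launch_json is hypothetical, the configuration text is inlined as a shortened copy of the example above instead of being read from .vscode/launch.json, and the comment handling only drops whole-line // comments.

```python
import json

# Shortened copy of the example configuration, inlined for illustration.
LAUNCH_JSON = """
{
  // Use IntelliSense to learn about possible attributes.
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks",
      "request": "launch",
      "name": "Unit Tests (on Databricks)",
      "program": "${workspaceFolder}/pytest_databricks.py",
      "args": ["."],
      "env": {}
    }
  ]
}
"""


def parse_launch_json(text: str) -> dict:
    """Parse launch.json text after dropping whole-line // comments."""
    kept = [line for line in text.splitlines() if not line.strip().startswith("//")]
    return json.loads("\n".join(kept))


config = parse_launch_json(LAUNCH_JSON)["configurations"][0]
# These are the two values that the runner from step 2 depends on.
assert config["program"].endswith("pytest_databricks.py")
assert config["args"] == ["."]
```

The args list is what the runner receives as sys.argv[1:], so ["."] tells pytest to search the directory that the runner switched to.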

Step 4: Run the tests

First make sure that pytest is installed on the cluster. For example, with the cluster's settings page open in your Azure Databricks workspace, do the following:

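On the [Libraries] tab, if pytest is visible, then pytest is already installed. If pytest is not visible, click [Install new].
For [Library Source], click [PyPI].
For [Package], enter pytest.
Click [Install].
Wait until [Status] changes from [Pending] to [Installed].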
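
As an alternative check from code, the standard library can report whether a module is importable in the current Python environment. This is a minimal sketch and an assumption on top of the original steps: the helper name is_importable is hypothetical, and the result only reflects the interpreter that runs the snippet, so it is meaningful for the cluster only when executed there.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


print("pytest available:", is_importable("pytest"))
```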

To run the tests, do the following from your Visual Studio Code project:

On the main menu, click [View > Run].
In the [Run and Debug] list, click [Unit Tests (on Databricks)], if it is not already selected.
Click the green arrow ([Start Debugging]) icon.

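The pytest results display in the [Debug Console] ([View > Debug Console] on the main menu). For example, these results show that at least one test was found in the spark_test.py file, and the dot (.) means that a single test was found and passed. (A failing test would show an F.)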

<date>, <time> - Creating execution context on cluster <cluster-id> ...
<date>, <time> - Synchronizing code to /Workspace/path/to/directory ...
<date>, <time> - Running /pytest_databricks.py ...
============================= test session starts ==============================
platform linux -- Python <version>, pytest-<version>, pluggy-<version>
rootdir: /Workspace/path/to/directory
collected 1 item

spark_test.py .                                                          [100%]

============================== 1 passed in 3.25s ===============================
<date>, <time> - Done (took 10818ms)
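
If you capture this output and want to read the totals programmatically, the final summary line can be parsed with a regular expression. The sketch below is an illustration only: the helper name summary_counts is hypothetical, it handles just the passed, failed, and skipped outcomes, and the exact summary wording belongs to pytest and may differ between versions.

```python
import re

# The summary line from the sample output above.
SUMMARY_LINE = (
    "============================== 1 passed in 3.25s "
    "==============================="
)


def summary_counts(line: str) -> dict:
    """Extract outcome counts such as {'passed': 1} from a summary line."""
    return {
        outcome: int(count)
        for count, outcome in re.findall(r"(\d+) (passed|failed|skipped)", line)
    }


print(summary_counts(SUMMARY_LINE))
```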
