Train a forecasting model with the AutoML Python API


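This example notebook shows how to use the AutoML Python API on Databricks to train a time series forecasting model. Using a COVID-19 case dataset, you call automl.forecast() with a 30-day daily forecast horizon to predict future case counts, and then load the best model with MLflow to generate and plot the forecasts.

Requirements

Databricks Runtime for Machine Learning 10.0 or above.
To save model predictions, Databricks Runtime for Machine Learning 10.5 or above.

COVID-19 dataset

The dataset contains records of the number of COVID-19 cases in the US by date, with additional geographical information. The goal is to forecast how many cases will occur in the US over the next 30 days.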

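import pyspark.pandas as ps

# Read the COVID-19 case data into a pandas-on-Spark DataFrame
df = ps.read_csv("/databricks-datasets/COVID/covid-19-data")
# Parse dates; values that cannot be parsed become NaT instead of raising an error
df["date"] = ps.to_datetime(df["date"], errors="coerce")
df["cases"] = df["cases"].astype(int)
display(df)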
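The cleaning cell uses errors="coerce" so that unparseable dates do not abort the load, and it casts cases to integers. As a rough plain-Python illustration (the sample rows are hypothetical, not taken from the real dataset), the sketch below mimics that step and also averages rows that share a date, which is the same per-date aggregation the plotting cell later applies with agg(y=("cases", "avg")). Unlike pandas, which keeps such rows as NaT, this sketch simply drops them.

```python
from datetime import date, datetime
from statistics import mean
from typing import Optional


def coerce_date(value) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None instead of raising (similar in spirit to errors='coerce')."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def average_cases_by_date(rows):
    """Skip rows with unparseable dates, then average the integer 'cases' values per date."""
    by_date = {}
    for row in rows:
        day = coerce_date(row["date"])
        if day is None:
            continue  # pandas would keep these rows as NaT; this sketch drops them
        by_date.setdefault(day, []).append(int(row["cases"]))
    return {day: mean(values) for day, values in sorted(by_date.items())}


# Hypothetical sample rows for illustration only
rows = [
    {"date": "2020-03-01", "cases": "10"},
    {"date": "2020-03-01", "cases": "30"},
    {"date": "2020-03-02", "cases": "50"},
    {"date": "not-a-date", "cases": "99"},
]
print(average_cases_by_date(rows))
```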

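AutoML training

The following command starts an AutoML run. You must provide the column that the model should predict in the target_col argument and the time column in the time_col argument. When the run completes, you can follow the link to the best trial notebook to examine the training code.

This example also specifies:

- horizon=30 specifies that AutoML should forecast 30 days into the future.
- frequency="d" specifies that a forecast should be provided for each day.
- primary_metric="mdape" specifies the metric to optimize during training.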
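To make horizon and frequency concrete: with horizon=30 and frequency="d", the forecast covers the 30 daily timestamps immediately after the last observed date. The helper below is hypothetical (it is not part of the AutoML API) and only enumerates those dates with the standard library, assuming an example last observed date.

```python
from datetime import date, timedelta


def future_daily_dates(last_observed: date, horizon: int) -> list[date]:
    """Return the `horizon` calendar days that follow `last_observed`."""
    return [last_observed + timedelta(days=step) for step in range(1, horizon + 1)]


# Hypothetical last date in the training data
dates = future_daily_dates(date(2021, 1, 31), horizon=30)
print(dates[0], dates[-1], len(dates))  # prints 2021-02-01 2021-03-02 30
```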

Note

automl.forecast() is only available on classic compute.

import databricks.automl
import logging

# Only show py4j log messages at WARNING level or above
logging.getLogger("py4j").setLevel(logging.WARNING)

# Note: If you are running Databricks Runtime for Machine Learning 10.4 or below, use this line instead:
# summary = databricks.automl.forecast(df, target_col="cases", time_col="date", horizon=30, frequency="d", primary_metric="mdape")

# output_database saves the best model's predictions to a table (10.5 or above)
summary = databricks.automl.forecast(
    df,
    target_col="cases",
    time_col="date",
    horizon=30,
    frequency="d",
    primary_metric="mdape",
    output_database="default",
)
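The primary_metric="mdape" setting in this call refers to the median absolute percentage error: the median, over the evaluation points, of |actual - predicted| / |actual|. Lower is better. AutoML computes it internally; the sketch below only illustrates the definition on made-up numbers and assumes no actual value is zero, because the percentage error is undefined there.

```python
from statistics import median


def mdape(actual, predicted):
    """Median absolute percentage error as a fraction (multiply by 100 for percent).

    Assumes every actual value is non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return median(errors)


# Hypothetical values for illustration only
actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mdape(actual, predicted))  # prints 0.1 (errors are 0.1, 0.1 and 0.0)
```

Because it takes the median rather than the mean, one badly missed point barely moves the score: changing the last prediction to 4000 gives errors of 0.1, 0.1 and 9.0, and the MdAPE is still 0.1.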

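Iterate on the model

Explore the notebooks and experiments linked above.
If the metrics for the best trial notebook look good, you can continue with the next cell.
If you want to improve on the model generated by the best trial:
- Go to the notebook with the best trial and clone it.
- Edit the notebook as necessary to improve the model.
- When you are satisfied with the model, note the URI where the artifact for the trained model is logged. Assign this URI to the model_uri variable in the next cell.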
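The URI you record usually follows MLflow's runs:/<run_id>/<artifact_path> scheme, the same form the MLflow loading cell builds from the best trial with the "model" artifact path. The helper below is a hypothetical convenience, not an MLflow function; it only assembles and sanity-checks such a URI. If your cloned notebook logs the model under a different artifact path, pass that path instead.

```python
def build_runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """Assemble an MLflow runs:/ URI for an artifact logged under `artifact_path` in run `run_id`."""
    if not run_id or "/" in run_id:
        raise ValueError("run_id must be a non-empty run ID without slashes")
    path = artifact_path.strip("/")
    if not path:
        raise ValueError("artifact_path must not be empty")
    return f"runs:/{run_id}/{path}"


# Hypothetical run ID for illustration only
model_uri = build_runs_uri("0123456789abcdef")
print(model_uri)  # prints runs:/0123456789abcdef/model
```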

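Show the predicted results from the best model

Note: This section requires Databricks Runtime for Machine Learning 10.5 or above.

Load the predictions from the best model

In Databricks Runtime for Machine Learning 10.5 or above, if output_database is provided, AutoML saves the predictions from the best model to a table whose name is available as summary.output_table_name.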

# Load the saved predictions.
forecast_pd = spark.table(summary.output_table_name)
display(forecast_pd)

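Use the model for forecasting

You can use the commands in this section with Databricks Runtime for Machine Learning 10.0 or above.

Load the model with MLflow

MLflow lets you easily import the model back into Python by using the AutoML trial_id.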

import mlflow.pyfunc

# MLflow run ID of the best trial that AutoML found
trial_id = summary.best_trial.mlflow_run_id

# The trained model is logged under the "model" artifact path of that run
model_uri = f"runs:/{trial_id}/model"
pyfunc_model = mlflow.pyfunc.load_model(model_uri)

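Use the model to make forecasts

Call the predict_timeseries model method to generate forecasts.
In Databricks Runtime for Machine Learning 10.5 or above, you can set include_history=False to get only the forecasted (future) data.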

forecasts = pyfunc_model._model_impl.python_model.predict_timeseries()
display(forecasts)

# Option for Databricks Runtime for Machine Learning 10.5 or above
# forecasts = pyfunc_model._model_impl.python_model.predict_timeseries(include_history=False)
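Conceptually, include_history=False keeps only the rows after the training data, while include_history=True (used in the plotting cell) also returns fitted values for the historical dates. The sketch below illustrates that split in plain Python on hypothetical rows shaped like the ds and yhat columns the plotting cell reads; it is not how predict_timeseries is implemented.

```python
from datetime import date


def future_only(forecast_rows, last_observed: date):
    """Keep only forecast rows dated after the last observed training date."""
    return [row for row in forecast_rows if row["ds"] > last_observed]


# Hypothetical rows shaped like the ds/yhat columns of a forecast frame
forecast_rows = [
    {"ds": date(2021, 1, 30), "yhat": 95.0},
    {"ds": date(2021, 1, 31), "yhat": 100.0},
    {"ds": date(2021, 2, 1), "yhat": 104.0},
    {"ds": date(2021, 2, 2), "yhat": 108.0},
]
print(future_only(forecast_rows, last_observed=date(2021, 1, 31)))
```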

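Plot the forecasted points

In the plot below, the black dots show the observed data points from the time series dataset, the blue line is the forecast created by the model, and the shaded blue band is the uncertainty interval.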

import matplotlib.pyplot as plt

# Average the observed case counts per date (several rows can share a date)
df_true = df.groupby("date").agg(y=("cases", "avg")).reset_index().to_pandas()

fig = plt.figure(facecolor="w", figsize=(10, 6))
ax = fig.add_subplot(111)

# Include historical dates so fitted values can be compared with observations (10.5 or above)
forecasts = pyfunc_model._model_impl.python_model.predict_timeseries(include_history=True)
fcst_t = forecasts["ds"].dt.to_pydatetime()

ax.plot(df_true["date"].dt.to_pydatetime(), df_true["y"], "k.", label="Observed data points")
ax.plot(fcst_t, forecasts["yhat"], ls="-", c="#0072B2", label="Forecasts")
# Shade the band between the lower and upper bounds of the forecast
ax.fill_between(fcst_t, forecasts["yhat_lower"], forecasts["yhat_upper"],
                color="#0072B2", alpha=0.2, label="Uncertainty interval")
ax.legend()
plt.show()
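One quick way to read the shaded band is to count how many observed points fall between yhat_lower and yhat_upper on the dates where both exist. The helper below is hypothetical and uses plain dictionaries keyed by date rather than the pandas frames from the plotting cell; the numbers are made up for illustration.

```python
from datetime import date


def interval_coverage(observed, lower, upper):
    """Fraction of observed values lying within [lower, upper] on dates present in all three mappings."""
    shared = [d for d in observed if d in lower and d in upper]
    if not shared:
        raise ValueError("no dates in common between observations and forecasts")
    inside = sum(lower[d] <= observed[d] <= upper[d] for d in shared)
    return inside / len(shared)


# Hypothetical values for illustration only
observed = {date(2021, 1, 1): 100, date(2021, 1, 2): 130, date(2021, 1, 3): 90}
lower = {date(2021, 1, 1): 90, date(2021, 1, 2): 95, date(2021, 1, 3): 85}
upper = {date(2021, 1, 1): 110, date(2021, 1, 2): 120, date(2021, 1, 3): 95}
print(interval_coverage(observed, lower, upper))  # 2 of 3 points fall inside the band
```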

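Register and deploy the model

You can register and deploy your AutoML-trained model in the MLflow Model Registry just like any other model. See Log, load, and register MLflow models.

Troubleshooting: `No module named pandas.core.indexes.numeric`

When serving a model built with AutoML through Mosaic AI Model Serving, you might get the error No module named pandas.core.indexes.numeric. This happens when the pandas version used by AutoML differs from the pandas version in the model serving endpoint environment. To resolve this error:

- Download the add-pandas-dependency.py script. The script edits the requirements.txt and conda.yaml files of your logged model to pin pandas==1.5.3.
- Edit the script to include the run_id of the MLflow run where your model was logged.
- Re-register the model.
- Serve the new model version.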
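The core of the fix is pinning pandas in the logged model's environment files. The function below is a hypothetical, simplified stand-in for that edit and is not the contents of add-pandas-dependency.py: it only rewrites requirements.txt text so that any existing pandas entry becomes pandas==1.5.3 (or appends the pin if none exists). It does not touch conda.yaml and does not download or re-log any MLflow artifacts.

```python
import re

PANDAS_PIN = "pandas==1.5.3"

# Matches a requirement line whose package name is exactly "pandas" (with or without a specifier)
_PANDAS_LINE = re.compile(r"^\s*pandas\s*(?:[<>=!~;@\[]|$)", re.IGNORECASE)


def pin_pandas(requirements_text: str) -> str:
    """Return requirements.txt content with pandas pinned to PANDAS_PIN."""
    result = []
    pinned = False
    for line in requirements_text.splitlines():
        if _PANDAS_LINE.match(line):
            if not pinned:
                result.append(PANDAS_PIN)
                pinned = True
            continue  # drop duplicate pandas lines so only one pin remains
        result.append(line)
    if not pinned:
        result.append(PANDAS_PIN)
    return "\n".join(result) + "\n"


# Hypothetical requirements.txt content for illustration only
print(pin_pandas("mlflow\npandas>=2.0\nprophet\n"))
```

The regular expression requires a version operator, marker, extra, or end of line right after the name, so packages such as pandas-stubs are left alone.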

Example notebook

Train a forecasting model with the AutoML Python API


Next steps

AutoML Python API reference.


Last updated on 2026-05-03
