This example notebook shows how to train a time-series forecasting model on Databricks using the AutoML Python API. Using a COVID-19 case-count dataset, you call automl.forecast() with a 30-day daily horizon to project future case counts, then load the best model with MLflow to generate and plot forecasts.
Requirements
Databricks Runtime for Machine Learning 10.0 or above.
To save model predictions, Databricks Runtime for Machine Learning 10.5 or above.
COVID-19 dataset
The dataset contains records for the number of cases of the COVID-19 virus by date in the US, with additional geographical information. The goal is to forecast how many cases of the virus will occur over the next 30 days in the US.
import pyspark.pandas as ps
df = ps.read_csv("/databricks-datasets/COVID/covid-19-data")
df["date"] = ps.to_datetime(df['date'], errors='coerce')
df["cases"] = df["cases"].astype(int)
display(df)
AutoML training
The following command starts an AutoML run. You must provide the column that the model should predict in the target_col argument and the time column in the time_col argument.
When the run completes, you can follow the link to the best trial notebook to examine the training code.
This example also specifies:
- horizon=30 to specify that AutoML should forecast 30 days into the future.
- frequency="d" to specify that a forecast should be provided for each day.
- primary_metric="mdape" to specify the metric to optimize for during training.
Note
automl.forecast() is only available on classic compute.
import databricks.automl
import logging
# Disable informational messages from py4j
logging.getLogger("py4j").setLevel(logging.WARNING)
# Note: If you are running Databricks Runtime for Machine Learning 10.4 or below, use this line instead:
# summary = databricks.automl.forecast(df, target_col="cases", time_col="date", horizon=30, frequency="d", primary_metric="mdape")
summary = databricks.automl.forecast(df, target_col="cases", time_col="date", horizon=30, frequency="d", primary_metric="mdape", output_database="default")
Iterate on the model
- Explore the notebooks and experiments linked above.
- If the metrics for the best trial notebook look good, you can continue with the next cell.
- If you want to improve on the model generated by the best trial:
- Go to the notebook with the best trial and clone it.
- Edit the notebook as necessary to improve the model.
- When you are satisfied with the model, note the URI where the artifact for the trained model is logged. Assign this URI to the model_uri variable in the next cell.
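The model_uri follows MLflow's runs:/ URI scheme. As a sketch, constructing it from a run ID looks like this (the run ID below is a hypothetical placeholder, not from this example):

```python
# Hypothetical run ID copied from the MLflow run page of your cloned trial;
# replace it with the run ID of the model you want to use.
run_id = "0123456789abcdef"
model_uri = f"runs:/{run_id}/model"
print(model_uri)
```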
Show the predicted results from the best model
Note: This section requires Databricks Runtime for Machine Learning 10.5 or above.
Load predictions from the best model
In Databricks Runtime for Machine Learning 10.5 or above, if output_database is provided, AutoML saves the predictions from the best model.
# Load the saved predictions.
forecast_pd = spark.table(summary.output_table_name)
display(forecast_pd)
Use the model for forecasting
You can use the commands in this section with Databricks Runtime for Machine Learning 10.0 or above.
Load the model with MLflow
MLflow allows you to easily import models back into Python by using the MLflow run ID of the best trial.
import mlflow.pyfunc

# Get the run ID of the best trial from the AutoML summary.
trial_id = summary.best_trial.mlflow_run_id
model_uri = "runs:/{run_id}/model".format(run_id=trial_id)
pyfunc_model = mlflow.pyfunc.load_model(model_uri)
Use the model to make forecasts
Call the model's predict_timeseries method to generate forecasts.
In Databricks Runtime for Machine Learning 10.5 or above, you can set include_history=False to get the predicted data only.
forecasts = pyfunc_model._model_impl.python_model.predict_timeseries()
display(forecasts)
# Option for Databricks Runtime for Machine Learning 10.5 or above
# forecasts = pyfunc_model._model_impl.python_model.predict_timeseries(include_history=False)
Plot the forecasted points
In the plot below, the black points show the time series dataset, the blue line is the forecast created by the model, and the shaded band is the uncertainty interval.
import matplotlib.pyplot as plt

# Aggregate the observed data by date for plotting.
df_true = df.groupby("date").agg(y=("cases", "avg")).reset_index().to_pandas()
fig = plt.figure(facecolor='w', figsize=(10, 6))
ax = fig.add_subplot(111)
forecasts = pyfunc_model._model_impl.python_model.predict_timeseries(include_history=True)
fcst_t = forecasts['ds'].dt.to_pydatetime()
ax.plot(df_true['date'].dt.to_pydatetime(), df_true['y'], 'k.', label='Observed data points')
ax.plot(fcst_t, forecasts['yhat'], ls='-', c='#0072B2', label='Forecasts')
ax.fill_between(fcst_t, forecasts['yhat_lower'], forecasts['yhat_upper'],
color='#0072B2', alpha=0.2, label='Uncertainty interval')
ax.legend()
plt.show()
Register and deploy the model
You can register and deploy a model trained by AutoML like any other model in the MLflow Model Registry. See Log, load, and register MLflow models.
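As a sketch of that step, registration can be done with mlflow.register_model. Here the registry name covid_cases_forecast is a hypothetical assumption, not part of this example:

```python
def register_best_model(run_id: str, name: str = "covid_cases_forecast"):
    """Register the model logged under the given MLflow run in the Model Registry.

    `name` is a hypothetical registry name; pass your own. The import is done
    lazily so the sketch has no top-level dependency.
    """
    import mlflow

    model_uri = f"runs:/{run_id}/model"
    # Creates the registered model if needed and adds a new version.
    return mlflow.register_model(model_uri, name)

# In the notebook you would call, for example:
# register_best_model(summary.best_trial.mlflow_run_id)
```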
Troubleshooting: No module named pandas.core.indexes.numeric
When serving an AutoML-trained model with Mosaic AI Model Serving, you may see the error No module named pandas.core.indexes.numeric. This happens when the pandas version used by AutoML differs from the one in the model serving endpoint environment. To resolve:
- Download the add-pandas-dependency.py script. The script edits requirements.txt and conda.yaml for the logged model to pin pandas==1.5.3.
- Edit the script to include the run_id of the MLflow run where the model was logged.
- Re-register the model.
- Serve the new model version.
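Conceptually, the dependency fix amounts to rewriting the logged model's requirements to pin pandas. A minimal sketch of that edit (pin_pandas is a hypothetical helper for illustration, not the actual add-pandas-dependency.py script):

```python
def pin_pandas(requirements_text: str, version: str = "1.5.3") -> str:
    """Replace any existing pandas requirement with an exact version pin."""
    # Drop any existing pandas lines, then append the exact pin.
    lines = [
        line for line in requirements_text.splitlines()
        if not line.strip().lower().startswith("pandas")
    ]
    lines.append(f"pandas=={version}")
    return "\n".join(lines)

print(pin_pandas("mlflow==2.9.2\npandas>=1.0\nprophet"))
```

The same rewrite would be applied to the pip section of the model's conda.yaml.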