How to use: AutoML Forecast with Feature Store (Databricks Runtime 15.4 ML LTS & 16.4 ML LTS)

Staedel, Oliver 0 Reputation points
2025-06-20T13:31:37.5366667+00:00

We are using AutoML Forecast with a time series dataset that includes temporal covariates from the Feature Store (e.g., a corona_dummy feature). We leverage feature_store_lookups with lookup_key and timestamp_lookup_key.

🔧 Feature Table Definition

fs.create_table(
  name="...features.example_corona_features",
  primary_keys=["Monat", "Produkt", "Vertriebstyp_Art"],
  df=...,
  timestamp_keys="Monat",
  ...
)

🚀 AutoML Call

feature_store_lookups = [{
  "table_name": "...features.example_corona_features",
  "lookup_key": ["Produkt", "Vertriebstyp_Art"],
  "timestamp_lookup_key": "Monat"
}]

✅ Expected Behavior

AutoML should perform a temporal join between the dataset and the feature table (using timestamp and keys), and proceed with training including the corona_dummy covariate.
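Conceptually, the expected temporal join is a point-in-time (as-of) lookup: for each training row, take the feature value valid at or before that row's timestamp. The following is a minimal pandas sketch of that behavior; the column names (Monat, Produkt, Vertriebstyp_Art, corona_dummy) mirror the post, while the data values are invented for illustration.

```python
import pandas as pd

# Hypothetical training dataset: one monthly series per (Produkt, Vertriebstyp_Art)
train = pd.DataFrame({
    "Monat": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "Produkt": ["A", "A", "A"],
    "Vertriebstyp_Art": ["X", "X", "X"],
    "Umsatz": [100.0, 80.0, 90.0],
})

# Hypothetical feature table keyed by (Produkt, Vertriebstyp_Art) with timestamp Monat
features = pd.DataFrame({
    "Monat": pd.to_datetime(["2020-01-01", "2020-03-01"]),
    "Produkt": ["A", "A"],
    "Vertriebstyp_Art": ["X", "X"],
    "corona_dummy": [0, 1],
})

# Point-in-time join: most recent feature value at or before each Monat,
# which is what timestamp_lookup_key implies
joined = pd.merge_asof(
    train.sort_values("Monat"),
    features.sort_values("Monat"),
    on="Monat",
    by=["Produkt", "Vertriebstyp_Art"],
    direction="backward",
)
print(joined[["Monat", "corona_dummy"]])
```

Note that February carries forward the January value (0), since no newer feature row exists yet at that point in time.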


❌ Actual Behavior

AutoML starts the run but fails during internal execution of applyInPandas() or .toPandas() with the following error:

ValueError: Length mismatch: Expected axis has 8 elements, new values have 11 elements

This crash occurs after the feature join and dataset loading — i.e., during AutoML’s internal training loop.
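For reference, pandas raises exactly this error whenever a DataFrame's column axis is reassigned with a list of the wrong length, which is consistent with the joined frame carrying extra (duplicated) columns. A minimal repro, independent of Databricks:

```python
import pandas as pd

# 8 columns, like the internal frame AutoML produced
df = pd.DataFrame({f"c{i}": [0] for i in range(8)})

try:
    # Assigning 11 names to 8 columns reproduces the failure mode
    df.columns = [f"new{i}" for i in range(11)]
except ValueError as e:
    print(e)  # Length mismatch: Expected axis has 8 elements, new values have 11 elements
```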


🔍 Observations

When we remove feature_store_lookups, AutoML completes without errors.

The issue only appears when the timestamp column (Monat) is both:

part of primary_keys, and

passed again as timestamp_lookup_key.

❓ Question

Can you confirm whether this is a known issue?

What is the correct usage contract for feature_store_lookups with timestamp_lookup_key in AutoML?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
1 answer

  1. Staedel, Oliver 0 Reputation points
    2025-06-23T17:09:39.4633333+00:00

    @Chandra Boorla Thank you again for your support.

    Just to clarify the root of the issue: even though I only define "Produkt" as the primary key when creating the feature table, Databricks automatically promotes the timestamp column ("Monat") to a primary key as well, simply because it is specified in timestamp_keys during table creation.

    You can see this in the attached screenshot:

    Produkt is marked as PK

    Monat is marked as PK(TS), i.e., a timestamp key that is also implicitly a primary key

    This happens even though I explicitly exclude Monat from the primary_keys list. As a result, during AutoML processing with feature_store_lookups, the Monat column gets duplicated (once from the feature table, once from the training dataset), which causes the schema mismatch error — just as you described in your earlier message.

    So while your recommended contract makes sense logically, the underlying platform behavior enforces a PK constraint on the timestamp column, and this seems to conflict with AutoML's expectations in the join.
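    As an interim workaround sketch (not an official fix, and the column names again mirror the post), deduplicating the doubled timestamp column in the joined frame before training avoids the length mismatch:

    ```python
    import pandas as pd

    # Simulate a joined frame where Monat came back twice after the feature join
    joined = pd.DataFrame([[1, 2, 1, 3]], columns=["Monat", "x", "Monat", "y"])

    # Keep only the first occurrence of each column name
    deduped = joined.loc[:, ~joined.columns.duplicated()]
    print(list(deduped.columns))  # ['Monat', 'x', 'y']
    ```

    This only helps where you control the DataFrame before it reaches AutoML; it does not change what AutoML does internally with feature_store_lookups.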

    Do you know of any way to prevent timestamp_keys from becoming a primary key, or is this behavior currently hardcoded?

    Thanks again for helping clarify this edge case!

    [Screenshot: feature table schema showing Produkt marked as PK and Monat marked as PK(TS)]

