How to use: AutoML Forecast with Feature Store (Databricks Runtime 15.4 ML LTS & 16.4 ML LTS)

Question

How to use: AutoML Forecast with Feature Store (Databricks Runtime 15.4 ML LTS & 16.4 ML LTS)

Staedel, Oliver 0

We are using AutoML Forecast with a time series dataset that includes temporal covariates from the Feature Store (e.g., a corona_dummy feature). We leverage feature_store_lookups with lookup_key and timestamp_lookup_key.

🔧 Feature Table Definition

fs.create_table(
  name="...features.example_corona_features",
  primary_keys=["Monat", "Produkt", "Vertriebstyp_Art"],
  df=...,
  timestamp_keys="Monat",
  ...
)

🚀 AutoML Call

feature_store_lookups = [{
  "table_name": "...features.example_corona_features",
  "lookup_key": ["Produkt", "Vertriebstyp_Art"],
  "timestamp_lookup_key": "Monat"
}]

✅ Expected Behavior

AutoML should perform a temporal join between the dataset and the feature table (using timestamp and keys), and proceed with training including the corona_dummy covariate.

❌ Actual Behavior

AutoML starts the run but fails during internal execution of applyInPandas() or .toPandas() with the following error:

ValueError: Length mismatch: Expected axis has 8 elements, new values have 11 elements

This crash occurs after the feature join and dataset loading — i.e., during AutoML’s internal training loop.

🔍 Observations

When we remove feature_store_lookups, AutoML completes without errors.

The issue only appears when the timestamp column (Monat) is:

part of the primary_keys, and

 passed again as `timestamp_lookup_key`.

❓ Question

Can you confirm whether this is a known issue?

What is the correct usage contract for feature_store_lookups with timestamp_lookup_key in AutoML?

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-20T20:24:27.7066667+00:00
@Staedel, Oliver

Thank you for the detailed description, it’s very helpful in understanding the issue.

Based on what you’ve shared, the error seems to stem from using the Monat column both as a primary_key in the Feature Store table and again as the timestamp_lookup_key in the feature_store_lookups. This likely results in the column being duplicated during the join, which causes the schema mismatch error you're seeing during AutoML’s internal processing.

This specific behavior isn’t currently listed as a known issue in the documentation.

Usage Guidelines:

When using temporal features with Databricks AutoML Forecasting, here are the key usage rules to follow for feature_store_lookups:

Join Keys Must Match - The columns defined in lookup_key and timestamp_lookup_key must exist in both your training dataset and the feature store table to enable a proper join.

Avoid Duplicating the Timestamp Column - If you’re using a timestamp column (like Monat) as a timestamp_lookup_key, it should not also be included in the primary_keys when creating the feature store table. Including it in both can cause column duplication during the temporal join, leading to errors like: ValueError: Length mismatch: Expected axis has 8 elements, new values have 11 elements

One Row per Key + Timestamp - The feature store table should return one matching row per combination of lookup_key and timestamp_lookup_key. AutoML expects a 1:1 match, multiple rows can cause unexpected behavior or join errors.

Feature Table Registration - Ensure the feature table is properly registered in Unity Catalog and accessible to AutoML.

Suggested Structure:

When creating the feature table, define it like this:

fs.create_table( name="...features.example_corona_features", primary_keys=["Produkt", "Vertriebstyp_Art"], timestamp_keys="Monat", df=... )

And reference it in AutoML like this:

feature_store_lookups = [{ "table_name": "...features.example_corona_features", "lookup_key": ["Produkt", "Vertriebstyp_Art"], "timestamp_lookup_key": "Monat" }]

If you follow this contract, AutoML should be able to complete the feature join and proceed with training as expected.

I hope this information helps. Please do let us know if you have any further queries.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

Thank you.
Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-23T16:49:08.9933333+00:00

@Staedel, Oliver

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution, please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.

1 answer

Your answer

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-23T16:49:08.9933333+00:00

@Staedel, Oliver

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution, please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.

Answer 1

Staedel, Oliver 0

@Chandra Boorla Thank you again for your support.

Just to clarify the root of the issue: even though I only define "Produkt" as the primary key when creating the feature table, Databricks automatically promotes the timestamp column ("Monat") to be a primary key as well — simply because it's specified as a timestamp_keys during table creation.

You can see this in the attached screenshot:

Produkt is marked as PK

Monat is marked as PK(TS) – i.e., timestamp and also implicitly primary key

This happens even though I explicitly exclude Monat from the primary_keys list. As a result, during AutoML processing with feature_store_lookups, the Monat column gets duplicated (once from the feature table, once from the training dataset), which causes the schema mismatch error — just as you described in your earlier message.

So while your recommended contract makes sense logically, the underlying platform behavior enforces a PK constraint on the timestamp column, and this seems to conflict with AutoML's expectations in the join.

Do you know of any way to prevent timestamp_keys from becoming a primary key, or is this behavior currently hardcoded?

Thanks again for helping clarify this edge case!

User's image

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-23T18:04:33.4066667+00:00
@Staedel, Oliver

Thanks for the detailed follow-up and for sharing the screenshot, that helps confirm the behavior you're observing.

You're absolutely right - when a column is specified as a timestamp_keys during fs.create_table(), Databricks automatically promotes it to a primary key, even if it’s not explicitly listed in the primary_keys parameter. The PK(TS) label in your screenshot clearly shows that Monat is treated as both a primary key and a timestamp.

This implicit promotion unfortunately leads to the issue you're seeing, where Monat gets duplicated during AutoML’s internal processing (once from the training dataset, and once from the joined feature set), resulting in the schema mismatch error.

Current Understanding:

This behavior is hardcoded in the Feature Store implementation today.

There’s currently no way to prevent a timestamp_keys column from being treated as part of the primary key.

Workaround - A practical workaround would be to manually perform the temporal join outside of AutoML:

Use the Feature Store APIs or Spark SQL to perform the join between your base dataset and the feature table.

Then pass the fully enriched dataset directly to AutoML without using feature_store_lookups.

This approach avoids the internal duplication logic and gives you more control over the resulting schema.

Since this behavior causes issues with a standard AutoML use case, it may be worth raising this as a feature request.

Appreciate if you could share the feedback on our feedback channel. Which would be open for the user community to upvote & comment on. This allows our product teams to effectively prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

I hope this information helps.
Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-24T18:47:59.0666667+00:00

@Staedel, Oliver

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution, please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.

Share via

How to use: AutoML Forecast with Feature Store (Databricks Runtime 15.4 ML LTS & 16.4 ML LTS)

1 answer

Your answer