Materialize and serve declarative features

Important

This feature is in Beta and is available in the following regions: us-east-1 and us-west-2.

After you create declarative feature definitions, which are stored in Unity Catalog, you can use them to compute feature data from your source table. This process is called materializing your features. Azure Databricks creates and manages Lakeflow Spark Declarative Pipelines that populate tables in Unity Catalog for model training, batch scoring, or online serving.

Requirements

  • Features must be created with the declarative feature API and stored in Unity Catalog.
  • For version requirements, see Requirements.

API data structures

OfflineStoreConfig

Configuration for the offline store where materialized features will be written. The materialization pipelines create new tables in this store.

OfflineStoreConfig(
    catalog_name: str,        # Catalog name for the offline table where materialized features will be stored
    schema_name: str,         # Schema name for the offline table
    table_name_prefix: str    # Table name prefix for the offline table. The pipeline may create multiple tables with this prefix, each updated at a different cadence
)

from databricks.feature_engineering.entities import OfflineStoreConfig

offline_store = OfflineStoreConfig(
    catalog_name="main",
    schema_name="feature_store",
    table_name_prefix="customer_features"
)

OnlineStoreConfig

Configuration for the online store, which holds the features used by model serving. Materialization creates Delta tables whose names start with catalog.schema.table_name_prefix and streams them to Lakebase tables with the same names.

from databricks.feature_engineering.entities import OnlineStoreConfig

online_store = OnlineStoreConfig(
    catalog_name="main",
    schema_name="feature_store",
    table_name_prefix="customer_features_serving",
    online_store_name="customer_features_store"
)

MaterializedFeature

Represents a declarative feature that has been materialized, that is, one that has a precomputed representation available in Unity Catalog. The offline table and the online table each have their own distinct materialized feature. Typically, you do not instantiate a MaterializedFeature directly.

API function calls

materialize_features()

Materializes a list of declarative features into an offline Delta table, an Online Feature Store, or both.

FeatureEngineeringClient.materialize_features(
    features: List[Feature],                                               # List of declarative features to materialize
    offline_config: OfflineStoreConfig,                                    # Offline store config if materializing offline
    online_config: Optional[OnlineStoreConfig] = None,                     # Online store config if materializing online
    pipeline_state: Union[MaterializedFeaturePipelineScheduleState, str],  # Materialization pipeline state - currently must be "ACTIVE"
    cron_schedule: Optional[str] = None,                                   # Materialization schedule, specified in Quartz cron syntax. Currently must be provided.
) -> List[MaterializedFeature]:

The method returns a list of materialized features. Each one contains metadata such as the cron schedule on which feature values are updated and information about the Unity Catalog tables where the feature is materialized.

If both an OnlineStoreConfig and an OfflineStoreConfig are provided, then two materialized features are returned per feature provided, one for each type of store.
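
As a minimal sketch of how a call might look, the helper below wraps materialize_features() with a basic check of the schedule string. It assumes that fe is an already constructed FeatureEngineeringClient and that the features and store configs were created as described above. The names materialize_on_schedule and check_quartz_field_count are hypothetical and not part of the Databricks API. The check relies only on the fact that a Quartz cron expression has six required fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh (year).

```python
from typing import Any, List, Optional


def check_quartz_field_count(cron_schedule: str) -> None:
    """Reject strings that cannot be Quartz cron expressions.

    Only counts fields (6, or 7 with the optional year); it does not
    validate the contents of each field.
    """
    n_fields = len(cron_schedule.split())
    if n_fields not in (6, 7):
        raise ValueError(
            f"Quartz cron needs 6 or 7 fields, got {n_fields}: {cron_schedule!r}"
        )


def materialize_on_schedule(
    fe: Any,                              # assumed: a FeatureEngineeringClient
    features: List[Any],                  # declarative features to materialize
    offline_config: Any,                  # an OfflineStoreConfig
    cron_schedule: str,                   # Quartz cron, e.g. "0 0 * * * ?" (hourly)
    online_config: Optional[Any] = None,  # an OnlineStoreConfig, if serving online
) -> List[Any]:
    """Hypothetical wrapper: check the schedule, then materialize."""
    check_quartz_field_count(cron_schedule)
    return fe.materialize_features(
        features=features,
        offline_config=offline_config,
        online_config=online_config,
        pipeline_state="ACTIVE",          # the only state currently accepted
        cron_schedule=cron_schedule,
    )
```

Passing both an offline and an online config to this wrapper would, per the description above, yield two materialized features for each input feature.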

list_materialized_features()

Returns a list of all materialized features in the user's Unity Catalog metastore.

By default, a maximum of 100 features are returned. You can change this limit using the max_results parameter.

To filter the returned materialized features by feature name, use the optional feature_name parameter.

FeatureEngineeringClient.list_materialized_features(
    feature_name: Optional[str] = None,     # Optional feature name to filter by
    max_results: int = 100,                 # Maximum number of features to be returned
) -> List[MaterializedFeature]:
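
Because the result is capped at max_results, a list that comes back exactly at the cap may be incomplete. The sketch below, which assumes fe is a FeatureEngineeringClient, returns the list together with a flag for that case. The helper name list_with_truncation_flag is hypothetical, and the flag is only a heuristic: this page does not describe any truncation indicator in the API itself.

```python
from typing import Any, List, Optional, Tuple


def list_with_truncation_flag(
    fe: Any,                             # assumed: a FeatureEngineeringClient
    feature_name: Optional[str] = None,  # optional filter, as in the API
    max_results: int = 100,              # same default as the API
) -> Tuple[List[Any], bool]:
    """Hypothetical helper: list materialized features and flag a full page."""
    results = fe.list_materialized_features(
        feature_name=feature_name,
        max_results=max_results,
    )
    # Heuristic: hitting the cap suggests more results may exist.
    possibly_truncated = len(results) >= max_results
    return results, possibly_truncated
```

If the flag is set, calling the helper again with a larger max_results is one way to check whether more materialized features exist.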

How to delete a materialized feature

To delete a materialized feature, call list_materialized_features() and check the table_name attribute of the feature you want to remove. Navigate to that table in Unity Catalog, use the Lineage tab to identify any associated pipelines, and then delete both the pipelines and the table that contains the feature. For online tables, make sure that the corresponding offline pipeline and table are also deleted.

During the Beta, deletion APIs are not supported. If needed, you can delete feature pipelines and feature tables manually in the Databricks UI.
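
To prepare for a manual cleanup, it can help to collect the table names first. The sketch below assumes fe is a FeatureEngineeringClient and relies only on the table_name attribute mentioned above. The helper name tables_backing_feature is hypothetical. It deletes nothing; it only produces a checklist of tables to look up in Unity Catalog.

```python
from typing import Any, List


def tables_backing_feature(fe: Any, feature_name: str) -> List[str]:
    """Hypothetical helper: unique table names that hold a given feature.

    A feature materialized both offline and online has one materialized
    feature per store, so more than one table may be returned.
    """
    materialized = fe.list_materialized_features(feature_name=feature_name)
    # Deduplicate and sort so the checklist is stable between runs.
    return sorted({mf.table_name for mf in materialized})
```

Each returned name can then be opened in Unity Catalog, where the Lineage tab shows the pipelines to delete along with the table.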

Use online features in real-time applications

To serve features to real-time applications and services, create a feature serving endpoint. See Feature Serving endpoints.

Models that are trained using features from Databricks automatically track lineage to the features they were trained on. When deployed as endpoints, these models use Unity Catalog to find appropriate features in online stores. For details, see Use features in online workflows.

Limitations

  • Continuous features cannot be materialized.
  • You can only work with materialized features in the workspace in which they were created.
  • Deleting or pausing a feature must be done manually at the pipeline level.