Feature governance and lineage

This page describes the governance and lineage capabilities of feature engineering in Unity Catalog.

For information about monitoring the performance of a served model and changes in feature table data, see Lakehouse Monitoring.

Control access to feature tables

Unity Catalog manages access control for the feature tables stored in it. See Unity Catalog privileges.
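As a rough illustration, a feature table in Unity Catalog can be granted to a principal with an ordinary SQL GRANT statement, like any other table. The sketch below only builds the statement text. The helper name `grant_statement` and the group name `data-scientists` are hypothetical, and it assumes the principal already has the required access to the parent catalog and schema.

```python
def grant_statement(privilege: str, table: str, principal: str) -> str:
    """Build a Unity Catalog GRANT statement for a table (hypothetical helper)."""
    # Backticks quote principal names that contain characters such as hyphens.
    return f"GRANT {privilege} ON TABLE {table} TO `{principal}`"


# "data-scientists" is a made-up group name used only for illustration.
stmt = grant_statement(
    "SELECT", "main.on_demand_demo.restaurant_features", "data-scientists"
)
print(stmt)
# In a Databricks notebook, the statement would typically be run with spark.sql(stmt).
```

Building the statement as a string keeps the sketch runnable outside a Databricks workspace. Running it against a real catalog requires a Spark session and sufficient privileges.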

View feature table, function, and model lineage

When you log a model using FeatureEngineeringClient.log_model, the features used in the model are automatically tracked and can be viewed in the Lineage tab of Catalog Explorer. In addition to feature tables, Python UDFs that are used to compute on-demand features are also tracked.

How to capture lineage of a feature table, function, or model

Lineage information that tracks the feature tables and functions used in a model is captured automatically when you call log_model. See the following example code.

import mlflow
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup, FeatureFunction

fe = FeatureEngineeringClient()

features = [
    FeatureLookup(
        table_name = "main.on_demand_demo.restaurant_features",
        feature_names = ["latitude", "longitude"],
        rename_outputs={"latitude": "restaurant_latitude", "longitude": "restaurant_longitude"},
        lookup_key = "restaurant_id",
        timestamp_lookup_key = "ts"
    ),
    FeatureFunction(
        udf_name="main.on_demand_demo.extract_user_latitude",
        output_name="user_latitude",
        input_bindings={"blob": "json_blob"},
    ),
    FeatureFunction(
        udf_name="main.on_demand_demo.extract_user_longitude",
        output_name="user_longitude",
        input_bindings={"blob": "json_blob"},
    ),
    FeatureFunction(
        udf_name="main.on_demand_demo.haversine_distance",
        output_name="distance",
        input_bindings={"x1": "restaurant_longitude", "y1": "restaurant_latitude", "x2": "user_longitude", "y2": "user_latitude"},
    )
]

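# label_df is assumed to be an existing DataFrame that contains the lookup keys
# (restaurant_id, ts), the json_blob column, and the label column.
training_set = fe.create_training_set(
    df=label_df,
    feature_lookups=features,
    label="label",
    exclude_columns=[
        "restaurant_id", "json_blob", "restaurant_latitude", "restaurant_longitude",
        "user_latitude", "user_longitude", "ts",
    ],
)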

class IsClose(mlflow.pyfunc.PythonModel):
    def predict(self, ctx, inp):
        return (inp['distance'] < 2.5).values

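model_name = "fe_packaged_model"
# Assumption: the model is registered in the same catalog and schema as the features.
registered_model_name = f"main.on_demand_demo.{model_name}"
mlflow.set_registry_uri("databricks-uc")

# Logging with training_set records lineage to the feature tables and functions it uses.
fe.log_model(
    model=IsClose(),
    artifact_path=model_name,
    flavor=mlflow.pyfunc,
    training_set=training_set,
    registered_model_name=registered_model_name
)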
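
The example above references three Python UDFs by name but does not show their bodies. The following is a minimal plain-Python sketch of what such functions might compute, not the actual UDF definitions. The JSON keys `user_latitude` and `user_longitude`, the sample coordinates, and the use of kilometres are all assumptions. The argument order follows the `input_bindings` in the example, where `x` is longitude and `y` is latitude.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; kilometres are an assumption


def extract_user_latitude(blob: str) -> float:
    # Assumes the JSON blob has a "user_latitude" key (hypothetical schema).
    return float(json.loads(blob)["user_latitude"])


def extract_user_longitude(blob: str) -> float:
    # Assumes the JSON blob has a "user_longitude" key (hypothetical schema).
    return float(json.loads(blob)["user_longitude"])


def haversine_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Great-circle distance in km; x is longitude and y is latitude, in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (x1, y1, x2, y2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up sample values standing in for one row of the training set.
blob = '{"user_latitude": 37.79, "user_longitude": -122.39}'
distance = haversine_distance(
    -122.40, 37.78, extract_user_longitude(blob), extract_user_latitude(blob)
)
is_close = distance < 2.5  # same threshold as the IsClose model above
print(is_close)
```

In Unity Catalog, each of these would be registered as a function so that FeatureFunction can refer to it by its three-level name, which is what allows the function to appear in the model's lineage.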

View the lineage of a feature table, model, or function

To view the lineage of a feature table, model, or function, follow these steps:

  1. Navigate to the table, model version, or function page in Catalog Explorer.

  2. Select the Lineage tab. The left sidebar shows Unity Catalog components that were logged with this table, model version, or function.

    (Screenshot: Lineage tab on a model page in Catalog Explorer)

  3. Click See lineage graph. The lineage graph appears. For details about exploring the lineage graph, see Capture and explore lineage.

    (Screenshot: lineage graph)

  4. To close the lineage graph, click the close button in the upper-right corner.