Comparing popular feature stores

This article provides an overview and side-by-side comparison of some of the better-known feature store solutions: FeaSt, Databricks FS, Feathr, and Azure's Managed Feature Store.

Feature store summaries

Here are brief summaries of each feature store solution.

FeaSt

FeaSt is an open-source Feature Store created by GoJek.

It focuses on providing a Feature Registry for sharing features and a Feature Serving layer that supports point-in-time joins and abstracts the queries used to access features from the datastore. FeaSt expects you to bring your own data warehouse and does not provide feature transformation support: it assumes your data transformations have been completed and the results persisted beforehand.
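
Point-in-time joins are worth a quick illustration. The sketch below is plain Python with made-up driver data (it is not FeaSt's API); it shows what the serving layer automates: for each training row, pick the latest feature value whose timestamp does not exceed the row's event timestamp, so no future information leaks into training.

```python
from bisect import bisect_right

# Feature values recorded over time for one entity (driver_id=1).
# Each tuple is (timestamp, value); timestamps are simplified to integers.
feature_log = {
    1: [(10, 0.50), (20, 0.75), (30, 0.90)],  # conv_rate over time
}

def point_in_time_value(entity_id, event_ts):
    """Return the latest feature value at or before event_ts (or None)."""
    log = feature_log.get(entity_id, [])
    ts_list = [ts for ts, _ in log]
    idx = bisect_right(ts_list, event_ts)
    return log[idx - 1][1] if idx else None

# Training rows: (driver_id, event_ts) -> point-in-time joined feature value
rows = [(1, 15), (1, 25), (1, 5)]
joined = [point_in_time_value(d, ts) for d, ts in rows]
print(joined)  # [0.5, 0.75, None] - future feature values never leak backwards
```

The row with event timestamp 5 gets no value at all, because no feature value existed yet at that moment.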

Databricks FS

Databricks FS is a proprietary Feature Store solution provided within the Databricks environment. It builds on the Delta Lake storage layer that Databricks uses, which lets it work with different versions of the data. It works only within Databricks; integration with other data sources is not available.

Feathr

Feathr is an open-source Feature Store created by LinkedIn and Microsoft.

In terms of functionality, Feathr provides a Feature Registry, support for Feature Transformation through its built-in functions, and the functionality to share features across teams.

Feathr runs the feature computation on Spark against incoming data from multiple sources. It supports different storage systems to persist that data after it has been processed for consumption at training or inference time.

Azure Managed Feature Store

The Managed Feature Store redefines the ML experience, enabling ML professionals to develop and productionize features all the way from prototyping to operationalization. It provides feature monitoring, supports network isolation via Private Link and managed VNet, and can be used as part of Azure Machine Learning or other custom ML platforms.

Comparison

Provided below is a comparison of some of these feature store solutions.

Onboarding

Installation
- FeaSt: Used as an SDK from the client's code by installing a package and running a few commands locally: Quickstart - Feast. When deploying in a cloud environment, refer to: https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws
- Databricks FS: A feature store is initialized on the creation of a Databricks cluster. From there, you create a Feature Store object using the Python API, after which the Feature Store functionality is available. Note: Azure DB docs.
- Feathr: Installation is documented in feathr/quickstart.md at main · linkedin/feathr (github.com); provisioning the required Azure resources is documented in feathr/azure-deployment.md at main · linkedin/feathr (github.com).
- Managed Feature Store: The Feature Store is part of the Azure Machine Learning workspace. Check the Prerequisites for what you need to get started with Managed Feature Store.
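
For reference, a local FeaSt quickstart revolves around a `feature_store.yaml` repo configuration along the lines of the following sketch (the project name and file paths are illustrative, not from the source article):

```yaml
project: driver_stats        # illustrative project name
registry: data/registry.db   # binary registry file holding feature metadata
provider: local              # swap for a cloud provider when deploying
online_store:
  type: sqlite
  path: data/online_store.db
```

The `provider` and `online_store` entries are what change when moving from the local quickstart to a Snowflake/GCP/AWS deployment.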

Engineering Onboarding
- FeaSt: Several tutorials are provided, starting with Quickstart - Feast and the tutorial overview (Overview - Feast):
  1. Fraud detection on GCP - Feast
  2. Driver ranking - Feast
  3. Real-time credit scoring on AWS - Feast
  4. Driver stats on Snowflake - Feast
  5. Validating historical features with Great Expectations - Feast
- Databricks FS: Quickstart notebook - Feature Store Taxi example notebook - Databricks (microsoft.com)
- Feathr: The quickstart requires provisioned Azure resources.
  - QuickStart - feathr/quickstart.md at main · linkedin/feathr (github.com)
  - Azure provisioning - feathr/azure-deployment.md at main · linkedin/feathr (github.com)
  - Demo notebook - feathr/nyc_driver_demo.ipynb at main · linkedin/feathr (github.com)
  - Documentation page - Feathr – An Enterprise-Grade, High Performance Feature Store
- Managed Feature Store: Managed Feature Store Documentation

Azure support
- FeaSt: The Azure ML Product Group has created a feast-azure provider plugin: Azure/feast-azure: Azure plugins for Feast (FEAture STore) (github.com).
- Databricks FS: Supported via Azure Databricks.
- Feathr: LinkedIn and Microsoft have open-sourced Feathr and provide a quick start guide for Azure.
- Managed Feature Store: Full Azure support.

Capabilities

Supported Data Sources
- FeaSt: Currently supported offline stores:
  - S3
  - Snowflake
  - BigQuery
  - Redshift
  - Azure SQL DB and/or Synapse SQL (Azure/feast-azure: Azure plugins for Feast (FEAture STore) (github.com))
- Databricks FS: Databricks Delta tables only; no other offline stores are supported.
- Feathr:
  - ADLS (Azure Data Lake Storage)
  - Azure Blob Storage
  - Azure SQL DB
  - Azure Synapse Dedicated SQL Pools
  - Azure SQL in VM
  - Amazon S3
  - Snowflake
  - Kafka streaming
- Managed Feature Store: ADLS (Azure Data Lake Storage)

Supported Offline Store for Feature Transformation
- FeaSt: Not supported.
- Databricks FS: Databricks Delta tables.
- Feathr:
  - ADLS (Azure Data Lake Storage)
  - Azure Blob Storage
  - Amazon S3
  - Delta Lake
- Managed Feature Store: ADLS (Azure Data Lake Storage)

Supported Online Stores
- FeaSt: Currently supported online stores (Online stores - Feast):
  - SQLite (locally)
  - Redis
  - Datastore
  - DynamoDB
- Databricks FS: Managed SQL services from Azure/AWS are options:
  - Azure SQL DB
  - Azure Database for MySQL
  - Amazon Aurora (MySQL-compatible)
  - Amazon RDS MySQL
  As of April 2022, Redis is not supported.
- Feathr:
  - Redis
  - CosmosDB
  - SQL
- Managed Feature Store: Redis

Read from Multiple Data Sources
- FeaSt: Only one data source per namespace is supported. One feature store definition might have multiple namespaces, but each FeaSt SDK instance can access only one namespace at a time. You can mix multiple instances and merge the data in your code, but there may be an associated performance cost.
- Databricks FS: A data source refers to the DB file system location (if any) being used. Multiple data sources can thus be maintained, but they have to exist in the Databricks File System.
- Feathr: When writing feature transformation code, you can define multiple sources and combine their data for processing.
- Managed Feature Store: Multiple data sources are not supported at this point; support is planned for a later release.
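
Merging data from multiple FeaSt SDK instances in your own code, as described above, amounts to joining the fetched rows on the entity key. A minimal plain-Python sketch (the namespaces, feature names, and values below are hypothetical, not FeaSt API calls):

```python
# Hypothetical feature rows fetched from two separate FeaSt SDK instances
# (one per namespace); the merge happens entirely in application code.
ns_driver = [
    {"driver_id": 1, "conv_rate": 0.75},
    {"driver_id": 2, "conv_rate": 0.62},
]
ns_orders = [
    {"driver_id": 1, "orders_7d": 42},
    {"driver_id": 2, "orders_7d": 17},
]

def merge_on(key, *row_sets):
    """Merge feature dicts from several sources on a shared entity key."""
    merged = {}
    for rows in row_sets:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

features = merge_on("driver_id", ns_driver, ns_orders)
print(features[0])  # {'driver_id': 1, 'conv_rate': 0.75, 'orders_7d': 42}
```

The "performance tax" mentioned above comes from issuing one retrieval per namespace and holding all row sets in memory before the merge.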

Feature Definition Versioning
- FeaSt: No feature definition versioning support – there is no way to update features without deleting them.
- Databricks FS: Feature definition versioning is not implemented yet. The only versioning provided relates to feature values, using Delta Lake's built-in version tracking. To change a particular feature definition in a feature table, you need to create a new feature table.
- Feathr: Not yet supported, but on the roadmap.
- Managed Feature Store: Supports versioning; feature sets are versioned and immutable.

Feature Metadata Storage and Querying
- FeaSt: All metadata for feature definitions is stored in a binary registry file and is accessible via the command-line interface or SDK.
  - Metadata is just a dictionary; all metadata is immutable for Feature Views/Features.
  - Can list all Entities and all Feature Views (feature groups); can query by entity + feature view + feature.
  - Metadata covers Feature Views (feature groups), Entities, and Features.
  - Note: only the registered FeaSt concepts can be queried. For example, if a feature set uses the driver's ID as its entity, FeaSt will tell you that the driver_id column is the entity; it will not return the entity values needed to query the data.
- Databricks FS: Feature definition metadata is available in the Feature Store UI:
  - Versions produced for a feature.
  - Notebook and schedule with which feature data is produced.
  - Raw data used.
  - Online stores that a feature is linked to.
- Feathr: Feature definition metadata is persisted in Purview and accessible via Feathr's SDK or Purview directly. One advanced capability Feathr gains from Purview is feature lineage: the Purview lineage shows what is involved in generating a feature – which Feathr project, the process that executes it, and which anchors and anchor processes are run to produce the feature.

Feature Dataset Lifecycle
- FeaSt: Expects that all feature values have been created and stored in the offline store beforehand. The only step left is for the user to create and apply the feature views and services (as code) for their usage.
- Databricks FS: All data is first ingested into the Databricks File System. From there, you can pull the data and apply transformations; the results are finally put into a Delta table to be used as the offline store. Feature tables keep track of a list of related features; multiple notebooks or data transformation pipelines can push to the same feature table if necessary.
- Feathr: Allows you to explore your data, create transformations, and persist the results in new tables in your offline datastore.
- Managed Feature Store: Raw data stored in ADLS can be transformed and stored in offline stores. Supported formats are parquet and csv.
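
The FeaSt-style lifecycle – transform first, then hand finished values to the store – can be sketched in plain Python. The trip data and feature names below are hypothetical, and a dict stands in for the warehouse table that FeaSt would read:

```python
from collections import defaultdict

# Raw events, as they might land in your warehouse before FeaSt is involved.
raw_trips = [
    {"driver_id": 1, "fare": 12.0},
    {"driver_id": 1, "fare": 8.0},
    {"driver_id": 2, "fare": 20.0},
]

def build_driver_features(trips):
    """Pre-compute per-driver aggregates; FeaSt would only read the result."""
    acc = defaultdict(lambda: {"trip_count": 0, "fare_total": 0.0})
    for t in trips:
        acc[t["driver_id"]]["trip_count"] += 1
        acc[t["driver_id"]]["fare_total"] += t["fare"]
    return {
        d: {"trip_count": v["trip_count"],
            "avg_fare": v["fare_total"] / v["trip_count"]}
        for d, v in acc.items()
    }

# This step happens in your own pipeline; only its persisted output is
# registered with the feature store.
offline_store = build_driver_features(raw_trips)
print(offline_store[1])  # {'trip_count': 2, 'avg_fare': 10.0}
```

Databricks FS and Feathr, by contrast, let this transformation step run inside the platform itself (notebooks or Spark jobs) rather than upstream of it.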

Time Travel Support
- FeaSt: Can retrieve features as of a specified feature value timestamp.
- Databricks FS: Can retrieve features as of a specified feature value timestamp.
- Feathr: Includes support for point-in-time joins.
- Managed Feature Store: Supports feature retrieval as of a specified feature value timestamp.

Lineage tracing
- FeaSt:
  - Provides audit functionality, but the audit logs are low level and getting data-level lineage will require extra work.
  - Expects already-processed data to be added to the feature store; all tracking from raw data to feature store must be built separately.
  - Deleting individual ingested records is not supported.
- Databricks FS: Lineage is provided through a feature store UI on the Databricks platform.
- Feathr: Lineage can be seen in Azure Purview, a metadata catalogue where you can see the relationships between a model and feature sets.
- Managed Feature Store: An integration with Purview is coming in a future release. At this point, lineage information can be accessed from the Feature Store UI for every model.

APIs and Protocols Supported
- FeaSt: The Python client library lags behind the REST API; Java and Go client libraries are also offered.
- Databricks FS: A Python API, with examples of every function, is presented in the Databricks docs. Since the feature store is created on the Databricks cluster infrastructure, your Python code has to have access to the Databricks cluster; the recommended way is to use notebooks on the cluster to ensure access to the resources. Online stores are designed for native access. As of April 2022, the Databricks feature store is primarily geared towards batch inference.
- Feathr: FeathrCli and FeathrClient (Python).
- Managed Feature Store: Serving API users can look up features for training, batch inference, and online inference. The serving API can pull data from a source directly or from materialized storage such as an offline/online store.

Data validation
- FeaSt: Not native; see Validating historical features with Great Expectations - Feast.
- Databricks FS: Not native; see Validating historical features with Great Expectations (Great Expectations tutorial).

Management/Storage of feature data
- Databricks FS: Feature data is all stored in Databricks Delta tables; these Delta tables are created by the user and used as an offline store.
- Feathr: Feature data can be accessed/seen in Purview, and the Feathr API has some functionality to retrieve this information.
- Managed Feature Store: No visual capabilities for this; however, the feature store SDK allows you to retrieve the feature data.

Feature Definition Discovery capabilities
- FeaSt: An interface for accessing feature definitions and metadata has been added to the roadmap. For now, the metadata can be accessed using a CLI command.
- Databricks FS: Can filter and search through the feature store UI. Text search across features, feature tables, and tags is currently available. Tags can be created, edited, and assigned to feature tables, then used to search for a subset of features; for example, a tag called 'dev' would show someone all dev feature tables.
- Feathr: Provides an optional UI web application for search and discovery of features from the registry.
- Managed Feature Store: Feature Catalogue to search and reuse features.

Azure ML integration
- FeaSt: Integration within Azure ML is possible through some work, as documented here by the Azure ML PG.
- Databricks FS: No in-built functionality; you need to pull from the online store for inference. Databricks has an ML platform of its own, so it links to that rather than to Azure ML for now.
- Feathr: The Azure ML notebook (running Feathr from a notebook) is supported for now; integration with Azure ML compute is planned.
- Managed Feature Store: Available to use from Azure ML and custom ML platforms.

Observability

Metrics
- FeaSt: No built-in metrics capability; Prometheus can be used to scrape metrics from FeaSt Core and Serving.
- Databricks FS: Databricks Machine Learning has some metrics on the experiments and models being run, and MLFlow is used to track the machine learning model itself.
- Feathr: No built-in capability:
  - Can use Prometheus to scrape metrics. synapse/metrics-howto.md at master · matrix-org/synapse (github.com)
  - Can configure diagnostics to be sent to Azure Log Analytics. Monitor Azure Synapse Analytics Using Log Analytics (c-sharpcorner.com)

Tracing
- FeaSt: The Python SDK uses Python logging libraries. The FeaSt services themselves do not log anything in their pods running in Kubernetes.
- Databricks FS: Databricks integrates with MLFlow, which allows logging models and tracking deployments; the Databricks Feature Store example notebook linked above also uses MLFlow to enhance the ML workflow.
- Feathr: Logs can be viewed on Synapse.
- Managed Feature Store: The scheduled materialization process is logged; the job record created can be visualized in the UI and queried by the user, and logs are available as part of the job. Whenever a user performs CRUD operations or creates assets, that information is logged.

Other categories

Transformations
- FeaSt: Any data transformation is expected to happen before the features get created.
- Databricks FS: Transformations can be done in a notebook, as in the standard SDLC.
- Feathr: Enabled via built-in transformations (feathr/feature-definition.md at main · linkedin/feathr (github.com)). Feature transformation capabilities: local development testing, support for PySpark/Spark SQL based transformations.
- Managed Feature Store: Feature transformation capabilities: fully local development testing, support for PySpark/Spark SQL based transformations.
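
To make the SQL style of transformation concrete, here is the same kind of feature-deriving aggregation expressed in SQL, run against SQLite purely so the example is self-contained (the table and column names are made up; on Feathr or Databricks the query would execute on Spark, not SQLite):

```python
import sqlite3

# Stand-in for a Spark SQL transformation that derives a feature table
# from raw events; SQLite is used here only to keep the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (driver_id INTEGER, fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [(1, 12.0), (1, 8.0), (2, 20.0)])

feature_rows = conn.execute("""
    SELECT driver_id,
           COUNT(*)  AS trip_count,   -- derived feature 1
           AVG(fare) AS avg_fare      -- derived feature 2
    FROM trips
    GROUP BY driver_id
    ORDER BY driver_id
""").fetchall()

print(feature_rows)  # [(1, 2, 10.0), (2, 1, 20.0)]
```

The same `GROUP BY` query, pointed at raw source tables, is what a Feathr or Databricks transformation would persist into the offline store as a feature table.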

Access control and authorization
- FeaSt: Not implemented.
- Databricks FS: Permissions can be enabled that give access only to certain Databricks feature tables depending on role. Note that these roles in Databricks are currently different from Azure RBAC policies and cannot be synched.
- Feathr: A project-level role-based access control (RBAC) plugin is available to help manage who has access to the Feathr Registry. It provides a simple authorization system built on OAuth tokens, with an SQL database as backend storage for user role records. More information: Feathr Registry Access Control.
- Managed Feature Store: Azure role-based access control is used to manage access to the resources. See Access control for managed feature store.

Open source/managed
- FeaSt: Open source.
- Databricks FS: Proprietary technology maintained by Databricks.
- Feathr: Open source, maintained by LinkedIn.
- Managed Feature Store: Not open source at the moment.

Other Limitations
- Databricks FS:
  1. Databricks Feature Store APIs support batch scoring of models packaged with Feature Store; online inference is not supported.
  2. Databricks Feature Store does not support deleting individual features from a feature table; a new feature table has to be created instead.
- Managed Feature Store: Currently in public preview and not recommended for production workloads, since certain features might not be supported.