Defining your data model (Preview)

Warning

The Semantic Kernel Vector Store functionality is in preview, and improvements that require breaking changes may still occur in limited circumstances before release.

Overview

The Semantic Kernel Vector Store connectors use a model-first approach to interacting with databases.

All methods to upsert or get records use strongly typed model classes. The properties on these classes are decorated with attributes that indicate the purpose of each property.

Tip

For an alternative to using attributes, refer to defining your schema with a record definition.

Tip

For an alternative to defining your own data model, refer to using Vector Store abstractions without defining your own data model.

Here is an example of a model that is decorated with these attributes.

using Microsoft.Extensions.VectorData;

public class Hotel
{
    [VectorStoreRecordKey]
    public ulong HotelId { get; set; }

    [VectorStoreRecordData(IsFilterable = true)]
    public string HotelName { get; set; }

    [VectorStoreRecordData(IsFullTextSearchable = true)]
    public string Description { get; set; }

    [VectorStoreRecordVector(4, DistanceFunction.CosineDistance, IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

    [VectorStoreRecordData(IsFilterable = true)]
    public string[] Tags { get; set; }
}

Attributes

VectorStoreRecordKeyAttribute

Use this attribute to indicate that your property is the key of the record.

[VectorStoreRecordKey]
public ulong HotelId { get; set; }

VectorStoreRecordKeyAttribute parameters

| Parameter | Required | Description |
| --- | --- | --- |
| StoragePropertyName | No | Supplies an alternative name for the property in the database. Not all connectors support this parameter, e.g. those where alternatives such as JsonPropertyNameAttribute are supported instead. |

Tip

For more information on which connectors support StoragePropertyName and what alternatives are available, refer to the documentation for each connector.
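
As a language-neutral illustration of what an alternative storage name means, here is a minimal Python sketch. It is not how any connector is implemented: the STORAGE_NAMES mapping and the to_storage and from_storage helpers are hypothetical names invented for this example. The sketch only shows the idea that a property keeps one name in the model and another in the database.

```python
# Hypothetical mapping from model property names to database field names.
# Properties that are not listed keep their own name.
STORAGE_NAMES = {"HotelId": "hotel_id", "HotelName": "hotel_name"}


def to_storage(record: dict[str, object]) -> dict[str, object]:
    """Rename model properties to their storage names before writing."""
    return {STORAGE_NAMES.get(name, name): value for name, value in record.items()}


def from_storage(document: dict[str, object]) -> dict[str, object]:
    """Rename storage fields back to model property names after reading."""
    reverse = {storage: model for model, storage in STORAGE_NAMES.items()}
    return {reverse.get(name, name): value for name, value in document.items()}


record = {"HotelId": 1, "HotelName": "Hotel Happy", "Tags": ["pool"]}
document = to_storage(record)
round_trip = from_storage(document)
```

In this sketch the round trip returns the original record, which is the property a storage-name mapping needs in order to stay transparent to the code that uses the model.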

VectorStoreRecordDataAttribute

Use this attribute to indicate that your property contains general data that is not a key or a vector.

[VectorStoreRecordData(IsFilterable = true)]
public string HotelName { get; set; }

VectorStoreRecordDataAttribute parameters

| Parameter | Required | Description |
| --- | --- | --- |
| IsFilterable | No | Indicates whether the property should be indexed for filtering in cases where a database requires opting in to indexing per property. Default is false. |
| IsFullTextSearchable | No | Indicates whether the property should be indexed for full text search for databases that support full text search. Default is false. |
| StoragePropertyName | No | Supplies an alternative name for the property in the database. Not all connectors support this parameter, e.g. those where alternatives such as JsonPropertyNameAttribute are supported instead. |

Tip

For more information on which connectors support StoragePropertyName and what alternatives are available, refer to the documentation for each connector.

VectorStoreRecordVectorAttribute

Use this attribute to indicate that your property contains a vector.

[VectorStoreRecordVector(Dimensions: 4, DistanceFunction.CosineDistance, IndexKind.Hnsw)]
public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

VectorStoreRecordVectorAttribute parameters

| Parameter | Required | Description |
| --- | --- | --- |
| Dimensions | Yes when creating a collection; optional otherwise | The number of dimensions that the vector has. This is typically required when creating a vector index for a collection. |
| IndexKind | No | The type of index to index the vector with. Default varies by vector store type. |
| DistanceFunction | No | The type of distance function to use when doing vector comparison during vector search over this vector. Default varies by vector store type. |
| StoragePropertyName | No | Supplies an alternative name for the property in the database. Not all connectors support this parameter, e.g. those where alternatives such as JsonPropertyNameAttribute are supported instead. |

Common index kinds and distance function types are supplied as static values on the Microsoft.Extensions.VectorData.IndexKind and Microsoft.Extensions.VectorData.DistanceFunction classes. Individual Vector Store implementations may also use their own index kinds and distance functions, where the database supports unusual types.

Tip

For more information on which connectors support StoragePropertyName and what alternatives are available, refer to the documentation for each connector.

All methods to upsert or get records use a class and a vector store record definition.

This can be done by defining your own class with annotations for the fields, or by using a class/type in combination with a record definition. Two things need to be done for a class: first, add the annotations with the field types; second, decorate the class with the vectorstoremodel decorator.

Tip

For the alternative approach using a record definition, refer to defining your schema with a record definition.

Here is an example of a model that is decorated with these annotations.

from dataclasses import dataclass, field
from typing import Annotated
from uuid import uuid4

from semantic_kernel.data import (
    DistanceFunction,
    IndexKind,
    VectorStoreRecordDataField,
    VectorStoreRecordKeyField,
    VectorStoreRecordVectorField,
    vectorstoremodel,
)

@vectorstoremodel
@dataclass
class Hotel:
    hotel_name: Annotated[str, VectorStoreRecordDataField(is_filterable=True)]
    description: Annotated[str, VectorStoreRecordDataField(is_full_text_searchable=True)]
    description_embedding: Annotated[list[float], VectorStoreRecordVectorField(dimensions=4, distance_function=DistanceFunction.COSINE, index_kind=IndexKind.HNSW)]
    tags: Annotated[list[str], VectorStoreRecordDataField(is_filterable=True)]
    # The key is declared last because dataclass fields with a default
    # must come after fields without one.
    hotel_id: Annotated[str, VectorStoreRecordKeyField()] = field(default_factory=lambda: str(uuid4()))

Tip

Defining a class with these annotations can be done in multiple ways. One of them, shown here, uses the dataclasses module in Python. This sample shows other approaches (using Pydantic BaseModels and vanilla Python classes) as well.

Annotations

There are three types of annotations to be used, and they have a common base class.

VectorStoreRecordField

This is the base class for all annotations; it is not meant to be used directly.

VectorStoreRecordField parameters

| Parameter | Required | Description |
| --- | --- | --- |
| name | No | Can be added directly, but is set when the model is parsed. |
| property_type | No | Should be a string; it is also derived when the model is parsed. |

Tip

The annotations are parsed by the vectorstoremodel decorator, which, among other things, creates a record definition for the class. It is therefore not necessary to instantiate a field class when no parameters are set; the field can be annotated with just the class, like this:

hotel_id: Annotated[str, VectorStoreRecordKeyField]
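
To make the parsing step more concrete, here is a minimal, standard-library-only sketch of how a decorator could read Annotated metadata and accept either a field class or a field instance. It is not the actual vectorstoremodel implementation: KeyField, DataField and collect_fields are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


# Hypothetical stand-ins for the Semantic Kernel field classes.
class KeyField:
    pass


@dataclass
class DataField:
    is_filterable: bool = False


def collect_fields(model: type) -> dict[str, object]:
    """Map each annotated attribute name to a field instance."""
    fields: dict[str, object] = {}
    hints = get_type_hints(model, include_extras=True)
    for name, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue  # plain attribute without field metadata
        # get_args returns (base_type, metadata_1, metadata_2, ...)
        for meta in get_args(hint)[1:]:
            # A bare class is instantiated with its default parameters.
            candidate = meta() if isinstance(meta, type) else meta
            if isinstance(candidate, (KeyField, DataField)):
                fields[name] = candidate
    return fields


class Hotel:
    hotel_id: Annotated[str, KeyField]  # class only, no parentheses
    hotel_name: Annotated[str, DataField(is_filterable=True)]  # instance
    rating: int  # not annotated with a field, so it is skipped


fields = collect_fields(Hotel)
```

Here hotel_id ends up with a default KeyField instance even though only the class was supplied, which mirrors the shorthand described above.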

VectorStoreRecordKeyField

Use this annotation to indicate that this attribute is the key of the record.

VectorStoreRecordKeyField()

VectorStoreRecordKeyField parameters

No other parameters outside of the base class are defined.

VectorStoreRecordDataField

Use this annotation to indicate that your attribute contains general data that is not a key or a vector.

VectorStoreRecordDataField(is_filterable=True)

VectorStoreRecordDataField parameters

| Parameter | Required | Description |
| --- | --- | --- |
| has_embedding | No | Indicates whether the property has an embedding associated with it. Default is None. |
| embedding_property_name | No | The name of the property that contains the embedding. Default is None. |
| is_filterable | No | Indicates whether the property should be indexed for filtering in cases where a database requires opting in to indexing per property. Default is False. |
| is_full_text_searchable | No | Indicates whether the property should be indexed for full text search for databases that support full text search. Default is False. |

VectorStoreRecordVectorField

Use this annotation to indicate that your attribute contains a vector.

VectorStoreRecordVectorField(dimensions=4, distance_function=DistanceFunction.COSINE, index_kind=IndexKind.HNSW)

VectorStoreRecordVectorField parameters

| Parameter | Required | Description |
| --- | --- | --- |
| dimensions | Yes when creating a collection; optional otherwise | The number of dimensions that the vector has. This is typically required when creating a vector index for a collection. |
| index_kind | No | The type of index to index the vector with. Default varies by vector store type. |
| distance_function | No | The type of distance function to use when doing vector comparison during vector search over this vector. Default varies by vector store type. |
| local_embedding | No | Indicates whether the property has a local embedding associated with it. Default is None. |
| embedding_settings | No | The settings for the embedding, as a dict with service_id as the key and PromptExecutionSettings as the value. Default is None. |
| serialize_function | No | The function used to serialize the vector. If the type is not list[float \| int], either this function is needed or the whole model needs to be serialized. |
| deserialize_function | No | The function used to deserialize the vector. If the type is not list[float \| int], either this function is needed or the whole model needs to be deserialized. |
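
As an illustration of the serialize_function and deserialize_function parameters, here is a minimal sketch of a matching pair of functions for a vector that is held in memory as an array.array rather than a list. The function names are hypothetical, and the sketch assumes that each function takes one value and returns one value; check the connector documentation for the exact signature that is expected.

```python
from array import array


def serialize_vector(vector: array) -> list[float]:
    """Convert the in-memory vector into a plain list of floats for storage."""
    return vector.tolist()


def deserialize_vector(values: list[float]) -> array:
    """Rebuild the in-memory vector from the stored list of floats."""
    return array("d", values)


original = array("d", [0.1, 0.2, 0.3, 0.4])
stored = serialize_vector(original)
restored = deserialize_vector(stored)
```

Under that assumption, the two functions would be passed to the field through the parameters listed in the table, i.e. serialize_function=serialize_vector and deserialize_function=deserialize_vector.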

Common index kinds and distance function types are supplied as static values on the semantic_kernel.data.IndexKind and semantic_kernel.data.DistanceFunction classes. Individual Vector Store implementations may also use their own index kinds and distance functions, where the database supports unusual types.
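
To show what a distance function computes, here is a minimal pure-Python sketch of cosine distance, using the common definition of one minus the cosine similarity. This is an assumption made for illustration: each database computes distances internally and may define or scale the result differently. The cosine_distance helper is a hypothetical name and is not part of semantic_kernel.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity for two vectors of equal dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


# Four-dimensional vectors, matching dimensions=4 in the examples above.
same = cosine_distance([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
opposite = cosine_distance([1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0])
```

With this definition, identical directions give a distance near 0, orthogonal vectors give 1, and opposite directions give 2. The length check also shows why the dimensions value has to match the vectors that are actually stored.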

All methods to upsert or get records use strongly typed model classes. The fields on these classes are decorated with annotations that indicate the purpose of each field.

Tip

For an alternative to using annotations, refer to defining your schema with a record definition.

Here is an example of a model that is decorated with these annotations. By default, most out-of-the-box vector stores use Jackson, so it is good practice to ensure that the model object can be serialized by Jackson, i.e. the class is visible and has getters, a constructor, annotations, etc.

import com.microsoft.semantickernel.data.vectorstorage.annotations.VectorStoreRecordData;
import com.microsoft.semantickernel.data.vectorstorage.annotations.VectorStoreRecordKey;
import com.microsoft.semantickernel.data.vectorstorage.annotations.VectorStoreRecordVector;
import com.microsoft.semantickernel.data.vectorstorage.definition.DistanceFunction;
import com.microsoft.semantickernel.data.vectorstorage.definition.IndexKind;

import java.util.List;

public class Hotel {
    @VectorStoreRecordKey
    private String hotelId;

    @VectorStoreRecordData(isFilterable = true)
    private String name;

    @VectorStoreRecordData(isFullTextSearchable = true)
    private String description;

    @VectorStoreRecordVector(dimensions = 4, indexKind = IndexKind.HNSW, distanceFunction = DistanceFunction.COSINE_DISTANCE)
    private List<Float> descriptionEmbedding;

    @VectorStoreRecordData(isFilterable = true)
    private List<String> tags;

    public Hotel() { }

    public Hotel(String hotelId, String name, String description, List<Float> descriptionEmbedding, List<String> tags) {
        this.hotelId = hotelId;
        this.name = name;
        this.description = description;
        this.descriptionEmbedding = descriptionEmbedding;
        this.tags = tags;
    }

    public String getHotelId() { return hotelId; }
    public String getName() { return name; }
    public String getDescription() { return description; }
    public List<Float> getDescriptionEmbedding() { return descriptionEmbedding; }
    public List<String> getTags() { return tags; }
}

Annotations

VectorStoreRecordKey

Use this annotation to indicate that your field is the key of the record.

@VectorStoreRecordKey
private String hotelId;

VectorStoreRecordKey parameters

| Parameter | Required | Description |
| --- | --- | --- |
| storageName | No | Supplies an alternative name for the field in the database. Not all connectors support this parameter, e.g. those that use Jackson; in that case, specify the storage name with Jackson annotations. |

Tip

For more information on which connectors support storageName and what alternatives are available, refer to the documentation for each connector.

VectorStoreRecordData

Use this annotation to indicate that your field contains general data that is not a key or a vector.

@VectorStoreRecordData(isFilterable = true)
private String name;

VectorStoreRecordData parameters

| Parameter | Required | Description |
| --- | --- | --- |
| isFilterable | No | Indicates whether the field should be indexed for filtering in cases where a database requires opting in to indexing per field. Default is false. |
| isFullTextSearchable | No | Indicates whether the field should be indexed for full text search for databases that support full text search. Default is false. |
| storageName | No | Supplies an alternative name for the field in the database. Not all connectors support this parameter, e.g. those that use Jackson; in that case, specify the storage name with Jackson annotations. |

Tip

For more information on which connectors support storageName and what alternatives are available, refer to the documentation for each connector.

VectorStoreRecordVector

Use this annotation to indicate that your field contains a vector.

@VectorStoreRecordVector(dimensions = 4, indexKind = IndexKind.HNSW, distanceFunction = DistanceFunction.COSINE_DISTANCE)
private List<Float> descriptionEmbedding;

VectorStoreRecordVector parameters

| Parameter | Required | Description |
| --- | --- | --- |
| dimensions | Yes when creating a collection; optional otherwise | The number of dimensions that the vector has. This is typically required when creating a vector index for a collection. |
| indexKind | No | The type of index to index the vector with. Default varies by vector store type. |
| distanceFunction | No | The type of distance function to use when doing vector comparison during vector search over this vector. Default varies by vector store type. |
| storageName | No | Supplies an alternative name for the field in the database. Not all connectors support this parameter, e.g. those that use Jackson; in that case, specify the storage name with Jackson annotations. |

Common index kinds and distance function types are supplied on the com.microsoft.semantickernel.data.vectorstorage.definition.IndexKind and com.microsoft.semantickernel.data.vectorstorage.definition.DistanceFunction enums. Individual Vector Store implementations may also use their own index kinds and distance functions, where the database supports unusual types.

Tip

For more information on which connectors support storageName and what alternatives are available, refer to the documentation for each connector.
