Define your data model

Microsoft.Extensions.VectorData uses a model-first approach to interacting with databases.

All methods to upsert or get records use strongly typed model classes. There are two ways to define the data model:

  • By decorating properties on the model classes with attributes that indicate the purpose of each property.
  • By defining your storage schema using a record definition that you supply separately from the data model. The record definition is a VectorStoreCollectionDefinition that lists the key, data, and vector properties of the record.

Here's an example of a class, or data model, whose properties are decorated with VectorStore*Attribute attributes.

public class Hotel
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public required string HotelName { get; set; }

    [VectorStoreData(IsFullTextIndexed = true)]
    public required string Description { get; set; }

    [VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public required string[] Tags { get; set; }
}

Data model property attributes

The VectorStore*Attribute attributes that define data models for vector databases are:

VectorStoreKeyAttribute

Use the VectorStoreKeyAttribute attribute to indicate that your property is the primary key of the record.

[VectorStoreKey]
public ulong HotelId { get; set; }

The following table shows the parameters for VectorStoreKeyAttribute.

| Parameter | Required | Description |
|---|---|---|
| StorageName | No | Can be used to supply an alternative name for the property in the database. This parameter isn't supported by all providers, for example, where alternatives like JsonPropertyNameAttribute are supported. |
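As a minimal sketch of the StorageName parameter, the following class stores its key under a different name and then reads the setting back through reflection. The class name HotelWithStorageName, the helper StorageNameInspector, and the storage name "hotel_id" are hypothetical and chosen only for illustration. Whether the name is honored depends on the provider.

```csharp
using System;
using System.Reflection;
using Microsoft.Extensions.VectorData;

public class HotelWithStorageName
{
    // "hotel_id" is a hypothetical storage name used only for illustration.
    [VectorStoreKey(StorageName = "hotel_id")]
    public ulong HotelId { get; set; }
}

public static class StorageNameInspector
{
    // Reads the storage name declared on the key property via reflection.
    public static string? GetKeyStorageName() =>
        typeof(HotelWithStorageName)
            .GetProperty(nameof(HotelWithStorageName.HotelId))!
            .GetCustomAttribute<VectorStoreKeyAttribute>()!
            .StorageName;
}
```

The .NET property keeps the name HotelId in code. Only the name used in the database is intended to change.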

VectorStoreDataAttribute

Use the VectorStoreDataAttribute attribute to indicate that your property contains general data that is not a key or a vector.

[VectorStoreData(IsIndexed = true)]
public required string HotelName { get; set; }

The following table shows the parameters for VectorStoreDataAttribute.

| Parameter | Required | Description |
|---|---|---|
| IsIndexed | No | Indicates whether the property should be indexed for filtering in cases where a database requires opting in to indexing per property. The default is false. |
| IsFullTextIndexed | No | Indicates whether the property should be indexed for full text search for databases that support full text search. The default is false. |
| StorageName | No | Can be used to supply an alternative name for the property in the database. This parameter is not supported by all providers, for example, where alternatives like JsonPropertyNameAttribute are supported. |

VectorStoreVectorAttribute

Use the VectorStoreVectorAttribute attribute to indicate that your property contains a vector.

[VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

It's also possible to use VectorStoreVectorAttribute on properties that don't have a vector type, for example, a property of type string. When a property is decorated in this way, you need to provide an IEmbeddingGenerator instance to the vector store. When upserting the record, the text that's in the string property is automatically converted and stored as a vector in the database. (It's not possible to retrieve a vector using this mechanism.)

[VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
public string DescriptionEmbedding { get; set; }

Tip

For more information on how to use built-in embedding generation, see Let the vector store generate embeddings.

The following table shows the parameters for VectorStoreVectorAttribute.

| Parameter | Required | Description |
|---|---|---|
| Dimensions | Yes | The number of dimensions that the vector has. This is required when creating a vector index for a collection. |
| IndexKind | No | The kind of index to use for the vector. The default varies by vector store type. |
| DistanceFunction | No | The function to use when comparing vectors during vector search over this vector. The default varies by vector store type. |
| StorageName | No | Can be used to supply an alternative name for the property in the database. This parameter is not supported by all providers, for example, where alternatives like JsonPropertyNameAttribute are supported. |

Common index kinds and distance function types are supplied as static values on the IndexKind and DistanceFunction classes. Individual vector store implementations might also use their own index kinds and distance functions, where the database supports unusual types.
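The following sketch shows one way to select a predefined index kind and distance function for a vector property. It assumes that IndexKind.Flat and DistanceFunction.EuclideanDistance are among the predefined values in the package version you use. The helper class VectorIndexOptions is hypothetical.

```csharp
using System;
using Microsoft.Extensions.VectorData;

public static class VectorIndexOptions
{
    // Creates a vector property that asks for a flat index and Euclidean distance
    // instead of relying on the provider's defaults.
    public static VectorStoreVectorProperty CreateFlatEuclidean() =>
        new("DescriptionEmbedding", typeof(ReadOnlyMemory<float>), dimensions: 4)
        {
            IndexKind = IndexKind.Flat,
            DistanceFunction = DistanceFunction.EuclideanDistance
        };
}
```

A provider-specific index kind or distance function would be assigned to the same two settings. Check the provider's documentation for the values it accepts, because not every database supports every combination.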

Record definition properties

Use the VectorStore*Property classes to create a record definition that you pass to the collection when you create it:

VectorStoreKeyProperty

Use the VectorStoreKeyProperty class to indicate that your property is the key of the record.

new VectorStoreKeyProperty("HotelId", typeof(ulong)),

The following table shows the configuration settings for VectorStoreKeyProperty.

| Parameter | Required | Description |
|---|---|---|
| Name | Yes | The name of the property on the data model. Used by the mapper to automatically map between the storage schema and data model and for creating indexes. |
| Type | No | The type of the property on the data model. Used by the mapper to automatically map between the storage schema and data model and for creating indexes. |
| StorageName | No | Can be used to supply an alternative name for the property in the database. This parameter is not supported by all providers, for example, where alternatives like JsonPropertyNameAttribute are supported. |

VectorStoreDataProperty

Use the VectorStoreDataProperty class to indicate that your property contains general data that isn't a key or a vector.

new VectorStoreDataProperty("HotelName", typeof(string)) { IsIndexed = true },
new VectorStoreDataProperty("Description", typeof(string)) { IsFullTextIndexed = true },

The following table shows the configuration settings for VectorStoreDataProperty.

| Parameter | Required | Description |
|---|---|---|
| Name | Yes | The name of the property on the data model. Used by the mapper to automatically map between the storage schema and data model and for creating indexes. |
| Type | No | The type of the property on the data model. Used by the mapper to automatically map between the storage schema and data model and for creating indexes. |
| IsIndexed | No | Indicates whether the property should be indexed for filtering in cases where a database requires opting in to indexing per property. The default is false. |
| IsFullTextIndexed | No | Indicates whether the property should be indexed for full text search for databases that support full text search. The default is false. |
| StorageName | No | Can be used to supply an alternative name for the property in the database. This parameter is not supported by all providers, for example, where alternatives like JsonPropertyNameAttribute are supported. |

VectorStoreVectorProperty

Use the VectorStoreVectorProperty class to indicate that your property contains a vector.

new VectorStoreVectorProperty("DescriptionEmbedding", typeof(ReadOnlyMemory<float>), dimensions: 4)

The following table shows the configuration settings for VectorStoreVectorProperty.

| Parameter | Required | Description |
|---|---|---|
| Name | Yes | The name of the property on the data model. Used by the mapper to automatically map between the storage schema and data model and for creating indexes. |
| Type | No | The type of the property on the data model. Used by the mapper to automatically map between the storage schema and data model and for creating indexes. |
| Dimensions | Yes | The number of dimensions that the vector has. This is required for creating a vector index for a collection. |
| IndexKind | No | The kind of index to use for the vector. The default varies by vector store type. |
| DistanceFunction | No | The function to use when comparing vectors during vector search over this vector. The default varies by vector store type. |
| StorageName | No | Can be used to supply an alternative name for the property in the database. This parameter is not supported by all providers, for example, where alternatives like JsonPropertyNameAttribute are supported. |
| EmbeddingGenerator | No | Allows specifying a Microsoft.Extensions.AI.IEmbeddingGenerator instance to use for generating embeddings automatically for this property. |
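Putting the three property classes together, the following sketch builds a complete record definition that mirrors the attribute-decorated Hotel class shown earlier. The helper class HotelSchema is hypothetical. The vector property is declared as ReadOnlyMemory<float> on the assumption that this matches the type of the model property.

```csharp
using System;
using Microsoft.Extensions.VectorData;

public static class HotelSchema
{
    // Builds a record definition equivalent to the attributes on the Hotel class.
    public static VectorStoreCollectionDefinition Create() => new()
    {
        Properties =
        [
            new VectorStoreKeyProperty("HotelId", typeof(ulong)),
            new VectorStoreDataProperty("HotelName", typeof(string)) { IsIndexed = true },
            new VectorStoreDataProperty("Description", typeof(string)) { IsFullTextIndexed = true },
            new VectorStoreVectorProperty("DescriptionEmbedding", typeof(ReadOnlyMemory<float>), dimensions: 4)
            {
                DistanceFunction = DistanceFunction.CosineSimilarity,
                IndexKind = IndexKind.Hnsw
            },
            new VectorStoreDataProperty("Tags", typeof(string[])) { IsIndexed = true }
        ]
    };
}
```

Assuming you have a VectorStore instance named vectorStore, you would typically pass the result when you get the collection, for example vectorStore.GetCollection<ulong, Hotel>("hotels", HotelSchema.Create()). The collection name "hotels" is illustrative.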

Use vector store abstractions without defining a data model

There are cases where it isn't desirable or possible to define your own data model. For example, imagine that you don't know at compile time what your database schema looks like, and the schema is only provided via configuration. Creating a data model that reflects the schema would be impossible in this case. Instead, you can map dynamically by using a Dictionary<string, object?> for the record type. Each property is added to the dictionary with the property name as the key and the property value as the value.

Note

Most apps will simply use strongly typed .NET types to model their data. Dynamic mapping via Dictionary<string, object?> is for advanced, arbitrary data-mapping scenarios.

Supply schema information when using Dictionary

When you use a Dictionary, providers still need to know what the database schema looks like. Without the schema information, the provider would not be able to create a collection or know how to map to and from the storage representation that each database uses.

You can use a record definition to provide the schema information. Unlike a data model, a record definition can be created from configuration at runtime when schema information isn't known at compile time.
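As a rough sketch of building a record definition at run time, the following code converts a list of configuration entries into a VectorStoreCollectionDefinition. The FieldConfig record, the DefinitionFactory class, and the "key", "data", "vector", and type-name strings are all hypothetical. Your configuration format will likely differ.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.VectorData;

// Hypothetical shape of one schema entry, for example bound from a JSON configuration file.
public record FieldConfig(string Name, string Kind, string TypeName, int Dimensions = 0);

public static class DefinitionFactory
{
    // Maps the hypothetical type names used in configuration to .NET types.
    private static Type ResolveType(string typeName) => typeName switch
    {
        "string" => typeof(string),
        "ulong" => typeof(ulong),
        "float-vector" => typeof(ReadOnlyMemory<float>),
        _ => throw new ArgumentException($"Unsupported type name '{typeName}'.")
    };

    // Converts configuration entries into a record definition at run time.
    public static VectorStoreCollectionDefinition FromConfig(IEnumerable<FieldConfig> fields)
    {
        var properties = new List<VectorStoreProperty>();
        foreach (FieldConfig field in fields)
        {
            Type type = ResolveType(field.TypeName);
            VectorStoreProperty property = field.Kind switch
            {
                "key" => new VectorStoreKeyProperty(field.Name, type),
                "data" => new VectorStoreDataProperty(field.Name, type),
                "vector" => new VectorStoreVectorProperty(field.Name, type, field.Dimensions),
                _ => throw new ArgumentException($"Unknown property kind '{field.Kind}'.")
            };
            properties.Add(property);
        }

        return new VectorStoreCollectionDefinition { Properties = properties };
    }
}
```

Rejecting unknown kinds and type names early keeps a configuration mistake from surfacing later as a confusing provider error.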

Example

To use Dictionary with a provider, specify it as your data model when you create the collection. Also provide a record definition.

VectorStoreCollectionDefinition definition = new()
{
    Properties =
    [
        new VectorStoreKeyProperty("Key", typeof(string)),
        new VectorStoreDataProperty("Term", typeof(string)),
        new VectorStoreDataProperty("Definition", typeof(string)),
        new VectorStoreVectorProperty("DefinitionEmbedding", typeof(ReadOnlyMemory<float>), dimensions: 1536)
    ]
};

// Use GetDynamicCollection instead of the regular GetCollection method
// to get an instance of a collection using Dictionary<string, object?>.
VectorStoreCollection<object, Dictionary<string, object?>> dynamicDataModelCollection =
    vectorStore.GetDynamicCollection("glossary", definition);

// Since schema information is available from the record definition,
// it's possible to create a collection with the right vectors,
// dimensions, indexes, and distance functions.
await dynamicDataModelCollection.EnsureCollectionExistsAsync();

// When retrieving a record from the collection,
// access key, data, and vector values via the dictionary entries.
Dictionary<string, object?>? record = await dynamicDataModelCollection.GetAsync("SK");
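// GetAsync returns null when no record has the requested key,
// so use the null-conditional operator before indexing.
Console.WriteLine(record?["Definition"]);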
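Writing a dynamic record works the other way around: you fill a dictionary whose keys match the property names in the record definition. The following sketch builds such a record for the glossary schema above. The helper class GlossaryRecords is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class GlossaryRecords
{
    // Builds a dictionary record whose keys match the property names
    // declared in the glossary record definition.
    public static Dictionary<string, object?> Create(
        string key, string term, string definition, ReadOnlyMemory<float> embedding) => new()
    {
        ["Key"] = key,
        ["Term"] = term,
        ["Definition"] = definition,
        ["DefinitionEmbedding"] = embedding
    };
}
```

You could then store the record with await dynamicDataModelCollection.UpsertAsync(record). The embedding is expected to have the 1536 dimensions declared in the record definition.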