

Generate embeddings for vector store providers

Vector store providers support multiple ways of generating embeddings. You can generate them yourself and pass them as part of a record when using a VectorStoreCollection<TKey,TRecord>, or you can let the VectorStoreCollection<TKey,TRecord> generate them internally.

Generate embeddings yourself

The most direct approach is to generate embeddings before calling UpsertAsync or SearchAsync, and pass them along with your records or search query.

Construct an embedding generator

For information on how to construct Microsoft.Extensions.AI embedding generators, see Embeddings in .NET.

Generate embeddings on upsert with IEmbeddingGenerator

async Task GenerateEmbeddingsAndUpsertAsync(
    IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
    VectorStoreCollection<ulong, Hotel> collection)
{
    // Upsert a record.
    string descriptionText = "A place where everyone can be happy.";
    ulong hotelId = 1;

    // Generate the embedding.
    ReadOnlyMemory<float> embedding =
        (await embeddingGenerator.GenerateAsync(descriptionText)).Vector;

    // Create a record and upsert with the already generated embedding.
    await collection.UpsertAsync(new Hotel
    {
        HotelId = hotelId,
        HotelName = "Hotel Happy",
        Description = descriptionText,
        DescriptionEmbedding = embedding,
        Tags = ["luxury", "pool"]
    });
}

Generate embeddings on search with IEmbeddingGenerator

async Task GenerateEmbeddingsAndSearchAsync(
    IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
    VectorStoreCollection<ulong, Hotel> collection)
{
    // The text to search for.
    string searchText = "Find me a hotel with happiness in mind.";

    // Generate the embedding for the search text.
    ReadOnlyMemory<float> searchEmbedding =
        (await embeddingGenerator.GenerateAsync(searchText)).Vector;

    // Search using the already generated embedding.
    IAsyncEnumerable<VectorSearchResult<Hotel>> searchResult = collection.SearchAsync(searchEmbedding, top: 1);
    List<VectorSearchResult<Hotel>> resultItems = await searchResult.ToListAsync();

    // Print the first search result.
    Console.WriteLine("Score for first result: " + resultItems.FirstOrDefault()?.Score);
    Console.WriteLine("Hotel description for first result: " + resultItems.FirstOrDefault()?.Record.Description);
}

Let the vector store generate embeddings

You can configure an embedding generator on your vector store so that embeddings are generated automatically during both upsert and search operations. This approach removes the need to call the embedding generator yourself before each upsert or search.

To enable automatic vector generation on upsert, define the vector property on your data model with the source type (for example, string) and still decorate it with a VectorStoreVectorAttribute.

[VectorStoreVector(1536)]
public required string Embedding { get; set; }

Before upsert, the Embedding property should contain the string from which a vector should be generated. The type of the vector stored in the database (for example, float32 or float16) will be derived from the configured embedding generator.

Important

These vector properties do not support retrieving either the generated vector or the original text that the vector was generated from. They also do not store the original text. If the original text needs to be stored, add a separate data property to store it.
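To make the behavior described in the note concrete, the following is a deliberately simplified, hypothetical model of store-side generation. It is not how Microsoft.Extensions.VectorData is implemented: the SourceRecord and StoredRow types, the ToyAutoEmbeddingStore class, and the letter-counting "generator" are all invented for illustration. It shows the string held in the vector property being converted to a vector on upsert, with only the vector (plus any separate data property) being kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape: Description is a separate data property,
// Embedding holds the source string that a vector is generated from.
public sealed record SourceRecord(ulong Key, string Description, string Embedding);

// Hypothetical stored row: the vector replaces the source string,
// which is not persisted anywhere.
public sealed record StoredRow(ulong Key, string Description, float[] Vector);

public static class ToyAutoEmbeddingStore
{
    // Toy stand-in for an embedding generator (NOT a real embedding model):
    // counts the letters a-z, producing a 26-dimensional vector.
    public static float[] ToyGenerate(string text)
    {
        var vector = new float[26];
        foreach (char c in text.ToLowerInvariant())
        {
            if (c >= 'a' && c <= 'z')
            {
                vector[c - 'a'] += 1f;
            }
        }
        return vector;
    }

    // Simulates an upsert: each record's source string is passed to the
    // generator, and only the resulting vector is kept in the stored row.
    public static List<StoredRow> Upsert(
        IEnumerable<SourceRecord> records,
        Func<string, float[]> generator) =>
        records
            .Select(r => new StoredRow(r.Key, r.Description, generator(r.Embedding)))
            .ToList();
}
```

In this model, reading a StoredRow back gives you the Description data property and the vector, but never the original Embedding string, which is why the note recommends a separate data property when the source text must be kept.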

Embedding generators that implement the Microsoft.Extensions.AI abstractions are supported and can be configured at various levels:

  • On the vector store:

    You can set a default embedding generator for the entire vector store. This generator will be used for all collections and properties unless overridden.

    VectorStore vectorStore = new QdrantVectorStore(
        new QdrantClient("localhost"),
        ownsClient: true,
        new QdrantVectorStoreOptions
        {
            EmbeddingGenerator = embeddingGenerator
        });
    
  • On a collection:

    You can configure an embedding generator for a specific collection, overriding the store-level generator.

    var collectionOptions = new QdrantCollectionOptions
    {
        EmbeddingGenerator = embeddingGenerator
    };
    
    var collection = new QdrantCollection<ulong, MyRecord>(
        new QdrantClient("localhost"),
        "myCollection",
        ownsClient: true,
        collectionOptions);
    
  • On a record definition:

    When defining properties programmatically using VectorStoreCollectionDefinition, you can specify an embedding generator for all vector properties in the definition.

    var definition = new VectorStoreCollectionDefinition
    {
        EmbeddingGenerator = embeddingGenerator,
        Properties =
        [
            new VectorStoreKeyProperty("Key", typeof(ulong)),
            new VectorStoreVectorProperty("DescriptionEmbedding", typeof(string), dimensions: 1536)
        ]
    };
    
    collectionOptions = new QdrantCollectionOptions
    {
        Definition = definition
    };
    
    collection = new QdrantCollection<ulong, MyRecord>(
        new QdrantClient("localhost"),
        "myCollection",
        ownsClient: true,
        collectionOptions);
    
  • On a vector property definition:

    When defining properties programmatically, you can set an embedding generator directly on the property.

    VectorStoreVectorProperty vectorProperty = new(
        "DescriptionEmbedding",
        typeof(string),
        dimensions: 1536)
    {
        EmbeddingGenerator = embeddingGenerator
    };
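When generators are configured at more than one of these levels, the most specific one is expected to win. The sketch below models that resolution for a single vector property. It is a hypothetical illustration only: the GeneratorResolution class is invented, generators are represented by plain strings to keep it self-contained, and the exact order (property, then record definition, then collection, then vector store) is an assumption drawn from the override behavior described above rather than a documented API.

```csharp
#nullable enable
using System;

// Hypothetical helper: picks the embedding generator for one vector property.
// Generators are modeled as names (strings); null means "not configured".
public static class GeneratorResolution
{
    // Assumed precedence: the most specific configured level wins.
    public static string? Resolve(
        string? propertyGenerator,
        string? definitionGenerator,
        string? collectionGenerator,
        string? storeGenerator) =>
        propertyGenerator
        ?? definitionGenerator
        ?? collectionGenerator
        ?? storeGenerator;
}
```

For example, Resolve(null, null, "collection", "store") returns "collection", mirroring a collection-level generator overriding the store-level default.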
    

Example usage

The following example demonstrates how to use the embedding generator to automatically generate vectors during both upsert and search operations. This approach simplifies workflows by eliminating the need to precompute embeddings manually.


// The data model.
internal class FinanceInfo
{
    [VectorStoreKey]
    public string Key { get; set; } = string.Empty;

    [VectorStoreData]
    public string Text { get; set; } = string.Empty;

    // Note that the vector property is typed as a string, and
    // its value is derived from the Text property. The string
    // value will however be converted to a vector on upsert and
    // stored in the database as a vector.
    [VectorStoreVector(1536)]
    public string Embedding => Text;
}

public static async Task RunAsync()
{
    // Create an OpenAI embedding generator.
    var embeddingGenerator = new OpenAIClient("your key")
        .GetEmbeddingClient("your chosen model")
        .AsIEmbeddingGenerator();

    // Use the embedding generator with the vector store.
    VectorStore vectorStore = new InMemoryVectorStore(new()
    {
        EmbeddingGenerator = embeddingGenerator
    });
    VectorStoreCollection<string, FinanceInfo> collection =
        vectorStore.GetCollection<string, FinanceInfo>("finances");
    await collection.EnsureCollectionExistsAsync();

    // Create some test data.
    string[] budgetInfo =
    [
        "The budget for 2020 is EUR 100 000",
        "The budget for 2021 is EUR 120 000",
        "The budget for 2022 is EUR 150 000",
        "The budget for 2023 is EUR 200 000",
        "The budget for 2024 is EUR 364 000"
    ];

    // Embeddings are generated automatically on upsert.
    IEnumerable<FinanceInfo> records = budgetInfo.Select(
        (input, index) => new FinanceInfo { Key = index.ToString(), Text = input }
        );
    await collection.UpsertAsync(records);

    // The embedding for the search query is generated automatically on search.
    IAsyncEnumerable<VectorSearchResult<FinanceInfo>> searchResult =
        collection.SearchAsync("What is my budget for 2024?", top: 1);

    // Output the matching result.
    await foreach (VectorSearchResult<FinanceInfo> result in searchResult)
    {
        Console.WriteLine($"Key: {result.Record.Key}, Text: {result.Record.Text}");
    }
}

Embedding dimensions

Vector databases typically require you to specify the number of dimensions that each vector has when creating the collection. Different embedding models generate vectors with different numbers of dimensions. For example, OpenAI text-embedding-ada-002 generates vectors with 1536 dimensions. Some models also let you choose the number of dimensions in the output vector. For example, Google text-embedding-004 produces vectors with 768 dimensions by default, but allows you to choose any number of dimensions between 1 and 768.

It's important to ensure that the vectors generated by the embedding model have the same number of dimensions as the matching vector in the database.

If you create a collection using the vector store abstractions, you need to specify the number of dimensions required for each vector property either via annotations or via the record definition. The following code shows examples of setting the number of dimensions to 1536 using both mechanisms.

[VectorStoreVector(Dimensions: 1536)]
public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

new VectorStoreVectorProperty(
    "DescriptionEmbedding",
    typeof(ReadOnlyMemory<float>?),
    dimensions: 1536);
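Because a mismatch between the model's output size and the declared dimensions typically only surfaces as an error from the database, it can help to check the vector before upserting it when you generate embeddings yourself. The following is a minimal sketch under that assumption; EmbeddingDimensionGuard is a hypothetical helper, not part of Microsoft.Extensions.VectorData.

```csharp
using System;

// Hypothetical helper: verifies that a generated embedding has the number of
// dimensions declared for the target vector property before it is upserted.
public static class EmbeddingDimensionGuard
{
    public static void EnsureDimensions(
        ReadOnlyMemory<float> embedding,
        int expectedDimensions,
        string propertyName)
    {
        if (embedding.Length != expectedDimensions)
        {
            throw new ArgumentException(
                $"Property '{propertyName}' expects {expectedDimensions} dimensions, " +
                $"but the embedding has {embedding.Length}.",
                nameof(embedding));
        }
    }
}
```

For example, you could call EnsureDimensions(embedding, 1536, "DescriptionEmbedding") on the result of embeddingGenerator.GenerateAsync(...) before passing the record to UpsertAsync, so a model configured for a different output size fails fast with a descriptive message.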

See also