
Vector Database, Cosmos DB Container. Vector and Indexing policies.

Anders Blomkvist 25 Reputation points
Jun 7, 2024, 3:21 PM

Hello guys,

I'm going nuts on this one.

I'm trying to build an Azure Cosmos DB container with the vector feature. I have enabled the Vector Search for NoSQL API (preview) feature.

I first tried creating the container in the Azure portal and then changing the settings, with no luck.
So I went with a custom deployment using my own template code:


{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers",
      "apiVersion": "2021-06-15",
      "name": "[concat(parameters('cosmosDbAccountName'), '/databases/', parameters('databaseName'), '/containers/', parameters('containerName'))]",
      "properties": {
        "resource": {
          "id": "[parameters('containerName')]",
          "partitionKey": {
            "paths": ["/dataSource"],
            "kind": "Hash"
          },
          "indexingPolicy": {
            "indexingMode": "consistent",
            "automatic": true,
            "includedPaths": [
              { "path": "/*" },
              { "path": "/dataSource/?" },
              { "path": "/trademarkName/?" },
              { "path": "/trademarkStatus/?" },
              { "path": "/goodsAndServices/*" },
              { "path": "/textEmbedding/?" },
              { "path": "/imageEmbedding/?" },
              { "path": "/id/?" }
            ],
            "excludedPaths": [
              { "path": "/_etag/?" }
            ],
            "vectorIndexes": [
              {
                "path": "/textEmbedding",
                "type": "quantizedFlat"
              },
              {
                "path": "/imageEmbedding",
                "type": "quantizedFlat"
              }
            ]
          },
          "vectorPolicy": {
            "vectorEmbeddings": [
              {
                "path": "/imageEmbedding",
                "dataType": "float32",
                "distanceFunction": "cosine",
                "dimensions": 768
              },
              {
                "path": "/textEmbedding",
                "dataType": "float32",
                "distanceFunction": "cosine",
                "dimensions": 768
              }
            ]
          }
        }
      }
    }
  ],
  "parameters": {
    "cosmosDbAccountName": {
      "type": "string"
    },
    "databaseName": {
      "type": "string"
    },
    "containerName": {
      "type": "string"
    }
  }
}

This gave me the following error for /textEmbedding:
Vector Indexing Policy's path::\\/textEmbedding not matching in Embedding's path.

So I thought, okay, maybe it's a bug.

So I went on to create the container in C# with this code:


    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using Microsoft.Azure.Cosmos;

    // Define the connection string and create a new CosmosClient instance
    var connectionString = "Your Connection String Here";
    CosmosClient client = new(connectionString, new CosmosClientOptions
    {
        SerializerOptions = new CosmosSerializationOptions
        {
            IgnoreNullValues = false,
            PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase,
        },
        AllowBulkExecution = true,
        ApplicationRegion = Regions.NorthEurope
    });

    // Get a reference to the existing database (this call does not create it)
    Database db = client.GetDatabase("cosmosomatic");

    // Define the list of embeddings to be used in the container
    List<Embedding> embeddings = new List<Embedding>()
    {
        new Embedding()
        {
            Path = "/imageEmbedding",
            DataType = VectorDataType.Float32,
            DistanceFunction = DistanceFunction.Cosine,
            Dimensions = 768,
        },
        new Embedding()
        {
            Path = "/textEmbedding",
            DataType = VectorDataType.Float32,
            DistanceFunction = DistanceFunction.Cosine,
            Dimensions = 768,
        }
    };

    // Convert list to a Collection type
    Collection<Embedding> embeddingCollection = new(embeddings);

    // Define the container properties including the vector embedding policy
    ContainerProperties properties = new ContainerProperties(id: "vector-container", partitionKeyPath: "/dataSource")
    {
        VectorEmbeddingPolicy = new(embeddingCollection),
        IndexingPolicy = new IndexingPolicy()
        {
            VectorIndexes = new()
            {
                new VectorIndexPath()
                {
                    Path = "/vector",
                    Type = VectorIndexType.QuantizedFlat,
                }
            }
        },
    };

    // Set manual throughput options
    var throughput = ThroughputProperties.CreateManualThroughput(400);

    // Create the container with the specified properties and throughput
    Container container = await db.CreateContainerIfNotExistsAsync(properties, throughput);
    System.Console.WriteLine("Container with vector indexing created successfully.");

But then the Embedding class in the SDK is internal unless PREVIEW is defined:

    /// <summary>
    /// Represents the embedding settings for the vector index.
    /// </summary>
    #if PREVIEW
    public
    #else
    internal
    #endif
    class Embedding : IEquatable<Embedding>

And I can't find any preview versions among their NuGet packages.

So hopefully there is someone out there who can see what I'm doing so badly wrong here.

Thanks!

Azure Cosmos DB

Accepted answer
  1. Oury Ba-MSFT 19,991 Reputation points Microsoft Employee
    Jun 12, 2024, 8:03 PM

    @Anders Blomkvist

    The vector embedding policy property name is “vectorEmbeddingPolicy”, not “vectorPolicy”.

    Also, the vector paths shouldn’t be listed explicitly in the “includedPaths” property; however, this shouldn’t be a blocker.

    Regards,

    Oury
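
For reference, here is a minimal sketch of how the "resource" object from the question's template might look with the two points from the accepted answer applied: the policy is named "vectorEmbeddingPolicy" instead of "vectorPolicy", and the /textEmbedding and /imageEmbedding entries are dropped from "includedPaths". Everything else is copied unchanged from the question. This is an untested illustration, not a verified deployment; in particular, the thread does not confirm whether the question's apiVersion accepts this property, so a newer API version may be required.

```json
{
  "resource": {
    "id": "[parameters('containerName')]",
    "partitionKey": {
      "paths": ["/dataSource"],
      "kind": "Hash"
    },
    "indexingPolicy": {
      "indexingMode": "consistent",
      "automatic": true,
      "includedPaths": [
        { "path": "/*" },
        { "path": "/dataSource/?" },
        { "path": "/trademarkName/?" },
        { "path": "/trademarkStatus/?" },
        { "path": "/goodsAndServices/*" },
        { "path": "/id/?" }
      ],
      "excludedPaths": [
        { "path": "/_etag/?" }
      ],
      "vectorIndexes": [
        { "path": "/textEmbedding", "type": "quantizedFlat" },
        { "path": "/imageEmbedding", "type": "quantizedFlat" }
      ]
    },
    "vectorEmbeddingPolicy": {
      "vectorEmbeddings": [
        {
          "path": "/imageEmbedding",
          "dataType": "float32",
          "distanceFunction": "cosine",
          "dimensions": 768
        },
        {
          "path": "/textEmbedding",
          "dataType": "float32",
          "distanceFunction": "cosine",
          "dimensions": 768
        }
      ]
    }
  }
}
```

The paths under "vectorIndexes" match the paths under "vectorEmbeddings" exactly, which is the pairing the error message in the question complains about.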

