Quickstart: Vector index with .NET in Azure DocumentDB

This article explains how to compare all three vector search algorithms (DiskANN, HNSW, and IVF) in Azure DocumentDB using the .NET client library. The sample demonstrates how each algorithm performs with different similarity functions (COS, L2, IP) and helps you choose the right configuration for your workload. This quickstart uses a sample hotel dataset in a JSON file with precalculated vectors from the text-embedding-3-small model.

Find the sample code on GitHub.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a free account.

Create a .NET project

  1. Create a new directory for your project and initialize the .NET console application:

    mkdir select-algorithm-dotnet
    cd select-algorithm-dotnet
    dotnet new console --framework net8.0 --name SelectAlgorithm --output .
    

    Verify the project was created:

    ls SelectAlgorithm.csproj

  2. Install the required NuGet packages:

    dotnet add package Azure.AI.OpenAI --version 2.1.0
    dotnet add package Azure.Identity --version 1.13.2
    dotnet add package MongoDB.Driver --version 3.2.0
    dotnet add package Microsoft.Extensions.Configuration --version 8.0.0
    dotnet add package Microsoft.Extensions.Configuration.Binder --version 8.0.2
    dotnet add package Microsoft.Extensions.Configuration.EnvironmentVariables --version 8.0.0
    dotnet add package Microsoft.Extensions.Configuration.Json --version 8.0.1
    

    These packages provide:

    • Azure.AI.OpenAI: Azure OpenAI client library to create vector embeddings.
    • Azure.Identity: Azure Identity library for passwordless authentication with DefaultAzureCredential.
    • MongoDB.Driver: MongoDB driver for .NET to interact with DocumentDB.
    • Microsoft.Extensions.Configuration*: Configuration and environment variable binding infrastructure.

    Verify installed packages:

    dotnet list package
    

Create data file with vectors

  1. Create a new data directory for the hotels data file:

    mkdir data
    

  2. Download the Hotels_Vector.json raw data file with vectors to your data directory:

    curl -o data/Hotels_Vector.json https://raw.githubusercontent.com/Azure-Samples/documentdb-samples/refs/heads/main/ai/data/Hotels_Vector.json
    

    Verify the file downloaded successfully:

    ls data/Hotels_Vector.json

    You should see Hotels_Vector.json in the data directory.

Configure appsettings.json and environment variable overrides

Note

.NET uses the standard IConfiguration system with appsettings.json as the primary configuration source. Environment variables can override any setting using double-underscore (__) as the hierarchy separator. The other language quickstarts use flat environment variables (DOCUMENTDB_CLUSTER_NAME), but .NET's hierarchical configuration is the idiomatic pattern for this platform.

  1. Create an appsettings.json configuration file:

    touch appsettings.json
    

  2. Add this content to appsettings.json:

    {
      "DocumentDB": {
        "DatabaseName": "Hotels",
        "ClusterName": "<your-cluster-name>",
        "LoadBatchSize": 100
      },
      "VectorSearch": {
        "Similarity": "",
        "TopK": 5,
        "Query": "luxury hotel near the beach"
      },
      "AzureOpenAI": {
        "Endpoint": "https://<your-resource>.openai.azure.com/",
        "EmbeddingModel": "text-embedding-3-small"
      },
      "DataFiles": {
        "WithVectors": "data/Hotels_Vector.json"
      },
      "Embedding": {
        "EmbeddedField": "DescriptionVector",
        "Dimensions": 1536
      }
    }
    
  3. Set any environment variable overrides in your current shell session. The sample uses DefaultAzureCredential for passwordless authentication, and .NET maps environment variables to appsettings.json keys with the Section__Key format:

    export AzureOpenAI__Endpoint="https://<your-resource>.openai.azure.com/"
    export AzureOpenAI__EmbeddingModel="text-embedding-3-small"
    export DocumentDB__ClusterName="<your-cluster-name>"
    export DocumentDB__DatabaseName="Hotels"
    export DataFiles__WithVectors="data/Hotels_Vector.json"
    export Embedding__EmbeddedField="DescriptionVector"
    export Embedding__Dimensions="1536"
    export AZURE_TENANT_ID="<your-tenant-id>"
    

Replace the placeholder values with your own information:

  • <your-resource>: Your Azure OpenAI resource name
  • <your-cluster-name>: Your Azure DocumentDB cluster name
  • <your-tenant-id>: Your Microsoft Entra tenant ID

These environment variables override the matching values in appsettings.json. For example, DocumentDB__ClusterName overrides DocumentDB:ClusterName, DocumentDB__DatabaseName overrides DocumentDB:DatabaseName, and AzureOpenAI__Endpoint overrides AzureOpenAI:Endpoint.
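The override order can be illustrated with a minimal sketch. It assumes the Microsoft.Extensions.Configuration packages installed earlier, and it uses an in-memory dictionary as a stand-in for appsettings.json. ConfigOverrideSketch and the "from-appsettings" value are hypothetical names used only for illustration; they aren't part of the sample.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

// Hypothetical helper, not part of the sample.
public static class ConfigOverrideSketch
{
    // Builds configuration from in-memory defaults (standing in for appsettings.json)
    // followed by environment variables. Sources added later win for matching keys.
    public static IConfiguration Build()
    {
        var defaults = new Dictionary<string, string?>
        {
            ["DocumentDB:ClusterName"] = "from-appsettings",
            ["DocumentDB:DatabaseName"] = "Hotels"
        };

        return new ConfigurationBuilder()
            .AddInMemoryCollection(defaults)
            // The environment variable provider maps DocumentDB__ClusterName
            // to the hierarchical key DocumentDB:ClusterName.
            .AddEnvironmentVariables()
            .Build();
    }
}
```

If DocumentDB__ClusterName is set in the environment before Build() runs, reading config["DocumentDB:ClusterName"] should return the environment value instead of the in-memory default. Keys without a matching environment variable keep their default.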

Prefer passwordless authentication. For more information on setting up managed identity and the full range of your authentication options, see Authenticate .NET apps to Azure services by using the Azure SDK for .NET.

Create code files

Continue the project by creating code files for vector search comparison. When you're done, the project structure should look like this:

select-algorithm-dotnet/
├── data/
│   └── Hotels_Vector.json
├── Models/
│   ├── Configuration.cs
│   └── HotelData.cs
├── Utilities/
│   └── AzureIdentityTokenHandler.cs
├── appsettings.json
├── CompareAll.cs
├── Program.cs
├── SelectAlgorithm.csproj
└── Utils.cs

  1. Create the directory structure:

    mkdir Models
    mkdir Utilities
    

  2. Create the code files:

    touch CompareAll.cs
    touch Utils.cs
    touch Models/Configuration.cs
    touch Models/HotelData.cs
    touch Utilities/AzureIdentityTokenHandler.cs
    

Create the algorithm comparison code

Create the following source files to implement the vector search comparison.

Program.cs

Replace the contents of Program.cs with this code:

using Microsoft.Extensions.Configuration;
using SelectAlgorithm.Models;

namespace SelectAlgorithm;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine();
        Console.WriteLine("Select Algorithm Demo - Azure DocumentDB Vector Search (.NET)");
        Console.WriteLine(new string('-', 60));
        Console.WriteLine();

        var configuration = new ConfigurationBuilder()
            .SetBasePath(Directory.GetCurrentDirectory())
            .AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
            .AddEnvironmentVariables()
            .Build();

        var appConfig = new AppConfiguration();
        configuration.Bind(appConfig);

        var command = args.Length > 0 ? args[0].ToLower() : "compare-all";

        switch (command)
        {
            case "compare-all":
                CompareAll.Run(appConfig);
                break;
            default:
                Console.WriteLine($"Unknown command: {command}");
                Console.WriteLine("Usage: dotnet run -- compare-all");
                return;
        }

        Console.WriteLine();
        Console.WriteLine("Done!");
    }
}

This main entry point:

  • Loads configuration from appsettings.json, then applies environment variable overrides.
  • Binds the configuration to the strongly typed AppConfiguration class.
  • Reads the command from the first command-line argument and defaults to compare-all.
  • Calls CompareAll.Run(), which creates the Azure OpenAI and DocumentDB clients with passwordless authentication, runs the comparison, and prints the results in a table format.

CompareAll.cs

Add this code to CompareAll.cs:

// Unified comparison runner for all 9 combinations (3 algorithms × 3 similarity metrics).
// Executes vector searches sequentially and prints a formatted comparison table.

namespace SelectAlgorithm;

using MongoDB.Driver;
using MongoDB.Bson;
using OpenAI.Embeddings;
using SelectAlgorithm.Models;

public static class CompareAll
{
    private record IndexConfig(string Name, string Kind, string Similarity, BsonDocument ExtraParams);

    private record SearchResult(string Algorithm, string Metric, string Top1Name, double Top1Score, string Top2Name, double Top2Score);

    private static string GetAlgoDisplay(string kind) => kind switch
    {
        "vector-ivf" => "IVF",
        "vector-hnsw" => "HNSW",
        "vector-diskann" => "DiskANN",
        _ => kind
    };

    public static void Run(AppConfiguration appConfig)
    {
        Console.WriteLine(new string('=', 60));
        Console.WriteLine("  Compare All Algorithms × Metrics");
        Console.WriteLine("  9 combinations: IVF, HNSW, DiskANN × COS, L2, IP");
        Console.WriteLine(new string('=', 60));

        // Use config values; the QUERY_TEXT and TOP_K environment variables override the query settings
        var databaseName = appConfig.DocumentDB.DatabaseName;
        var dataFile = appConfig.DataFiles.WithVectors;
        var vectorField = appConfig.Embedding.EmbeddedField;
        var dimensions = appConfig.Embedding.Dimensions;
        var batchSize = appConfig.DocumentDB.LoadBatchSize;
        var queryText = Environment.GetEnvironmentVariable("QUERY_TEXT") ?? appConfig.VectorSearch.Query;
        var topK = int.TryParse(Environment.GetEnvironmentVariable("TOP_K"), out var parsedTopK) ? parsedTopK : appConfig.VectorSearch.TopK;

        var mongoClient = Utils.GetMongoClientPasswordless(appConfig);
        var embeddingClient = Utils.GetEmbeddingClient(appConfig);

        try
        {
            var database = mongoClient.GetDatabase(databaseName);

            // Drop collection for a clean comparison
            database.DropCollection("hotels");
            Console.WriteLine("Dropped existing 'hotels' collection (if any)");

            var collection = database.GetCollection<BsonDocument>("hotels");

            // Load data once into single collection
            var data = Utils.ReadJsonFile(dataFile);
            var documents = data.Where(d => d.Contains(vectorField)).ToList();
            Console.WriteLine($"\nLoaded {documents.Count} documents with embeddings");
            Utils.InsertData(collection, documents, batchSize);

            // Generate ONE embedding for the query (reused for all 9 searches)
            Console.WriteLine($"\nQuery: \"{queryText}\"");
            Console.WriteLine($"Top K: {topK}");
            var embeddingResult = embeddingClient.GenerateEmbedding(queryText);
            var queryVector = embeddingResult.Value.ToFloats().ToArray();
            Console.WriteLine("Embedding generated (reused for all searches)\n");

            // Define 9 index configurations
            var configs = BuildIndexConfigs();

            // Run each config sequentially: drop→create→wait→search
            // DocumentDB doesn't allow multiple vector indexes of the same kind on the same field
            Console.WriteLine("Running 9 algorithm × metric combinations...\n");
            var results = new List<SearchResult>();
            foreach (var config in configs)
            {
                // 1. Drop all existing vector indexes
                DropVectorIndexes(collection, vectorField);

                // 2. Create this specific index
                CreateIndex(collection, vectorField, dimensions, config);
                Console.WriteLine($"  ✓ {config.Name} created");

                // 3. Search with retries while the index becomes available
                var searchResults = RunVectorSearchWithRetry(collection, queryVector, vectorField, config.Name, topK);
                if (searchResults.Count == 0)
                {
                    results.Add(new SearchResult(GetAlgoDisplay(config.Kind), config.Similarity, "(failed)", 0.0, "(failed)", 0.0));
                    continue;
                }

                // 4. Extract top 2 results and record
                var algoDisplay = GetAlgoDisplay(config.Kind);
                var top1Name = "-"; var top1Score = 0.0;
                var top2Name = "-"; var top2Score = 0.0;
                if (searchResults.Count > 0)
                {
                    var doc1 = searchResults[0];
                    top1Name = doc1.Contains("HotelName") ? doc1["HotelName"].AsString : "Unknown";
                    top1Score = doc1.Contains("score") ? doc1["score"].ToDouble() : 0.0;
                }
                if (searchResults.Count > 1)
                {
                    var doc2 = searchResults[1];
                    top2Name = doc2.Contains("HotelName") ? doc2["HotelName"].AsString : "Unknown";
                    top2Score = doc2.Contains("score") ? doc2["score"].ToDouble() : 0.0;
                }
                results.Add(new SearchResult(algoDisplay, config.Similarity, top1Name, top1Score, top2Name, top2Score));
            }

            var successCount = results.Count(r => r.Top1Name != "(failed)");

            // Print comparison table
            PrintComparisonTable(results);

            if (successCount == 0)
            {
                Console.WriteLine("\n❌ All 9 comparisons failed — no algorithm returned results.");
                Environment.ExitCode = 1;
            }
            else
            {
                Console.WriteLine($"\nSummary: {successCount} succeeded, {9 - successCount} failed");
            }
        }
        finally
        {
            // Cleanup: drop the comparison collection
            try
            {
                var database = mongoClient.GetDatabase(databaseName);
                database.DropCollection("hotels");
                Console.WriteLine("\nCleanup: dropped collection 'hotels'");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Cleanup warning: {ex.Message}");
            }
            mongoClient.Cluster.Dispose();
        }
    }

    private static List<IndexConfig> BuildIndexConfigs()
    {
        string[] metrics = ["COS", "L2", "IP"];
        var configs = new List<IndexConfig>();

        // IVF
        foreach (var metric in metrics)
            configs.Add(new IndexConfig($"vector_ivf_{metric.ToLower()}", "vector-ivf", metric, new BsonDocument { { "numLists", 1 } }));

        // HNSW
        foreach (var metric in metrics)
            configs.Add(new IndexConfig($"vector_hnsw_{metric.ToLower()}", "vector-hnsw", metric, new BsonDocument { { "m", 16 }, { "efConstruction", 64 } }));

        // DiskANN
        foreach (var metric in metrics)
            configs.Add(new IndexConfig($"vector_diskann_{metric.ToLower()}", "vector-diskann", metric, new BsonDocument { { "maxDegree", 32 }, { "lBuild", 50 } }));

        return configs;
    }

    private static void DropVectorIndexes(IMongoCollection<BsonDocument> collection, string vectorField)
    {
        try
        {
            using var cursor = collection.Indexes.List();
            foreach (var idx in cursor.ToList())
            {
                var name = idx.GetValue("name", "").AsString;
                var key = idx.GetValue("key", new BsonDocument()).AsBsonDocument;
                if (key.Contains(vectorField) && key[vectorField].AsString == "cosmosSearch")
                {
                    try { collection.Indexes.DropOne(name); } catch { }
                }
            }
        }
        catch { }
    }

    private static void CreateIndex(IMongoCollection<BsonDocument> collection, string vectorField, int dimensions, IndexConfig config)
    {
        // Drop existing index with same name if present
        try
        {
            collection.Indexes.DropOne(config.Name);
        }
        catch (MongoCommandException)
        {
            // Index doesn't exist, that's fine
        }

        var cosmosSearchOptions = new BsonDocument
        {
            { "kind", config.Kind },
            { "dimensions", dimensions },
            { "similarity", config.Similarity }
        };

        foreach (var param in config.ExtraParams)
        {
            cosmosSearchOptions.Add(param);
        }

        var command = new BsonDocument
        {
            { "createIndexes", collection.CollectionNamespace.CollectionName },
            { "indexes", new BsonArray
                {
                    new BsonDocument
                    {
                        { "name", config.Name },
                        { "key", new BsonDocument(vectorField, "cosmosSearch") },
                        { "cosmosSearchOptions", cosmosSearchOptions }
                    }
                }
            }
        };

        try
        {
            collection.Database.RunCommand<BsonDocument>(command);
        }
        catch (MongoCommandException ex) when (ex.Message.Contains("already exists"))
        {
            // Index already exists with same config — idempotent
        }
    }

    private static List<BsonDocument> RunVectorSearch(
        IMongoCollection<BsonDocument> collection,
        float[] queryVector,
        string vectorField,
        string indexName,
        int topK)
    {
        var pipeline = new[]
        {
            new BsonDocument("$search", new BsonDocument("cosmosSearch", new BsonDocument
            {
                { "vector", new BsonArray(queryVector.Select(f => (double)f)) },
                { "path", vectorField },
                { "k", topK }
            })),
            new BsonDocument("$project", new BsonDocument
            {
                { "HotelName", 1 },
                { "score", new BsonDocument("$meta", "searchScore") }
            })
        };

        return collection.Aggregate<BsonDocument>(pipeline).ToList();
    }

    private static List<BsonDocument> RunVectorSearchWithRetry(
        IMongoCollection<BsonDocument> collection,
        float[] queryVector,
        string vectorField,
        string indexName,
        int topK)
    {
        const int maxRetries = 5;
        const int retryDelayMs = 2000;

        for (var attempt = 0; attempt <= maxRetries; attempt++)
        {
            var results = RunVectorSearch(collection, queryVector, vectorField, indexName, topK);
            if (results.Count > 0)
            {
                return results;
            }

            if (attempt < maxRetries)
            {
                Console.WriteLine($"  No results for {indexName} yet. Retrying in 2 seconds ({attempt + 1}/{maxRetries})...");
                Thread.Sleep(retryDelayMs);
            }
        }

        Console.WriteLine($"  Search for {indexName} did not return results after {maxRetries} retries. Recording as failed.");
        return [];
    }

    private static void PrintComparisonTable(List<SearchResult> results)
    {
        Console.WriteLine();
        Console.WriteLine("┌──────────┬────────┬────────────────────────────┬────────┬────────────────────────────┬────────┬───────┐");
        Console.WriteLine($"│ {"Algorithm",-9}│ {"Metric",-7}│ {"Top 1 Result",-27}│ {"Score",-7}│ {"Top 2 Result",-27}│ {"Score",-7}│ {"Diff",-6}│");
        Console.WriteLine("├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤");

        for (var i = 0; i < results.Count; i++)
        {
            var r = results[i];
            var diff = Math.Abs(r.Top1Score - r.Top2Score);
            var top1Display = r.Top1Name.Length > 27 ? r.Top1Name[..24] + "..." : r.Top1Name;
            var top2Display = r.Top2Name.Length > 27 ? r.Top2Name[..24] + "..." : r.Top2Name;
            Console.WriteLine($"│ {r.Algorithm,-9}│ {r.Metric,-7}│ {top1Display,-27}│ {r.Top1Score,-7:F4}│ {top2Display,-27}│ {r.Top2Score,-7:F4}│ {diff,-6:F4}│");
            if (i < results.Count - 1)
                Console.WriteLine("├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤");
        }
        Console.WriteLine("└──────────┴────────┴────────────────────────────┴────────┴────────────────────────────┴────────┴───────┘");
    }
}

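This class:

  • Manages the comparison workflow for all nine algorithm and similarity combinations.
  • Loads the data once into a single hotels collection.
  • Drops and re-creates the vector index for each combination, because DocumentDB doesn't allow multiple vector indexes of the same kind on the same field.
  • Generates one query embedding and reuses it for every search, retrying while a new index becomes available.
  • Configures algorithm-specific parameters for index creation (numLists for IVF, m and efConstruction for HNSW, maxDegree and lBuild for DiskANN).
  • Prints the top two results and their score difference for each combination.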
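The CompareAll.cs listing doesn't record how long each search takes. If you want latency numbers, one option is a small Stopwatch wrapper like the following sketch. TimingSketch and Measure are hypothetical names that aren't part of the sample, and a single measurement on a small dataset is only a rough indication.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the sample.
public static class TimingSketch
{
    // Runs the supplied operation once and returns its result
    // together with the elapsed wall-clock time.
    public static (T Result, TimeSpan Elapsed) Measure<T>(Func<T> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        var result = operation();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

Inside the comparison loop, the search call could be wrapped as TimingSketch.Measure(() => RunVectorSearch(collection, queryVector, vectorField, config.Name, topK)). Timing the retry wrapper instead would include the sleep intervals spent waiting for the index, so the two measurements aren't comparable.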

Supporting files

Create the following supporting files in the project:

Utils.cs

using MongoDB.Driver;
using MongoDB.Driver.Authentication.Oidc;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using Azure.Identity;
using Azure.Core;
using Azure.AI.OpenAI;
using OpenAI.Embeddings;
using SelectAlgorithm.Models;

namespace SelectAlgorithm;

public class AzureOidcCallback : IOidcCallback
{
    private readonly DefaultAzureCredential _credential;
    private static readonly string[] Scopes = { "https://ossrdbms-aad.database.windows.net/.default" };

    public AzureOidcCallback(DefaultAzureCredential credential) => _credential = credential;

    public OidcAccessToken GetOidcAccessToken(OidcCallbackParameters parameters, CancellationToken cancellationToken)
    {
        var token = _credential.GetToken(new TokenRequestContext(Scopes), cancellationToken);
        return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
    }

    public async Task<OidcAccessToken> GetOidcAccessTokenAsync(OidcCallbackParameters parameters, CancellationToken cancellationToken)
    {
        var token = await _credential.GetTokenAsync(new TokenRequestContext(Scopes), cancellationToken);
        return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
    }
}

public static class Utils
{
    public static IMongoClient GetMongoClientPasswordless(AppConfiguration config)
    {
        var clusterName = config.DocumentDB.ClusterName;
        if (string.IsNullOrEmpty(clusterName))
            throw new InvalidOperationException("DocumentDB:ClusterName is required in appsettings.json");

        var credential = new DefaultAzureCredential();

        var connectionString = $"mongodb+srv://{clusterName}.global.mongocluster.cosmos.azure.com/";
        var settings = MongoClientSettings.FromConnectionString(connectionString);
        settings.ConnectTimeout = TimeSpan.FromSeconds(120);
        settings.UseTls = true;
        settings.RetryWrites = false;

        // Custom OIDC callback using DefaultAzureCredential
        // Chains through CLI, managed identity, etc.
        var oidcCallback = new AzureOidcCallback(credential);
        settings.Credential = MongoCredential.CreateOidcCredential(oidcCallback, null);

        return new MongoClient(settings);
    }

    public static EmbeddingClient GetEmbeddingClient(AppConfiguration config)
    {
        var endpoint = config.AzureOpenAI.Endpoint;
        if (string.IsNullOrEmpty(endpoint))
            throw new InvalidOperationException("AzureOpenAI:Endpoint is required in appsettings.json");

        var model = config.AzureOpenAI.EmbeddingModel;

        var credential = new DefaultAzureCredential();
        var azureClient = new AzureOpenAIClient(new Uri(endpoint), credential);
        return azureClient.GetEmbeddingClient(model);
    }

    public static List<BsonDocument> ReadJsonFile(string path)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException($"Data file not found: {path}");

        var json = File.ReadAllText(path);
        return BsonSerializer.Deserialize<List<BsonDocument>>(json);
    }

    public static void InsertData(IMongoCollection<BsonDocument> collection, List<BsonDocument> data, int batchSize)
    {
        var totalDocuments = data.Count;
        var existingCount = collection.CountDocuments(new BsonDocument());

        if (existingCount >= totalDocuments)
        {
            Console.WriteLine($"Collection already has {existingCount} documents, skipping insert");
            return;
        }

        if (existingCount > 0)
        {
            collection.DeleteMany(new BsonDocument());
        }

        var insertedCount = 0;
        for (var i = 0; i < totalDocuments; i += batchSize)
        {
            var batch = data.Skip(i).Take(batchSize).ToList();
            try
            {
                collection.InsertMany(batch, new InsertManyOptions { IsOrdered = false });
                insertedCount += batch.Count;
            }
            catch (MongoBulkWriteException<BsonDocument> ex)
            {
                // Unordered insert: count only the documents that were actually written
                insertedCount += (int)ex.Result.InsertedCount;
            }
            Thread.Sleep(100);
        }

        Console.WriteLine($"Inserted {insertedCount}/{totalDocuments} documents");
    }

    public static void DropVectorIndexes(IMongoCollection<BsonDocument> collection, string vectorField)
    {
        try
        {
            using var cursor = collection.Indexes.List();
            var indexes = cursor.ToList();
            foreach (var index in indexes)
            {
                if (index.Contains("key"))
                {
                    var key = index["key"].AsBsonDocument;
                    if (key.Contains(vectorField) && key[vectorField].AsString == "cosmosSearch")
                    {
                        var indexName = index["name"].AsString;
                        collection.Indexes.DropOne(indexName);
                        Console.WriteLine($"Dropped existing vector index: {indexName}");
                    }
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Warning: Error dropping indexes: {ex.Message}");
        }
    }

    public static List<BsonDocument> PerformVectorSearch(
        IMongoCollection<BsonDocument> collection,
        EmbeddingClient client,
        string query,
        string vectorField,
        string model,
        int topK = 5)
    {
        var embeddingResult = client.GenerateEmbedding(query);
        var queryVector = embeddingResult.Value.ToFloats().ToArray();

        var pipeline = new[]
        {
            new BsonDocument("$search", new BsonDocument("cosmosSearch", new BsonDocument
            {
                { "vector", new BsonArray(queryVector.Select(f => (double)f)) },
                { "path", vectorField },
                { "k", topK }
            })),
            new BsonDocument("$project", new BsonDocument
            {
                { "document", "$$ROOT" },
                { "score", new BsonDocument("$meta", "searchScore") }
            })
        };

        return collection.Aggregate<BsonDocument>(pipeline).ToList();
    }

    public static void PrintSearchResults(List<BsonDocument> results, string algorithm)
    {
        Console.WriteLine();
        Console.WriteLine(new string('=', 60));
        Console.WriteLine($"  {algorithm} Search Results ({results.Count} found)");
        Console.WriteLine(new string('=', 60));

        for (var i = 0; i < results.Count; i++)
        {
            var result = results[i];
            var doc = result.Contains("document") ? result["document"].AsBsonDocument : result;
            var name = doc.Contains("HotelName") ? doc["HotelName"].AsString
                     : doc.Contains("name") ? doc["name"].AsString
                     : "Unknown";
            var score = result.Contains("score") ? result["score"].ToDouble() : 0.0;
            Console.WriteLine($"  {i + 1}. {name} (score: {score:F4})");
        }

        Console.WriteLine();
    }
}

Utilities/AzureIdentityTokenHandler.cs

using Azure.Core;
using MongoDB.Driver.Authentication.Oidc;

namespace SelectAlgorithm.Utilities;

internal sealed class AzureIdentityTokenHandler(
    TokenCredential credential,
    string? tenantId
) : IOidcCallback
{
    private readonly string[] scopes = ["https://ossrdbms-aad.database.windows.net/.default"];

    public OidcAccessToken GetOidcAccessToken(OidcCallbackParameters parameters, CancellationToken cancellationToken)
    {
        AccessToken token = credential.GetToken(
            new TokenRequestContext(scopes, tenantId: tenantId),
            cancellationToken
        );

        return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
    }

    public async Task<OidcAccessToken> GetOidcAccessTokenAsync(OidcCallbackParameters parameters, CancellationToken cancellationToken)
    {
        AccessToken token = await credential.GetTokenAsync(
            new TokenRequestContext(scopes, parentRequestId: null, tenantId: tenantId),
            cancellationToken
        );

        return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
    }
}

Models/Configuration.cs

namespace SelectAlgorithm.Models;

public class AppConfiguration
{
    public AzureOpenAIConfiguration AzureOpenAI { get; set; } = new();
    public DocumentDBConfiguration DocumentDB { get; set; } = new();
    public EmbeddingConfiguration Embedding { get; set; } = new();
    public VectorSearchConfiguration VectorSearch { get; set; } = new();
    public DataFilesConfiguration DataFiles { get; set; } = new();
}

public class AzureOpenAIConfiguration
{
    public string Endpoint { get; set; } = string.Empty;
    public string EmbeddingModel { get; set; } = "text-embedding-3-small";
}

public class DocumentDBConfiguration
{
    public string ClusterName { get; set; } = string.Empty;
    public string DatabaseName { get; set; } = "Hotels";
    public int LoadBatchSize { get; set; } = 100;
}

public class EmbeddingConfiguration
{
    public string EmbeddedField { get; set; } = "DescriptionVector";
    public int Dimensions { get; set; } = 1536;
}

public class VectorSearchConfiguration
{
    public string Query { get; set; } = "luxury hotel near the beach";
    public string Similarity { get; set; } = "";
    public int TopK { get; set; } = 5;
}

public class DataFilesConfiguration
{
    public string WithVectors { get; set; } = "data/Hotels_Vector.json";
}

Models/HotelData.cs

using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

namespace SelectAlgorithm.Models;

public class HotelData
{
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)]
    public string? Id { get; set; }
    
    public string HotelId { get; set; } = string.Empty;
    public string HotelName { get; set; } = string.Empty;
    public string Description { get; set; } = string.Empty;
    public string Category { get; set; } = string.Empty;
    
    [BsonExtraElements]
    public BsonDocument? ExtraElements { get; set; }
}

These supporting files provide:

  • Passwordless authentication setup for Azure OpenAI and DocumentDB.
  • OIDC token handler for automatic token refresh.
  • JSON file reading and deserialization.
  • Batch data insertion with error handling.
  • Results formatting and display.
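Utils.InsertData partitions the documents with Skip and Take. As an alternative sketch, the same batches can be produced with Enumerable.Chunk, which is available in .NET 6 and later. BatchingSketch and ToBatches are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the sample.
public static class BatchingSketch
{
    // Splits the input into consecutive batches of at most batchSize items.
    // The final batch holds the remainder and can be smaller.
    public static List<T[]> ToBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize), "Batch size must be positive.");

        return items.Chunk(batchSize).ToList();
    }
}
```

Each returned array could then be passed to collection.InsertMany in place of the Skip and Take expression. Chunk enumerates the source once, while repeated Skip calls rescan from the start, although the difference is negligible for a dataset of this size.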

Note

The .NET sample configures the DocumentDB connection with retryWrites=false, which is required for DocumentDB vector search operations.

Run the code

  1. Sign in with Azure CLI for passwordless authentication:

    az login
    
  2. Build the project:

    dotnet build
    
  3. Create the output directory:

    mkdir output
    

  4. Run the application to compare all 9 algorithm × similarity combinations:

    dotnet run
    

    The application loads the sample data once, then creates and tests all 9 algorithm × similarity combinations sequentially.
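To help interpret the scores, here is a minimal sketch of the three similarity functions in plain C#. SimilaritySketch and its method names are hypothetical and aren't part of the sample. The service computes scores inside the index, so treat this only as a reference for the formulas, and note that it assumes the L2 score is reported as a Euclidean distance, where lower means more similar.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the sample.
public static class SimilaritySketch
{
    // Inner product (IP): sum of the element-wise products.
    public static double InnerProduct(float[] a, float[] b) =>
        a.Zip(b, (x, y) => (double)x * y).Sum();

    // Cosine similarity (COS): inner product divided by the product of the vector lengths.
    public static double Cosine(float[] a, float[] b) =>
        InnerProduct(a, b) / (Math.Sqrt(InnerProduct(a, a)) * Math.Sqrt(InnerProduct(b, b)));

    // Euclidean distance (L2): square root of the summed squared differences.
    public static double Euclidean(float[] a, float[] b) =>
        Math.Sqrt(a.Zip(b, (x, y) =>
        {
            double difference = (double)x - y;
            return difference * difference;
        }).Sum());
}
```

For unit-length vectors, cosine similarity equals the inner product, and the Euclidean distance equals sqrt(2 - 2 × cosine). If the embeddings in your dataset are normalized, that relationship would explain COS and IP rows that report the same scores.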

Expected output

The application displays progress logs and a comparison table:

============================================================
  Compare All Algorithms × Metrics
  9 combinations: IVF, HNSW, DiskANN × COS, L2, IP
============================================================
Dropped existing 'hotels' collection (if any)

Loaded 50 documents with embeddings
Inserted 50/50 documents

Query: "luxury hotel near the beach"
Top K: 5
Embedding generated (reused for all searches)

Running 9 algorithm × metric combinations...
  ✓ vector_ivf_cos created
  ✓ vector_ivf_l2 created
  ✓ vector_ivf_ip created
  ✓ vector_hnsw_cos created
  ✓ vector_hnsw_l2 created
  ✓ vector_hnsw_ip created
  ✓ vector_diskann_cos created
  ✓ vector_diskann_l2 created
  ✓ vector_diskann_ip created

┌──────────┬────────┬────────────────────────────┬────────┬────────────────────────────┬────────┬───────┐
│ Algorithm│ Metric │ Top 1 Result               │ Score  │ Top 2 Result               │ Score  │ Diff  │
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ IVF      │ COS    │ Ocean Water Resort & Spa   │ 0.6184 │ Windy Ocean Motel          │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ IVF      │ L2     │ Ocean Water Resort & Spa   │ 0.8736 │ Windy Ocean Motel          │ 0.9943 │ 0.1208│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ IVF      │ IP     │ Ocean Water Resort & Spa   │ 0.6184 │ Windy Ocean Motel          │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ HNSW     │ COS    │ Ocean Water Resort & Spa   │ 0.6184 │ Windy Ocean Motel          │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ HNSW     │ L2     │ Ocean Water Resort & Spa   │ 0.8736 │ Windy Ocean Motel          │ 0.9943 │ 0.1208│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ HNSW     │ IP     │ Ocean Water Resort & Spa   │ 0.6184 │ Windy Ocean Motel          │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ DiskANN  │ COS    │ Ocean Water Resort & Spa   │ 0.6184 │ Windy Ocean Motel          │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ DiskANN  │ L2     │ Ocean Water Resort & Spa   │ 0.8736 │ Windy Ocean Motel          │ 0.9943 │ 0.1208│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ DiskANN  │ IP     │ Ocean Water Resort & Spa   │ 0.6184 │ Windy Ocean Motel          │ 0.5056 │ 0.1128│
└──────────┴────────┴────────────────────────────┴────────┴────────────────────────────┴────────┴───────┘

Summary: 9 succeeded, 0 failed

Cleanup: dropped collection 'hotels'

Done!

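The Diff column shows the gap between the top-1 and top-2 scores. A smaller diff means the two best matches are closer in relevance for this query. Because COS and IP scores increase with similarity while L2 scores decrease, compare Diff values only between rows that use the same metric.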
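
As a rough illustration of how the Diff column can be derived, the following sketch recomputes the gap for the IVF + COS row. The helper name ScoreGap is hypothetical, and the sketch assumes the sample takes the absolute difference so that L2 rows (where the top-1 score is the smaller value) also produce a positive number.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: gap between the two best scores, as in the Diff column.
// Math.Abs keeps the value positive for L2, where a lower score ranks first.
static double ScoreGap(double top1, double top2) => Math.Abs(top1 - top2);

// Scores copied from the IVF + COS row of the sample output.
double cosGap = ScoreGap(0.6184, 0.5056);

// Format with four decimals, matching the table layout.
Console.WriteLine(cosGap.ToString("F4", CultureInfo.InvariantCulture)); // 0.1128
```

Recomputing from the rounded table values can differ in the last digit from what the application prints (for example in the L2 rows), presumably because the application works with unrounded scores.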

Choosing the right algorithm

Important

For production workloads, start with DiskANN on an M30+ cluster. DiskANN supports higher embedding dimensions, uses less cluster memory, and is less likely to require an index redesign as your models evolve.

Use this quick-reference table to select the right algorithm for your workload:

Scenario                          Algorithm  Cluster tier  Max dimensions
Dev/test, demos, small datasets   IVF        M10+          2,000
Production (default)              DiskANN    M30+          16,000
Production (max recall priority)  HNSW       M30+          8,000
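
The quick-reference table can also be read as a small decision rule. The sketch below is one possible encoding of it, not part of the sample: the function name RecommendAlgorithm and its parameters are hypothetical, and the dimension limits are taken directly from the table.

```csharp
using System;

// Hypothetical helper that encodes the quick-reference table.
// It returns only the algorithm name; checking the cluster tier
// (M10+ for IVF, M30+ for DiskANN and HNSW) is left to the caller.
static string RecommendAlgorithm(int dimensions, bool production, bool maxRecallPriority)
{
    if (dimensions <= 0 || dimensions > 16000)
        throw new ArgumentOutOfRangeException(nameof(dimensions), "The table lists 16,000 as the highest supported dimension count.");

    // Dev/test, demos, and small datasets: IVF supports up to 2,000 dimensions.
    if (!production && dimensions <= 2000) return "IVF";

    // Production with recall as the top priority: HNSW supports up to 8,000 dimensions.
    if (production && maxRecallPriority && dimensions <= 8000) return "HNSW";

    // Everything else falls back to the production default.
    return "DiskANN";
}

// text-embedding-3-small produces 1,536-dimensional vectors by default.
Console.WriteLine(RecommendAlgorithm(1536, production: false, maxRecallPriority: false)); // IVF
Console.WriteLine(RecommendAlgorithm(1536, production: true, maxRecallPriority: false));  // DiskANN
Console.WriteLine(RecommendAlgorithm(1536, production: true, maxRecallPriority: true));   // HNSW
```

Note that a dev/test workload with more than 2,000 dimensions falls through to DiskANN here, which in turn requires an M30+ cluster.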

IVF (inverted file index):

  • Best for: Test environments, demos, and small clusters
  • Pros: Fast to build, low resource requirements
  • Cons: Lower recall compared to graph-based algorithms at scale
  • Tune: Increase numLists for larger datasets, increase nProbes for better recall

DiskANN (disk-based approximate nearest neighbor) — recommended for production:

  • Best for: Production workloads on M30+ clusters
  • Pros: Supports embeddings up to 16,000 dimensions, keeps most index data on disk freeing cluster memory for reads and writes, lighter index updates, easier backups, faster recovery
  • Cons: Requires M30+ cluster tier
  • Tune: Increase maxDegree and lBuild for better accuracy, increase lSearch for better recall
  • Why default: As embedding models evolve (some already exceed 8,000 dimensions), DiskANN avoids costly index redesigns. Its disk-based architecture also means your cluster memory stays available for operational workloads rather than index storage.

HNSW (hierarchical navigable small world):

  • Best for: Production workloads on M30+ clusters where maximum recall is the top priority
  • Pros: Excellent recall, fast queries
  • Cons: Requires M30+ cluster tier, supports embeddings up to 8,000 dimensions (vs 16,000 for DiskANN), higher memory usage since the full graph lives in RAM
  • Tune: Increase m and efConstruction for better index quality, increase efSearch for better recall
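
To make the build-time tuning parameters concrete, here is a minimal sketch that assembles index options for each algorithm as JSON. It is an illustration under assumptions rather than the sample's own code: the helper name BuildVectorIndexOptions, the kind strings (vector-ivf, vector-hnsw, vector-diskann), the overall option layout, and every numeric value are placeholders, so check them against the sample's index code before use. The query-time parameters (nProbes, efSearch, lSearch) are typically passed with the search itself and are not shown here.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: builds vector index options for one algorithm.
// Kind strings and numeric values are illustrative assumptions, not recommendations.
static JsonObject BuildVectorIndexOptions(string algorithm, string similarity, int dimensions)
{
    var options = new JsonObject
    {
        ["similarity"] = similarity,   // COS, L2, or IP
        ["dimensions"] = dimensions
    };

    switch (algorithm)
    {
        case "IVF":
            options["kind"] = "vector-ivf";
            options["numLists"] = 1;          // increase for larger datasets
            break;
        case "HNSW":
            options["kind"] = "vector-hnsw";
            options["m"] = 16;                // higher values improve index quality
            options["efConstruction"] = 64;
            break;
        case "DiskANN":
            options["kind"] = "vector-diskann";
            options["maxDegree"] = 32;        // higher values improve accuracy
            options["lBuild"] = 50;
            break;
        default:
            throw new ArgumentException($"Unknown algorithm '{algorithm}'.", nameof(algorithm));
    }

    return options;
}

// Print the options for an HNSW index with cosine similarity on 1,536-dimensional vectors.
JsonObject hnsw = BuildVectorIndexOptions("HNSW", "COS", 1536);
Console.WriteLine(hnsw.ToJsonString());
```

If you want to hand these options to the MongoDB driver, one approach is to convert the JSON text with BsonDocument.Parse and embed the result in the index definition.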

Choosing the right similarity function

Function            Score meaning                    Best for
COS (Cosine)        Higher = more similar (0–1)      Text embeddings (normalized vectors)
L2 (Euclidean)      Lower = more similar (distance)  When magnitude matters
IP (Inner Product)  Higher = more similar            Equivalent to COS for normalized vectors

For the text-embedding-3-small model used in this quickstart, COS (cosine similarity) is recommended because OpenAI embeddings are normalized and optimized for cosine similarity.
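
The relationship between the three functions is easy to check on small vectors. The following sketch uses two toy unit-length vectors (made up for illustration, not taken from the hotel dataset) to show that inner product equals cosine similarity for normalized vectors, and that the Euclidean distance then follows as sqrt(2 − 2 × cosine). All helper names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Plain implementations of the three similarity functions.
static double Dot(double[] a, double[] b) => a.Zip(b, (x, y) => x * y).Sum();
static double Norm(double[] a) => Math.Sqrt(Dot(a, a));
static double Cosine(double[] a, double[] b) => Dot(a, b) / (Norm(a) * Norm(b));
static double Euclidean(double[] a, double[] b) =>
    Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());

// Two toy vectors that both have length 1.
double[] query = { 0.6, 0.8 };
double[] doc = { 1.0, 0.0 };

double cos = Cosine(query, doc);
double ip = Dot(query, doc);
double l2 = Euclidean(query, doc);

var inv = CultureInfo.InvariantCulture;
Console.WriteLine($"COS: {cos.ToString("F4", inv)}");  // COS: 0.6000
Console.WriteLine($"IP:  {ip.ToString("F4", inv)}");   // IP:  0.6000
Console.WriteLine($"L2:  {l2.ToString("F4", inv)}");   // L2:  0.8944

// For unit vectors, L2 distance is a function of the cosine alone.
Console.WriteLine(Math.Sqrt(2 - 2 * cos).ToString("F4", inv)); // 0.8944
```

The same identity is consistent with the sample output: sqrt(2 − 2 × 0.6184) ≈ 0.8736, which matches the L2 score reported for the top result. This is why all three metrics rank the hotels in the same order for normalized embeddings.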

Troubleshooting

Issue                               Solution
TimeoutException during connection  Verify your environment variables are set correctly. Ensure your IP address is allowed by the DocumentDB firewall rules.
AuthenticationException             Verify your Microsoft Entra token is valid. Run az login to refresh your credentials.
Build errors with .NET version      Ensure you have .NET 8.0 or later installed. Run dotnet --version to check.
BsonSerializationException          Ensure your model classes match the document structure in the collection.
Empty search results                The vector index may take a few minutes to build. Wait 2-3 minutes after index creation, then rerun the application.
IndexOptionsConflict (code 85)      DocumentDB doesn't allow multiple vector indexes of the same kind on the same field. Drop the existing index before creating a new one.

Clean up resources

Remove the database using the DocumentDB for VS Code extension:

  1. Install the DocumentDB for VS Code extension.
  2. Connect to your Azure DocumentDB cluster.
  3. Expand the cluster, right-click the Hotels database, and select Drop Database.