This article explains how to compare all three vector search algorithms (DiskANN, HNSW, and IVF) in Azure DocumentDB using the .NET client library. The sample runs the same query against each algorithm with each similarity function (COS, L2, IP), prints the top results and scores side by side, and helps you choose the right configuration for your workload. This quickstart uses a sample hotel dataset in a JSON file with precalculated vectors from the text-embedding-3-small model.
Find the sample code on GitHub.
Prerequisites
- An Azure subscription. If you don't have an Azure subscription, create a free account.
- An existing Azure DocumentDB cluster. If you don't have a cluster, create a new cluster.
- An Azure OpenAI resource with:
  - Custom domain configured.
  - text-embedding-3-small model deployed.
- Visual Studio Code with the Azure DocumentDB extension.
- Use the Bash environment in Azure Cloud Shell. For more information, see Get started with Azure Cloud Shell.
- If you prefer to run CLI reference commands locally, install the Azure CLI. If you're running on Windows or macOS, consider running Azure CLI in a Docker container. For more information, see How to run the Azure CLI in a Docker container.
- If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Authenticate to Azure using Azure CLI.
- When you're prompted, install the Azure CLI extension on first use. For more information about extensions, see Use and manage extensions with the Azure CLI.
- Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.
- Azure Developer CLI (optional). Use azd up to deploy all required Azure resources in one command.
- .NET 8.0 SDK or later.
Create a .NET project
Create a new directory for your project and initialize the .NET console application:
mkdir select-algorithm-dotnet
cd select-algorithm-dotnet
dotnet new console --framework net8.0 --name SelectAlgorithm --output .
Verify the project was created:
ls SelectAlgorithm.csproj
Install the required NuGet packages:
dotnet add package Azure.AI.OpenAI --version 2.1.0
dotnet add package Azure.Identity --version 1.13.2
dotnet add package MongoDB.Driver --version 3.2.0
dotnet add package Microsoft.Extensions.Configuration --version 8.0.0
dotnet add package Microsoft.Extensions.Configuration.Binder --version 8.0.2
dotnet add package Microsoft.Extensions.Configuration.EnvironmentVariables --version 8.0.0
dotnet add package Microsoft.Extensions.Configuration.Json --version 8.0.1
These packages provide:
- Azure.AI.OpenAI: Azure OpenAI client library to create vector embeddings.
- Azure.Identity: Azure Identity library for passwordless authentication with DefaultAzureCredential.
- MongoDB.Driver: MongoDB driver for .NET to interact with DocumentDB.
- Microsoft.Extensions.Configuration*: Configuration and environment variable binding infrastructure.
Verify installed packages:
dotnet list package
Create data file with vectors
Create a new data directory for the hotels data file:
mkdir data
Download the Hotels_Vector.json raw data file with vectors to your data directory:
curl -o data/Hotels_Vector.json https://raw.githubusercontent.com/Azure-Samples/documentdb-samples/refs/heads/main/ai/data/Hotels_Vector.json
Verify the file downloaded successfully:
ls data/Hotels_Vector.json
You should see Hotels_Vector.json in the data directory.
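Optionally, check the file before you load it. The index is created with the Embedding:Dimensions value that you set in the next section (1536 for text-embedding-3-small), so every vector should have exactly that many elements.

The following C# sketch uses only System.Text.Json. It makes these assumptions:

- The file is a top-level JSON array of hotel objects.
- The vectors are in the DescriptionVector field, which is what the sample configuration expects.

CheckVectors is a hypothetical helper, not part of the sample. Because the sketch uses top-level statements, run it in a separate scratch console project rather than inside SelectAlgorithm.

```csharp
// Standalone sketch for a scratch console project (top-level statements).
// Assumption: the data file is a JSON array of hotel objects.
using System;
using System.IO;
using System.Text.Json;

// Counts documents, documents that carry the vector field, and vectors whose
// length differs from the dimensions value the index will be created with.
(int Total, int WithVector, int WrongDimensions) CheckVectors(string json, string vectorField, int dimensions)
{
    using var document = JsonDocument.Parse(json);
    int total = 0, withVector = 0, wrongDimensions = 0;
    foreach (var hotel in document.RootElement.EnumerateArray())
    {
        total++;
        if (!hotel.TryGetProperty(vectorField, out var vector) || vector.ValueKind != JsonValueKind.Array)
        {
            continue; // CompareAll also skips documents without the vector field
        }
        withVector++;
        if (vector.GetArrayLength() != dimensions)
        {
            wrongDimensions++;
        }
    }
    return (total, withVector, wrongDimensions);
}

const string DataPath = "data/Hotels_Vector.json";
if (File.Exists(DataPath))
{
    var (count, vectors, mismatched) = CheckVectors(File.ReadAllText(DataPath), "DescriptionVector", 1536);
    Console.WriteLine($"{count} documents, {vectors} with vectors, {mismatched} with unexpected dimensions");
}
else
{
    Console.WriteLine($"File not found: {DataPath}");
}
```

A nonzero mismatch count means that the file and the Embedding:Dimensions setting disagree, and index creation or search is likely to fail.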
Configure appsettings.json and environment variable overrides
Note
.NET uses the standard IConfiguration system with appsettings.json as the primary configuration source. Environment variables can override any setting using double-underscore (__) as the hierarchy separator. The other language quickstarts use flat environment variables (DOCUMENTDB_CLUSTER_NAME), but .NET's hierarchical configuration is the idiomatic pattern for this platform.
Create an appsettings.json configuration file:
touch appsettings.json
Add this content to appsettings.json:
{
  "DocumentDB": {
    "DatabaseName": "Hotels",
    "ClusterName": "<your-cluster-name>",
    "LoadBatchSize": 100
  },
  "VectorSearch": {
    "Similarity": "",
    "TopK": 5,
    "Query": "luxury hotel near the beach"
  },
  "AzureOpenAI": {
    "Endpoint": "https://<your-resource>.openai.azure.com/",
    "EmbeddingModel": "text-embedding-3-small"
  },
  "DataFiles": {
    "WithVectors": "data/Hotels_Vector.json"
  },
  "Embedding": {
    "EmbeddedField": "DescriptionVector",
    "Dimensions": 1536
  }
}
Set any environment variable overrides in your current shell session. The sample uses DefaultAzureCredential for passwordless authentication, and .NET maps environment variables to appsettings.json keys with the Section__Key format:
export AzureOpenAI__Endpoint="https://<your-resource>.openai.azure.com/"
export AzureOpenAI__EmbeddingModel="text-embedding-3-small"
export DocumentDB__ClusterName="<your-cluster-name>"
export DocumentDB__DatabaseName="Hotels"
export DataFiles__WithVectors="data/Hotels_Vector.json"
export Embedding__EmbeddedField="DescriptionVector"
export Embedding__Dimensions="1536"
export AZURE_TENANT_ID="<your-tenant-id>"
Replace the placeholder values with your own information:
- <your-resource>: Your Azure OpenAI resource name
- <your-cluster-name>: Your Azure DocumentDB cluster name
- <your-tenant-id>: Your Microsoft Entra tenant ID
These environment variables override the matching values in appsettings.json. For example, DocumentDB__ClusterName overrides DocumentDB:ClusterName, DocumentDB__DatabaseName overrides DocumentDB:DatabaseName, and AzureOpenAI__Endpoint overrides AzureOpenAI:Endpoint.
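The environment variable provider treats each double underscore as the : separator that configuration paths use. Program.cs registers AddEnvironmentVariables() after AddJsonFile(), so environment values win.

The following C# sketch models that layering with plain dictionaries, so you can predict which value each key ends up with. It's a simplified illustration, not the Microsoft.Extensions.Configuration implementation. The fake environment values, such as my-cluster, are hypothetical. Run it in a scratch console project, because it uses top-level statements.

```csharp
// Standalone sketch for a scratch console project (top-level statements).
// Simplified model of layered configuration, not the library's implementation.
using System;
using System.Collections.Generic;

// Values as they would come from appsettings.json, flattened to "Section:Key" paths.
// Configuration keys are case-insensitive, so this dictionary is too.
var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["DocumentDB:ClusterName"] = "<your-cluster-name>",
    ["DocumentDB:DatabaseName"] = "Hotels",
    ["VectorSearch:TopK"] = "5",
};

// Environment variable names use "__" where configuration paths use ":".
string ToConfigKey(string environmentName) => environmentName.Replace("__", ":");

// A later source overwrites keys from an earlier one (the JSON file is added first).
void ApplyOverrides(IDictionary<string, string> target, IEnumerable<KeyValuePair<string, string>> environment)
{
    foreach (var (name, value) in environment)
    {
        target[ToConfigKey(name)] = value;
    }
}

// Hypothetical stand-in for the process environment.
var fakeEnvironment = new Dictionary<string, string>
{
    ["DocumentDB__ClusterName"] = "my-cluster",
    ["VectorSearch__TopK"] = "10",
};
ApplyOverrides(settings, fakeEnvironment);

foreach (var (settingKey, settingValue) in settings)
{
    Console.WriteLine($"{settingKey} = {settingValue}");
}
```

Keys that have no matching environment variable, such as DocumentDB:DatabaseName here, keep their appsettings.json values.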
Prefer passwordless authentication. For more information on setting up managed identity and the full range of your authentication options, see Authenticate .NET apps to Azure services by using the Azure SDK for .NET.
Create code files
Continue the project by creating code files for vector search comparison. When you're done, the project structure should look like this:
select-algorithm-dotnet/
├── data/
│ └── Hotels_Vector.json
├── Models/
│ ├── Configuration.cs
│ └── HotelData.cs
├── Utilities/
│ └── AzureIdentityTokenHandler.cs
├── appsettings.json
├── CompareAll.cs
├── Program.cs
├── SelectAlgorithm.csproj
└── Utils.cs
Create the directory structure:
mkdir Models
mkdir Utilities
Create the code files:
touch CompareAll.cs
touch Utils.cs
touch Models/Configuration.cs
touch Models/HotelData.cs
touch Utilities/AzureIdentityTokenHandler.cs
Create the algorithm comparison code
Create the following source files to implement the vector search comparison.
Program.cs
Replace the contents of Program.cs with this code:
using Microsoft.Extensions.Configuration;
using SelectAlgorithm.Models;
namespace SelectAlgorithm;
class Program
{
static void Main(string[] args)
{
Console.WriteLine();
Console.WriteLine("Select Algorithm Demo - Azure DocumentDB Vector Search (.NET)");
Console.WriteLine(new string('-', 60));
Console.WriteLine();
var configuration = new ConfigurationBuilder()
.SetBasePath(Directory.GetCurrentDirectory())
.AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
.AddEnvironmentVariables()
.Build();
var appConfig = new AppConfiguration();
configuration.Bind(appConfig);
var command = args.Length > 0 ? args[0].ToLower() : "compare-all";
switch (command)
{
case "compare-all":
CompareAll.Run(appConfig);
break;
default:
Console.WriteLine($"Unknown command: {command}");
Console.WriteLine("Usage: dotnet run -- compare-all");
return;
}
Console.WriteLine();
Console.WriteLine("Done!");
}
}
This main entry point:
- Loads configuration from appsettings.json and then applies environment variable overrides.
- Binds the configuration to the AppConfiguration model.
- Runs the compare-all command, which is the default when you don't pass an argument, by calling CompareAll.Run(). CompareAll.Run() creates the clients, runs the comparison, and prints the results table.
- Prints usage information for any other command.
CompareAll.cs
Add this code to CompareAll.cs:
// Unified comparison runner for all 9 combinations (3 algorithms × 3 similarity metrics).
// Runs the searches one at a time (only one vector index exists on the field at any point) and prints a formatted comparison table.
namespace SelectAlgorithm;
using MongoDB.Driver;
using MongoDB.Bson;
using OpenAI.Embeddings;
using SelectAlgorithm.Models;
public static class CompareAll
{
private record IndexConfig(string Name, string Kind, string Similarity, BsonDocument ExtraParams);
private record SearchResult(string Algorithm, string Metric, string Top1Name, double Top1Score, string Top2Name, double Top2Score);
private static string GetAlgoDisplay(string kind) => kind switch
{
"vector-ivf" => "IVF",
"vector-hnsw" => "HNSW",
"vector-diskann" => "DiskANN",
_ => kind
};
public static void Run(AppConfiguration appConfig)
{
Console.WriteLine(new string('=', 60));
Console.WriteLine(" Compare All Algorithms × Metrics");
Console.WriteLine(" 9 combinations: IVF, HNSW, DiskANN × COS, L2, IP");
Console.WriteLine(new string('=', 60));
// All settings come from appsettings.json; environment variables (for example, VectorSearch__Query) override them
var databaseName = appConfig.DocumentDB.DatabaseName;
var dataFile = appConfig.DataFiles.WithVectors;
var vectorField = appConfig.Embedding.EmbeddedField;
var dimensions = appConfig.Embedding.Dimensions;
var batchSize = appConfig.DocumentDB.LoadBatchSize;
var queryText = appConfig.VectorSearch.Query;
var topK = appConfig.VectorSearch.TopK;
var mongoClient = Utils.GetMongoClientPasswordless(appConfig);
var embeddingClient = Utils.GetEmbeddingClient(appConfig);
try
{
var database = mongoClient.GetDatabase(databaseName);
// Drop collection for a clean comparison
database.DropCollection("hotels");
Console.WriteLine("Dropped existing 'hotels' collection (if any)");
var collection = database.GetCollection<BsonDocument>("hotels");
// Load data once into single collection
var data = Utils.ReadJsonFile(dataFile);
var documents = data.Where(d => d.Contains(vectorField)).ToList();
Console.WriteLine($"\nLoaded {documents.Count} documents with embeddings");
Utils.InsertData(collection, documents, batchSize);
// Generate ONE embedding for the query (reused for all 9 searches)
Console.WriteLine($"\nQuery: \"{queryText}\"");
Console.WriteLine($"Top K: {topK}");
var embeddingResult = embeddingClient.GenerateEmbedding(queryText);
var queryVector = embeddingResult.Value.ToFloats().ToArray();
Console.WriteLine("Embedding generated (reused for all searches)\n");
// Define 9 index configurations
var configs = BuildIndexConfigs();
// Run each config sequentially: drop→create→wait→search
// DocumentDB doesn't allow multiple vector indexes of the same kind on the same field
Console.WriteLine("Running 9 algorithm × metric combinations...\n");
var results = new List<SearchResult>();
foreach (var config in configs)
{
// 1. Drop all existing vector indexes
DropVectorIndexes(collection, vectorField);
// 2. Create this specific index
CreateIndex(collection, vectorField, dimensions, config);
Console.WriteLine($" ✓ {config.Name} created");
// 3. Search with retries while the index becomes available
var searchResults = RunVectorSearchWithRetry(collection, queryVector, vectorField, config.Name, topK);
if (searchResults.Count == 0)
{
results.Add(new SearchResult(GetAlgoDisplay(config.Kind), config.Similarity, "(failed)", 0.0, "(failed)", 0.0));
continue;
}
// 4. Extract top 2 results and record
var algoDisplay = GetAlgoDisplay(config.Kind);
var top1Name = "-"; var top1Score = 0.0;
var top2Name = "-"; var top2Score = 0.0;
if (searchResults.Count > 0)
{
var doc1 = searchResults[0];
top1Name = doc1.Contains("HotelName") ? doc1["HotelName"].AsString : "Unknown";
top1Score = doc1.Contains("score") ? doc1["score"].ToDouble() : 0.0;
}
if (searchResults.Count > 1)
{
var doc2 = searchResults[1];
top2Name = doc2.Contains("HotelName") ? doc2["HotelName"].AsString : "Unknown";
top2Score = doc2.Contains("score") ? doc2["score"].ToDouble() : 0.0;
}
results.Add(new SearchResult(algoDisplay, config.Similarity, top1Name, top1Score, top2Name, top2Score));
}
var successCount = results.Count(r => r.Top1Name != "(failed)");
// Print comparison table
PrintComparisonTable(results);
if (successCount == 0)
{
Console.WriteLine("\n❌ All 9 comparisons failed — no algorithm returned results.");
Environment.ExitCode = 1;
}
else
{
Console.WriteLine($"\nSummary: {successCount} succeeded, {results.Count - successCount} failed");
}
}
finally
{
// Cleanup: drop the comparison collection
try
{
var database = mongoClient.GetDatabase(databaseName);
database.DropCollection("hotels");
Console.WriteLine("\nCleanup: dropped collection 'hotels'");
}
catch (Exception ex)
{
Console.WriteLine($"Cleanup warning: {ex.Message}");
}
mongoClient.Cluster.Dispose();
}
}
private static List<IndexConfig> BuildIndexConfigs()
{
string[] metrics = ["COS", "L2", "IP"];
var configs = new List<IndexConfig>();
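// IVF: numLists = 1 puts every vector in a single list, so each search scans all vectors (fine for the 50-document sample)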
foreach (var metric in metrics)
configs.Add(new IndexConfig($"vector_ivf_{metric.ToLower()}", "vector-ivf", metric, new BsonDocument { { "numLists", 1 } }));
// HNSW
foreach (var metric in metrics)
configs.Add(new IndexConfig($"vector_hnsw_{metric.ToLower()}", "vector-hnsw", metric, new BsonDocument { { "m", 16 }, { "efConstruction", 64 } }));
// DiskANN
foreach (var metric in metrics)
configs.Add(new IndexConfig($"vector_diskann_{metric.ToLower()}", "vector-diskann", metric, new BsonDocument { { "maxDegree", 32 }, { "lBuild", 50 } }));
return configs;
}
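private static void DropVectorIndexes(IMongoCollection<BsonDocument> collection, string vectorField)
{
try
{
using var cursor = collection.Indexes.List();
foreach (var idx in cursor.ToList())
{
var name = idx.GetValue("name", "").AsString;
var key = idx.GetValue("key", new BsonDocument()).AsBsonDocument;
// Only vector indexes use the string "cosmosSearch"; regular indexes use numbers such as 1
if (key.Contains(vectorField) && key[vectorField].IsString && key[vectorField].AsString == "cosmosSearch")
{
try { collection.Indexes.DropOne(name); }
catch (MongoCommandException ex) { Console.WriteLine($" Warning: could not drop index {name}: {ex.Message}"); }
}
}
}
catch (Exception ex)
{
Console.WriteLine($" Warning: could not list indexes: {ex.Message}");
}
}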
private static void CreateIndex(IMongoCollection<BsonDocument> collection, string vectorField, int dimensions, IndexConfig config)
{
// Drop existing index with same name if present
try
{
collection.Indexes.DropOne(config.Name);
}
catch (MongoCommandException)
{
// Index doesn't exist, that's fine
}
var cosmosSearchOptions = new BsonDocument
{
{ "kind", config.Kind },
{ "dimensions", dimensions },
{ "similarity", config.Similarity }
};
foreach (var param in config.ExtraParams)
{
cosmosSearchOptions.Add(param);
}
var command = new BsonDocument
{
{ "createIndexes", collection.CollectionNamespace.CollectionName },
{ "indexes", new BsonArray
{
new BsonDocument
{
{ "name", config.Name },
{ "key", new BsonDocument(vectorField, "cosmosSearch") },
{ "cosmosSearchOptions", cosmosSearchOptions }
}
}
}
};
try
{
collection.Database.RunCommand<BsonDocument>(command);
}
catch (MongoCommandException ex) when (ex.Message.Contains("already exists"))
{
// Index already exists with same config — idempotent
}
}
private static List<BsonDocument> RunVectorSearch(
IMongoCollection<BsonDocument> collection,
float[] queryVector,
string vectorField,
string indexName,
int topK)
{
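// cosmosSearch has no index-name parameter; it uses the vector index on the path field.
// Run() keeps a single vector index on that field, so indexName isn't needed to build the query.
var pipeline = new[]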
{
new BsonDocument("$search", new BsonDocument("cosmosSearch", new BsonDocument
{
{ "vector", new BsonArray(queryVector.Select(f => (double)f)) },
{ "path", vectorField },
{ "k", topK }
})),
new BsonDocument("$project", new BsonDocument
{
{ "HotelName", 1 },
{ "score", new BsonDocument("$meta", "searchScore") }
})
};
return collection.Aggregate<BsonDocument>(pipeline).ToList();
}
private static List<BsonDocument> RunVectorSearchWithRetry(
IMongoCollection<BsonDocument> collection,
float[] queryVector,
string vectorField,
string indexName,
int topK)
{
const int maxRetries = 5;
const int retryDelayMs = 2000;
for (var attempt = 0; attempt <= maxRetries; attempt++)
{
var results = RunVectorSearch(collection, queryVector, vectorField, indexName, topK);
if (results.Count > 0)
{
return results;
}
if (attempt < maxRetries)
{
Console.WriteLine($" No results for {indexName} yet. Retrying in 2 seconds ({attempt + 1}/{maxRetries})...");
Thread.Sleep(retryDelayMs);
}
}
Console.WriteLine($" Search for {indexName} did not return results after {maxRetries} retries. Recording as failed.");
return [];
}
private static void PrintComparisonTable(List<SearchResult> results)
{
Console.WriteLine();
Console.WriteLine("┌──────────┬────────┬────────────────────────────┬────────┬────────────────────────────┬────────┬───────┐");
Console.WriteLine($"│ {"Algorithm",-9}│ {"Metric",-7}│ {"Top 1 Result",-27}│ {"Score",-7}│ {"Top 2 Result",-27}│ {"Score",-7}│ {"Diff",-6}│");
Console.WriteLine("├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤");
for (var i = 0; i < results.Count; i++)
{
var r = results[i];
var diff = Math.Abs(r.Top1Score - r.Top2Score);
var top1Display = r.Top1Name.Length > 27 ? r.Top1Name[..24] + "..." : r.Top1Name;
var top2Display = r.Top2Name.Length > 27 ? r.Top2Name[..24] + "..." : r.Top2Name;
Console.WriteLine($"│ {r.Algorithm,-9}│ {r.Metric,-7}│ {top1Display,-27}│ {r.Top1Score,-7:F4}│ {top2Display,-27}│ {r.Top2Score,-7:F4}│ {diff,-6:F4}│");
if (i < results.Count - 1)
Console.WriteLine("├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤");
}
Console.WriteLine("└──────────┴────────┴────────────────────────────┴────────┴────────────────────────────┴────────┴───────┘");
}
}
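This comparison runner:
- Loads the hotel data once into a single hotels collection.
- Generates one query embedding and reuses it for all nine searches.
- For each algorithm and similarity combination, drops the existing vector index, creates the new index with algorithm-specific parameters, and runs the search, retrying while the index becomes available.
- Records the top two results and their scores, prints the comparison table, and drops the collection when it finishes.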
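The runner records scores but doesn't time the searches. If you also want rough latency numbers, one option is to wrap each search call with a Stopwatch.

The following C# sketch shows a hypothetical TimeSearch helper with a stand-in search delegate. In CompareAll, you could wrap the RunVectorSearch call in the same way. It's an illustration under those assumptions, not part of the published sample.

```csharp
// Standalone sketch for a scratch console project (top-level statements).
// TimeSearch is hypothetical; the published sample doesn't measure latency.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Runs a search delegate and returns its result together with the elapsed wall-clock time.
(T Result, double Milliseconds) TimeSearch<T>(Func<T> search)
{
    var stopwatch = Stopwatch.StartNew();
    var result = search();
    stopwatch.Stop();
    return (result, stopwatch.Elapsed.TotalMilliseconds);
}

// Stand-in for a real call such as RunVectorSearch(collection, queryVector, vectorField, config.Name, topK).
var (hits, elapsedMs) = TimeSearch(() =>
{
    Thread.Sleep(20);
    return new List<string> { "Ocean Water Resort & Spa" };
});
Console.WriteLine($"{hits.Count} result(s) in {elapsedMs:F1} ms");
```

When you measure, follow these guidelines:

- Time only the search call, not index creation or the retry loop.
- Consider discarding the first measurement, because one-time costs such as warm-up can inflate it.

On a 50-document dataset, differences between algorithms are likely to be within measurement noise.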
Supporting files
Create the following supporting files in the project:
Utils.cs
using MongoDB.Driver;
using MongoDB.Driver.Authentication.Oidc;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using Azure.Identity;
using Azure.Core;
using Azure.AI.OpenAI;
using OpenAI.Embeddings;
using SelectAlgorithm.Models;
namespace SelectAlgorithm;
public class AzureOidcCallback : IOidcCallback
{
private readonly DefaultAzureCredential _credential;
private static readonly string[] Scopes = { "https://ossrdbms-aad.database.windows.net/.default" };
public AzureOidcCallback(DefaultAzureCredential credential) => _credential = credential;
public OidcAccessToken GetOidcAccessToken(OidcCallbackParameters parameters, CancellationToken cancellationToken)
{
var token = _credential.GetToken(new TokenRequestContext(Scopes), cancellationToken);
return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
}
public async Task<OidcAccessToken> GetOidcAccessTokenAsync(OidcCallbackParameters parameters, CancellationToken cancellationToken)
{
var token = await _credential.GetTokenAsync(new TokenRequestContext(Scopes), cancellationToken);
return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
}
}
public static class Utils
{
public static IMongoClient GetMongoClientPasswordless(AppConfiguration config)
{
var clusterName = config.DocumentDB.ClusterName;
if (string.IsNullOrEmpty(clusterName))
throw new InvalidOperationException("DocumentDB:ClusterName is required in appsettings.json");
var credential = new DefaultAzureCredential();
var connectionString = $"mongodb+srv://{clusterName}.global.mongocluster.cosmos.azure.com/";
var settings = MongoClientSettings.FromConnectionString(connectionString);
settings.ConnectTimeout = TimeSpan.FromSeconds(120);
settings.UseTls = true;
settings.RetryWrites = false;
// Custom OIDC callback using DefaultAzureCredential
// Chains through CLI, managed identity, etc.
var oidcCallback = new AzureOidcCallback(credential);
settings.Credential = MongoCredential.CreateOidcCredential(oidcCallback, null);
return new MongoClient(settings);
}
public static EmbeddingClient GetEmbeddingClient(AppConfiguration config)
{
var endpoint = config.AzureOpenAI.Endpoint;
if (string.IsNullOrEmpty(endpoint))
throw new InvalidOperationException("AzureOpenAI:Endpoint is required in appsettings.json");
var model = config.AzureOpenAI.EmbeddingModel;
var credential = new DefaultAzureCredential();
var azureClient = new AzureOpenAIClient(new Uri(endpoint), credential);
return azureClient.GetEmbeddingClient(model);
}
public static List<BsonDocument> ReadJsonFile(string path)
{
if (!File.Exists(path))
throw new FileNotFoundException($"Data file not found: {path}");
var json = File.ReadAllText(path);
return BsonSerializer.Deserialize<List<BsonDocument>>(json);
}
public static void InsertData(IMongoCollection<BsonDocument> collection, List<BsonDocument> data, int batchSize)
{
var totalDocuments = data.Count;
var existingCount = collection.CountDocuments(new BsonDocument());
if (existingCount >= totalDocuments)
{
Console.WriteLine($"Collection already has {existingCount} documents, skipping insert");
return;
}
if (existingCount > 0)
{
collection.DeleteMany(new BsonDocument());
}
var insertedCount = 0;
for (var i = 0; i < totalDocuments; i += batchSize)
{
var batch = data.Skip(i).Take(batchSize).ToList();
try
{
collection.InsertMany(batch, new InsertManyOptions { IsOrdered = false });
insertedCount += batch.Count;
}
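catch (MongoBulkWriteException ex)
{
// Unordered inserts attempt every document, so count only the ones that succeeded
insertedCount += batch.Count - ex.WriteErrors.Count;
Console.WriteLine($"Warning: {ex.WriteErrors.Count} document(s) in this batch failed to insert");
}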
Thread.Sleep(100);
}
Console.WriteLine($"Inserted {insertedCount}/{totalDocuments} documents");
}
public static void DropVectorIndexes(IMongoCollection<BsonDocument> collection, string vectorField)
{
try
{
using var cursor = collection.Indexes.List();
var indexes = cursor.ToList();
foreach (var index in indexes)
{
if (index.Contains("key"))
{
var key = index["key"].AsBsonDocument;
if (key.Contains(vectorField) && key[vectorField].IsString && key[vectorField].AsString == "cosmosSearch")
{
var indexName = index["name"].AsString;
collection.Indexes.DropOne(indexName);
Console.WriteLine($"Dropped existing vector index: {indexName}");
}
}
}
}
catch (Exception ex)
{
Console.WriteLine($"Warning: Error dropping indexes: {ex.Message}");
}
}
public static List<BsonDocument> PerformVectorSearch(
IMongoCollection<BsonDocument> collection,
EmbeddingClient client,
string query,
string vectorField,
string model,
int topK = 5)
{
var embeddingResult = client.GenerateEmbedding(query);
var queryVector = embeddingResult.Value.ToFloats().ToArray();
var pipeline = new[]
{
new BsonDocument("$search", new BsonDocument("cosmosSearch", new BsonDocument
{
{ "vector", new BsonArray(queryVector.Select(f => (double)f)) },
{ "path", vectorField },
{ "k", topK }
})),
new BsonDocument("$project", new BsonDocument
{
{ "document", "$$ROOT" },
{ "score", new BsonDocument("$meta", "searchScore") }
})
};
return collection.Aggregate<BsonDocument>(pipeline).ToList();
}
public static void PrintSearchResults(List<BsonDocument> results, string algorithm)
{
Console.WriteLine();
Console.WriteLine(new string('=', 60));
Console.WriteLine($" {algorithm} Search Results ({results.Count} found)");
Console.WriteLine(new string('=', 60));
for (var i = 0; i < results.Count; i++)
{
var result = results[i];
var doc = result.Contains("document") ? result["document"].AsBsonDocument : result;
var name = doc.Contains("HotelName") ? doc["HotelName"].AsString
: doc.Contains("name") ? doc["name"].AsString
: "Unknown";
var score = result.Contains("score") ? result["score"].ToDouble() : 0.0;
Console.WriteLine($" {i + 1}. {name} (score: {score:F4})");
}
Console.WriteLine();
}
}
Utilities/AzureIdentityTokenHandler.cs
using Azure.Core;
using MongoDB.Driver.Authentication.Oidc;
namespace SelectAlgorithm.Utilities;
internal sealed class AzureIdentityTokenHandler(
TokenCredential credential,
string? tenantId
) : IOidcCallback
{
private readonly string[] scopes = ["https://ossrdbms-aad.database.windows.net/.default"];
public OidcAccessToken GetOidcAccessToken(OidcCallbackParameters parameters, CancellationToken cancellationToken)
{
AccessToken token = credential.GetToken(
new TokenRequestContext(scopes, tenantId: tenantId),
cancellationToken
);
return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
}
public async Task<OidcAccessToken> GetOidcAccessTokenAsync(OidcCallbackParameters parameters, CancellationToken cancellationToken)
{
AccessToken token = await credential.GetTokenAsync(
new TokenRequestContext(scopes, parentRequestId: null, tenantId: tenantId),
cancellationToken
);
return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow);
}
}
Models/Configuration.cs
namespace SelectAlgorithm.Models;
public class AppConfiguration
{
public AzureOpenAIConfiguration AzureOpenAI { get; set; } = new();
public DocumentDBConfiguration DocumentDB { get; set; } = new();
public EmbeddingConfiguration Embedding { get; set; } = new();
public VectorSearchConfiguration VectorSearch { get; set; } = new();
public DataFilesConfiguration DataFiles { get; set; } = new();
}
public class AzureOpenAIConfiguration
{
public string Endpoint { get; set; } = string.Empty;
public string EmbeddingModel { get; set; } = "text-embedding-3-small";
}
public class DocumentDBConfiguration
{
public string ClusterName { get; set; } = string.Empty;
public string DatabaseName { get; set; } = "Hotels";
public int LoadBatchSize { get; set; } = 100;
}
public class EmbeddingConfiguration
{
public string EmbeddedField { get; set; } = "DescriptionVector";
public int Dimensions { get; set; } = 1536;
}
public class VectorSearchConfiguration
{
public string Query { get; set; } = "luxury hotel near the beach";
public string Similarity { get; set; } = "";
public int TopK { get; set; } = 5;
}
public class DataFilesConfiguration
{
public string WithVectors { get; set; } = "data/Hotels_Vector.json";
}
Models/HotelData.cs
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
namespace SelectAlgorithm.Models;
public class HotelData
{
[BsonId]
[BsonRepresentation(BsonType.ObjectId)]
public string? Id { get; set; }
public string HotelId { get; set; } = string.Empty;
public string HotelName { get; set; } = string.Empty;
public string Description { get; set; } = string.Empty;
public string Category { get; set; } = string.Empty;
[BsonExtraElements]
public BsonDocument? ExtraElements { get; set; }
}
These supporting files provide:
- Passwordless authentication setup for Azure OpenAI and DocumentDB.
- OIDC token handler for automatic token refresh.
- JSON file reading and deserialization.
- Batch data insertion with error handling.
- Results formatting and display.
Note
The .NET sample configures the DocumentDB connection with retryWrites=false, which is required for DocumentDB vector search operations.
Run the code
Sign in with Azure CLI for passwordless authentication:
az login
Build the project:
dotnet build
Create the output directory:
mkdir output
Run the SelectAlgorithm.csproj entry point to compare all 9 algorithm × similarity combinations:
dotnet run
The application loads the sample data once, then creates and tests all 9 algorithm × similarity combinations sequentially.
Expected output
The application displays progress logs and a comparison table. If an index isn't ready when its first search runs, you also see retry messages for that index:
============================================================
Compare All Algorithms × Metrics
9 combinations: IVF, HNSW, DiskANN × COS, L2, IP
============================================================
Dropped existing 'hotels' collection (if any)
Loaded 50 documents with embeddings
Inserted 50/50 documents
Query: "luxury hotel near the beach"
Top K: 5
Embedding generated (reused for all searches)
Running 9 algorithm × metric combinations...
✓ vector_ivf_cos created
✓ vector_ivf_l2 created
✓ vector_ivf_ip created
✓ vector_hnsw_cos created
✓ vector_hnsw_l2 created
✓ vector_hnsw_ip created
✓ vector_diskann_cos created
✓ vector_diskann_l2 created
✓ vector_diskann_ip created
┌──────────┬────────┬────────────────────────────┬────────┬────────────────────────────┬────────┬───────┐
│ Algorithm│ Metric │ Top 1 Result │ Score │ Top 2 Result │ Score │ Diff │
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ IVF │ COS │ Ocean Water Resort & Spa │ 0.6184 │ Windy Ocean Motel │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ IVF │ L2 │ Ocean Water Resort & Spa │ 0.8736 │ Windy Ocean Motel │ 0.9943 │ 0.1208│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ IVF │ IP │ Ocean Water Resort & Spa │ 0.6184 │ Windy Ocean Motel │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ HNSW │ COS │ Ocean Water Resort & Spa │ 0.6184 │ Windy Ocean Motel │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ HNSW │ L2 │ Ocean Water Resort & Spa │ 0.8736 │ Windy Ocean Motel │ 0.9943 │ 0.1208│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ HNSW │ IP │ Ocean Water Resort & Spa │ 0.6184 │ Windy Ocean Motel │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ DiskANN │ COS │ Ocean Water Resort & Spa │ 0.6184 │ Windy Ocean Motel │ 0.5056 │ 0.1128│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ DiskANN │ L2 │ Ocean Water Resort & Spa │ 0.8736 │ Windy Ocean Motel │ 0.9943 │ 0.1208│
├──────────┼────────┼────────────────────────────┼────────┼────────────────────────────┼────────┼───────┤
│ DiskANN │ IP │ Ocean Water Resort & Spa │ 0.6184 │ Windy Ocean Motel │ 0.5056 │ 0.1128│
└──────────┴────────┴────────────────────────────┴────────┴────────────────────────────┴────────┴───────┘
Summary: 9 succeeded, 0 failed
Cleanup: dropped collection 'hotels'
Done!
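The Diff column shows the absolute score gap between the top-1 and top-2 results. For COS and IP, a higher score means more similar. For L2, the score is a distance, so the top result has the lower score. A larger diff means the top result stands out more clearly from the runner-up. A smaller diff means the two results have similar relevance scores.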
Choosing the right algorithm
Important
For production workloads, start with DiskANN on an M30+ cluster. DiskANN supports higher embedding dimensions, uses less cluster memory, and is less likely to require an index redesign as your models evolve.
Use this quick-reference table to select the right algorithm for your workload:
| Scenario | Algorithm | Cluster tier | Max dimensions |
|---|---|---|---|
| Dev/test, demos, small datasets | IVF | M10+ | 2,000 |
| Production (default) | DiskANN | M30+ | 16,000 |
| Production (max recall priority) | HNSW | M30+ | 8,000 |
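If you want the application to fail fast when a configuration doesn't fit the table, you could check the tier and dimension limits before calling CreateIndex.

The following C# sketch is a hypothetical helper, not part of the sample. It represents the cluster tier as the number after M (M30 becomes 30) and copies the limits directly from the table above.

```csharp
// Standalone sketch for a scratch console project (top-level statements).
// Hypothetical helper; limits copied from the quick-reference table.
using System;
using System.Collections.Generic;

var limits = new Dictionary<string, (int MinTier, int MaxDimensions)>
{
    ["vector-ivf"] = (10, 2_000),
    ["vector-hnsw"] = (30, 8_000),
    ["vector-diskann"] = (30, 16_000),
};

// Returns a list of problems; an empty list means the configuration fits the table.
List<string> CheckFit(string kind, int clusterTier, int dimensions)
{
    var problems = new List<string>();
    if (!limits.TryGetValue(kind, out var limit))
    {
        problems.Add($"Unknown index kind '{kind}'.");
        return problems;
    }
    if (clusterTier < limit.MinTier)
    {
        problems.Add($"{kind} needs an M{limit.MinTier}+ cluster; this one is M{clusterTier}.");
    }
    if (dimensions > limit.MaxDimensions)
    {
        problems.Add($"{kind} supports up to {limit.MaxDimensions:N0} dimensions; got {dimensions:N0}.");
    }
    return problems;
}

// text-embedding-3-small vectors (1,536 dimensions) on an M10 cluster.
foreach (var indexKind in limits.Keys)
{
    var issues = CheckFit(indexKind, clusterTier: 10, dimensions: 1536);
    Console.WriteLine(issues.Count == 0 ? $"{indexKind}: fits" : $"{indexKind}: {string.Join(" ", issues)}");
}
```

For example, on an M10 cluster, only IVF passes. The graph-based kinds report the M30+ requirement.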
IVF (inverted file index):
- Best for: Test environments, demos, and small clusters
- Pros: Fast to build, low resource requirements
- Cons: Lower recall compared to graph-based algorithms at scale
- Tune: Increase numLists for larger datasets, increase nProbes for better recall
DiskANN (disk-based approximate nearest neighbor) — recommended for production:
- Best for: Production workloads on M30+ clusters
- Pros: Supports embeddings up to 16,000 dimensions, keeps most index data on disk (freeing cluster memory for reads and writes), lighter index updates, easier backups, faster recovery
- Cons: Requires M30+ cluster tier
- Tune: Increase maxDegree and lBuild for better accuracy, increase lSearch for better recall
- Why default: As embedding models evolve (some already exceed 8,000 dimensions), DiskANN avoids costly index redesigns. Its disk-based architecture also means your cluster memory stays available for operational workloads rather than index storage.
HNSW (hierarchical navigable small world):
- Best for: Production workloads on M30+ clusters where maximum recall is the top priority
- Pros: Excellent recall, fast queries
- Cons: Requires M30+ cluster tier, supports embeddings up to 8,000 dimensions (vs 16,000 for DiskANN), higher memory usage since the full graph lives in RAM
- Tune: Increase m and efConstruction for better index quality, increase efSearch for better recall
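The build-time parameters (numLists, m, efConstruction, maxDegree, lBuild) go in cosmosSearchOptions when the index is created, as BuildIndexConfigs and CreateIndex show. The search-time parameters (nProbes, efSearch, lSearch) aren't set anywhere in the sample.

The following C# sketch builds a $search stage as JSON with System.Text.Json.Nodes and adds the search-time parameter that matches the index kind. It assumes that these parameters sit alongside vector, path, and k inside cosmosSearch. Confirm the placement and the allowed ranges in the DocumentDB vector search reference before you rely on it.

SuggestNumLists encodes a commonly cited IVF starting point: document count / 1000 for up to 1 million documents, and the square root of the count beyond that. Treat it as an assumption and tune from there.

```csharp
// Standalone sketch for a scratch console project (top-level statements).
// Assumption: search-time tuning parameters sit inside the cosmosSearch document.
using System;
using System.Text.Json.Nodes;

// Maps an index kind to its search-time tuning parameter (names from this article).
string? SearchParameterFor(string kind) => kind switch
{
    "vector-ivf" => "nProbes",
    "vector-hnsw" => "efSearch",
    "vector-diskann" => "lSearch",
    _ => null,
};

// Builds { "$search": { "cosmosSearch": { vector, path, k, <tuning parameter> } } }.
JsonObject BuildSearchStage(float[] queryVector, string path, int k, string kind, int searchEffort)
{
    var vector = new JsonArray();
    foreach (var value in queryVector)
    {
        vector.Add(value);
    }

    var cosmosSearch = new JsonObject
    {
        ["vector"] = vector,
        ["path"] = path,
        ["k"] = k,
    };

    if (SearchParameterFor(kind) is string parameter)
    {
        cosmosSearch[parameter] = searchEffort;
    }

    return new JsonObject { ["$search"] = new JsonObject { ["cosmosSearch"] = cosmosSearch } };
}

// Starting value for IVF numLists (heuristic, assumption): count / 1000 up to 1M documents, sqrt(count) above.
int SuggestNumLists(long documentCount) =>
    documentCount <= 1_000_000
        ? (int)Math.Max(1, documentCount / 1000)
        : (int)Math.Sqrt(documentCount);

var stage = BuildSearchStage(new[] { 0.12f, -0.05f, 0.33f }, "DescriptionVector", 5, "vector-hnsw", 40);
Console.WriteLine(stage.ToJsonString());
Console.WriteLine($"Suggested numLists for 50 documents: {SuggestNumLists(50)}");
```

In the sample's RunVectorSearch, the equivalent change would be to add an entry such as { "efSearch", 40 } to the cosmosSearch BsonDocument. The value 40 is only illustrative, not a recommendation.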
Choosing the right similarity function
| Function | Score meaning | Best for |
|---|---|---|
| COS (Cosine) | Higher = more similar (0–1) | Text embeddings (normalized vectors) |
| L2 (Euclidean) | Lower = more similar (distance) | When magnitude matters |
| IP (Inner Product) | Higher = more similar | Equivalent to COS for normalized vectors |
For the text-embedding-3-small model used in this quickstart, COS (cosine similarity) is recommended because OpenAI embeddings are normalized and optimized for cosine similarity.
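For unit-length vectors, the three functions are tied together:

- The inner product equals the cosine similarity.
- The Euclidean distance equals sqrt(2 - 2 × cosine).

That's why the COS and IP rows in the expected output show identical scores. It's also why the L2 rows rank the same hotels, with a top distance of about 0.8736, which equals sqrt(2 - 2 × 0.6184).

The following C# sketch checks these identities on small hand-made vectors. The helper names are illustrative, not part of the sample.

```csharp
// Standalone sketch for a scratch console project (top-level statements).
using System;
using System.Linq;

double Dot(double[] a, double[] b) => a.Zip(b, (x, y) => x * y).Sum();      // IP
double Norm(double[] a) => Math.Sqrt(Dot(a, a));
double Cosine(double[] a, double[] b) => Dot(a, b) / (Norm(a) * Norm(b)); // COS
double Euclidean(double[] a, double[] b) =>                                // L2
    Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());
double[] Normalize(double[] a) { var n = Norm(a); return a.Select(x => x / n).ToArray(); }

// Valid only for unit-length vectors: ||a - b||^2 = 2 - 2 * cos(a, b).
double EuclideanFromCosine(double cosine) => Math.Sqrt(2 - 2 * cosine);

var query = Normalize(new[] { 3.0, 4.0 });
var hotel = Normalize(new[] { 1.0, 0.0 });

Console.WriteLine($"COS = {Cosine(query, hotel):F4}");
Console.WriteLine($"IP  = {Dot(query, hotel):F4}");
Console.WriteLine($"L2  = {Euclidean(query, hotel):F4} (from COS: {EuclideanFromCosine(Cosine(query, hotel)):F4})");
```

Because the embeddings are normalized, all three functions produce the same ranking for this dataset. L2 simply reports a distance, where lower is better, instead of a similarity.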
Troubleshooting
| Issue | Solution |
|---|---|
| TimeoutException during connection | Verify the values in appsettings.json and any environment variable overrides. Ensure your IP address is allowed by the DocumentDB firewall rules. |
| AuthenticationException | Verify your Microsoft Entra token is valid. Run az login to refresh your credentials. |
| Build errors with .NET version | Ensure you have .NET 8.0 or later installed. Run dotnet --version to check. |
| BsonSerializationException | Ensure your model classes match the document structure in the collection. |
| Empty search results | The vector index may take a few minutes to build. The application retries each search up to five times, 2 seconds apart; if results are still empty, wait 2-3 minutes and rerun the application. |
| IndexOptionsConflict (code 85) | DocumentDB doesn't allow multiple vector indexes of the same kind on the same field. Drop the existing index before creating a new one. |
Clean up resources
Remove the database using the DocumentDB for VS Code extension:
- Install the DocumentDB for VS Code extension.
- Connect to your Azure DocumentDB cluster.
- Expand the cluster, right-click the Hotels database, and select Drop Database.