Quickstart: Vector search with .NET in Azure DocumentDB

Learn to use vector search in Azure DocumentDB with the .NET MongoDB driver to store and query vector data efficiently.

This quickstart provides a guided tour of key vector search techniques using a .NET sample app on GitHub.

The app uses a sample hotel dataset in a JSON file with pre-calculated vectors from the text-embedding-ada-002 model, though you can also generate the vectors yourself. The hotel data includes hotel names, locations, descriptions, and vector embeddings.

Prerequisites

An Azure subscription
- If you don't have an Azure subscription, create a free account

An existing Azure DocumentDB cluster
- If you don't have a cluster, create a new cluster
- Role-based access control (RBAC) enabled
- Firewall configured to allow access to your client IP address

Azure OpenAI resource
- Role-based access control (RBAC) enabled
- text-embedding-ada-002 model deployed

Visual Studio Code
- DocumentDB extension

Use the Bash environment in Azure Cloud Shell. For more information, see Get started with Azure Cloud Shell.
If you prefer to run CLI reference commands locally, install the Azure CLI. If you're running on Windows or macOS, consider running Azure CLI in a Docker container. For more information, see How to run the Azure CLI in a Docker container.
- If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Authenticate to Azure using Azure CLI.
- When you're prompted, install the Azure CLI extension on first use. For more information about extensions, see Use and manage extensions with the Azure CLI.
- Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.

.NET 8.0 SDK or later
- C# extension for Visual Studio Code

App dependencies

The app uses the following NuGet packages:

Azure.Identity: Azure Identity library for passwordless authentication with Microsoft Entra ID
Azure.AI.OpenAI: Azure OpenAI client library to communicate with AI models and create vector embeddings
Microsoft.Extensions.Configuration: Configuration management for app settings
MongoDB.Driver: Official MongoDB .NET driver for database connectivity and operations
Newtonsoft.Json: Popular JSON serialization and deserialization library

Configure and run the app

Complete the following steps to configure the app with your own values and run searches against your Azure DocumentDB cluster.

Configure the app

Update the appsettings.json placeholder values with your own:

{
  "AzureOpenAI": {
    "EmbeddingModel": "text-embedding-ada-002",
    "ApiVersion": "2023-05-15",
    "Endpoint": "https://<your-openai-service-name>.openai.azure.com/"
  },
  "DataFiles": {
    "WithoutVectors": "HotelsData_toCosmosDB.JSON",
    "WithVectors": "HotelsData_toCosmosDB_Vector.json"
  },
  "Embedding": {
    "FieldToEmbed": "Description",
    "EmbeddedField": "text_embedding_ada_002",
    "Dimensions": 1536,
    "BatchSize": 16
  },
  "MongoDB": {
    "TenantId": "<your-tenant-id>",
    "ClusterName": "<your-cluster-name>",
    "LoadBatchSize": 100
  },
  "VectorSearch": {
    "Query": "quintessential lodging near running trails, eateries, retail",
    "DatabaseName": "Hotels",
    "TopK": 5
  }
}
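
As a rough illustration of how these settings feed the rest of the app, the following sketch reads a few of them and derives the cluster host name. The sample app binds the file with Microsoft.Extensions.Configuration; this sketch instead uses System.Text.Json from the base class library so it runs without NuGet packages. The inline JSON and the cluster name my-cluster are hypothetical placeholders, and the host name pattern is the one used in the MongoDbService connection string shown in this article.

```csharp
using System;
using System.Text.Json;

// Hypothetical inline copy of part of appsettings.json; the real app loads the file from disk.
const string json = """
{
  "Embedding": { "EmbeddedField": "text_embedding_ada_002", "Dimensions": 1536 },
  "MongoDB": { "ClusterName": "my-cluster" },
  "VectorSearch": { "DatabaseName": "Hotels", "TopK": 5 }
}
""";

using var doc = JsonDocument.Parse(json);
JsonElement root = doc.RootElement;

string clusterName = root.GetProperty("MongoDB").GetProperty("ClusterName").GetString() ?? "";
int dimensions = root.GetProperty("Embedding").GetProperty("Dimensions").GetInt32();
int topK = root.GetProperty("VectorSearch").GetProperty("TopK").GetInt32();

// Host name pattern taken from the connection string in MongoDbService.
string host = $"{clusterName}.global.mongocluster.cosmos.azure.com";

Console.WriteLine($"Host: {host}, dimensions: {dimensions}, top K: {topK}");
```

The Dimensions value must match the length of the vectors produced by the embedding model, because the same number is passed to the vector index definition.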

Authenticate to Azure

The sample app uses passwordless authentication via DefaultAzureCredential and Microsoft Entra ID. Sign in to Azure using a supported tool such as the Azure CLI or Azure PowerShell before you run the application so it can access Azure resources securely.

Note

Ensure your signed-in identity has the required data plane roles on both the Azure DocumentDB cluster and the Azure OpenAI resource.

Azure CLI:

az login

Azure Developer CLI:

azd auth login

Azure PowerShell:

Connect-AzAccount

Build and run the project

The sample app populates vectorized sample data in a MongoDB collection and lets you run different types of search queries.

Use the dotnet run command to start the app:

dotnet run

The app prints a menu for you to select database and search options:

=== Cosmos DB Vector Samples Menu ===
Please enter your choice (0-5):
1. Create embeddings for data
2. Show all database indexes
3. Run IVF vector search
4. Run HNSW vector search
5. Run DiskANN vector search
0. Exit

Type 5 and press Enter to run the DiskANN vector search.

After the app populates the database and runs the search, you see the top five hotels that match the selected vector search query and their similarity scores.

The app logging and output show:

Collection creation and data insertion status
Vector index creation confirmation
Search results with hotel names, locations, and similarity scores

Example output (shortened for brevity):

MongoDB client initialized with passwordless authentication
Starting DiskANN vector search workflow
Collection is empty, loading data from file
Successfully loaded 50 documents into collection
Creating vector index 'vectorIndex_diskann'
Vector index 'vectorIndex_diskann' is ready for DiskANN search
Executing DiskANN vector search for top 5 results

Search Results (5 found using DiskANN):
1. Roach Motel (Similarity: 0.8399)
2. Royal Cottage Resort (Similarity: 0.8385)
3. Economy Universe Motel (Similarity: 0.8360)
4. Foot Happy Suites (Similarity: 0.8354)
5. Country Comfort Inn (Similarity: 0.8346)

To run the IVF vector search instead, use the dotnet run command to start the app again:

dotnet run

The app prints a menu for you to select database and search options:

=== Cosmos DB Vector Samples Menu ===
Please enter your choice (0-5):
1. Create embeddings for data
2. Show all database indexes
3. Run IVF vector search
4. Run HNSW vector search
5. Run DiskANN vector search
0. Exit

Type 3 and press Enter to run the IVF vector search.

After the app populates the database and runs the search, you see the top five hotels that match the selected vector search query and their similarity scores.

The app logging and output show:

Collection creation and data insertion status
Vector index creation confirmation
Search results with hotel names, locations, and similarity scores

Example output (shortened for brevity):

MongoDB client initialized with passwordless authentication
Starting IVF vector search workflow
Collection is empty, loading data from file
Successfully loaded 50 documents into collection
Creating vector index 'vectorIndex_ivf'
Vector index 'vectorIndex_ivf' is ready for IVF search
Executing IVF vector search for top 5 results

Search Results (5 found using IVF):
1. Roach Motel (Similarity: 0.8399)
2. Royal Cottage Resort (Similarity: 0.8385)
3. Economy Universe Motel (Similarity: 0.8360)
4. Foot Happy Suites (Similarity: 0.8354)
5. Country Comfort Inn (Similarity: 0.8346)

To run the HNSW vector search instead, use the dotnet run command to start the app again:

dotnet run

The app prints a menu for you to select database and search options:

=== Cosmos DB Vector Samples Menu ===
Please enter your choice (0-5):
1. Create embeddings for data
2. Show all database indexes
3. Run IVF vector search
4. Run HNSW vector search
5. Run DiskANN vector search
0. Exit

Type 4 and press Enter to run the HNSW vector search.

After the app populates the database and runs the search, you see the top five hotels that match the selected vector search query and their similarity scores.

The app logging and output show:

Collection creation and data insertion status
Vector index creation confirmation
Search results with hotel names, locations, and similarity scores

Example output (shortened for brevity):

MongoDB client initialized with passwordless authentication
Starting HNSW vector search workflow
Collection is empty, loading data from file
Successfully loaded 50 documents into collection
Creating vector index 'vectorIndex_hnsw'
Vector index 'vectorIndex_hnsw' is ready for HNSW search
Executing HNSW vector search for top 5 results

Search Results (5 found using HNSW):
1. Roach Motel (Similarity: 0.8399)
2. Royal Cottage Resort (Similarity: 0.8385)
3. Economy Universe Motel (Similarity: 0.8360)
4. Foot Happy Suites (Similarity: 0.8354)
5. Country Comfort Inn (Similarity: 0.8346)
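
The similarity values in the output are easier to interpret with the metric in mind. The index options in this article set similarity to COS (cosine), so, assuming the reported score is the cosine similarity between the query embedding and each stored embedding, values closer to 1 mean the vectors point in nearly the same direction. The following sketch computes that metric by hand; the three-dimensional vectors are hypothetical stand-ins for the 1536-dimensional embeddings.

```csharp
using System;

// Hypothetical toy vectors; real embeddings in the sample have 1536 dimensions.
float[] query = { 1f, 0f, 1f };
float[] sameDirection = { 2f, 0f, 2f };
float[] orthogonal = { 0f, 3f, 0f };

Console.WriteLine($"Same direction: {CosineSimilarity(query, sameDirection):F4}");
Console.WriteLine($"Orthogonal:     {CosineSimilarity(query, orthogonal):F4}");

// Cosine similarity: dot(a, b) / (|a| * |b|).
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }

    if (normA == 0 || normB == 0)
        throw new ArgumentException("Cosine similarity is undefined for a zero vector.");

    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```

Vectors that point the same way score approximately 1 regardless of their length, and orthogonal vectors score 0.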

Explore the app code

The following sections provide details about the most important services and code in the sample app. Visit the GitHub repo to explore the full app code.

Explore the search service

The VectorSearchService orchestrates an end‑to‑end vector similarity search using IVF, HNSW, and DiskANN search techniques with Azure OpenAI embeddings.

using Azure.AI.OpenAI;
using Azure.Identity;
using CosmosDbVectorSamples.Models;
using Microsoft.Extensions.Logging;
using MongoDB.Bson;
using MongoDB.Driver;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Reflection;

namespace CosmosDbVectorSamples.Services.VectorSearch;

/// <summary>
/// Service for performing vector similarity searches using different algorithms (IVF, HNSW, DiskANN).
/// Handles data loading, vector index creation, query embedding generation, and search execution.
/// </summary>
public class VectorSearchService
{
    private readonly ILogger<VectorSearchService> _logger;
    private readonly AzureOpenAIClient _openAIClient;
    private readonly MongoDbService _mongoService;
    private readonly AppConfiguration _config;

    public VectorSearchService(ILogger<VectorSearchService> logger, MongoDbService mongoService, AppConfiguration config)
    {
        _logger = logger;
        _mongoService = mongoService;
        _config = config;
        
        // Initialize Azure OpenAI client with passwordless authentication
        _openAIClient = new AzureOpenAIClient(new Uri(_config.AzureOpenAI.Endpoint), new DefaultAzureCredential());
    }

    /// <summary>
    /// Executes a complete vector search workflow: data setup, index creation, query embedding, and search
    /// </summary>
    /// <param name="indexType">The vector search algorithm to use (IVF, HNSW, or DiskANN)</param>
    public async Task RunSearchAsync(VectorIndexType indexType)
    {
        try
        {
            _logger.LogInformation($"Starting {indexType} vector search workflow");
            
            // Setup collection
            var collectionSuffix = indexType switch 
            { 
                VectorIndexType.IVF => "ivf", 
                VectorIndexType.HNSW => "hnsw", 
                VectorIndexType.DiskANN => "diskann", 
                _ => throw new ArgumentException($"Unknown index type: {indexType}") 
            };
            var collectionName = $"hotels_{collectionSuffix}_fixed";
            var indexName = $"vectorIndex_{collectionSuffix}";
            
            var collection = _mongoService.GetCollection<HotelData>(_config.VectorSearch.DatabaseName, collectionName);
            
            // Load data from file if collection is empty
            var assemblyLocation = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) ?? string.Empty;
            var dataFilePath = Path.Combine(assemblyLocation, _config.DataFiles.WithVectors);
            await _mongoService.LoadDataIfNeededAsync(collection, dataFilePath);

            // Create the vector index with algorithm-specific search options
            var searchOptions = indexType switch
            {
                VectorIndexType.IVF => CreateIVFSearchOptions(_config.Embedding.Dimensions),
                VectorIndexType.HNSW => CreateHNSWSearchOptions(_config.Embedding.Dimensions),
                VectorIndexType.DiskANN => CreateDiskANNSearchOptions(_config.Embedding.Dimensions),
                _ => throw new ArgumentException($"Unknown index type: {indexType}")
            };
            
            await _mongoService.CreateVectorIndexAsync(
                _config.VectorSearch.DatabaseName, collectionName, indexName,
                _config.Embedding.EmbeddedField, searchOptions);
            
            _logger.LogInformation($"Vector index '{indexName}' is ready for {indexType} search");
            await Task.Delay(5000); // Allow index to be fully initialized

            // Create embedding for the query
            var embeddingClient = _openAIClient.GetEmbeddingClient(_config.AzureOpenAI.EmbeddingModel);
            var queryEmbedding = (await embeddingClient.GenerateEmbeddingAsync(_config.VectorSearch.Query)).Value.ToFloats().ToArray();
            _logger.LogInformation($"Generated query embedding with {queryEmbedding.Length} dimensions");

            // Build MongoDB aggregation pipeline for vector search
            var searchPipeline = new BsonDocument[]
            {
                // Vector similarity search using cosmosSearch
                new BsonDocument("$search", new BsonDocument
                {
                    ["cosmosSearch"] = new BsonDocument
                    {
                        ["vector"] = new BsonArray(queryEmbedding.Select(f => new BsonDouble(f))),
                        ["path"] = _config.Embedding.EmbeddedField,  // Field containing embeddings
                        ["k"] = _config.VectorSearch.TopK           // Number of results to return
                    }
                }),
                // Project results with similarity scores
                new BsonDocument("$project", new BsonDocument
                {
                    ["score"] = new BsonDocument("$meta", "searchScore"),
                    ["document"] = "$$ROOT"
                })
            };

            // Execute and process the search
            _logger.LogInformation($"Executing {indexType} vector search for top {_config.VectorSearch.TopK} results");
            var searchResults = (await (await collection.AggregateAsync<BsonDocument>(searchPipeline)).ToListAsync())
                .Select(result => new SearchResult 
                { 
                    Document = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<HotelData>(result["document"].AsBsonDocument), 
                    Score = result["score"].AsDouble 
                }).ToList();

            // Print the results
            if (searchResults.Count == 0)
            {
                _logger.LogInformation("❌ No search results found. Check query terms and data availability.");
            }
            else
            {
                _logger.LogInformation($"\n✅ Search Results ({searchResults.Count} found using {indexType}):");
                for (int i = 0; i < searchResults.Count; i++)
                {
                    var result = searchResults[i];
                    var hotelName = result.Document?.HotelName ?? "Unknown Hotel";
                    _logger.LogInformation($"  {i + 1}. {hotelName} (Similarity: {result.Score:F4})");
                }
            }
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, $"{indexType} vector search failed");
            throw;
        }
    }

    /// <summary>
    /// Creates IVF (Inverted File) search options - good for large datasets with fast approximate search
    /// </summary>
    private BsonDocument CreateIVFSearchOptions(int dimensions) => new BsonDocument
    {
        ["kind"] = "vector-ivf",
        ["similarity"] = "COS",
        ["dimensions"] = dimensions,
        ["numLists"] = 1
    };

    /// <summary>
    /// Creates HNSW (Hierarchical Navigable Small World) search options - best accuracy/speed balance
    /// </summary>
    private BsonDocument CreateHNSWSearchOptions(int dimensions) => new BsonDocument
    {
        ["kind"] = "vector-hnsw",
        ["similarity"] = "COS",
        ["dimensions"] = dimensions,
        ["m"] = 16,
        ["efConstruction"] = 64
    };

    /// <summary>
    /// Creates DiskANN search options - optimized for very large datasets stored on disk
    /// </summary>
    private BsonDocument CreateDiskANNSearchOptions(int dimensions) => new BsonDocument
    {
        ["kind"] = "vector-diskann",
        ["similarity"] = "COS",
        ["dimensions"] = dimensions
    };
}

In the preceding code, the VectorSearchService performs the following tasks:

Determines the collection and index names based on the requested algorithm
Creates or gets the MongoDB collection and loads JSON data if it's empty
Builds the algorithm-specific index options (IVF / HNSW / DiskANN) and ensures the vector index exists
Generates an embedding for the configured query via Azure OpenAI
Constructs and runs the aggregation search pipeline
Deserializes and prints the results
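
To make the shape of the aggregation pipeline easier to see, the following sketch rebuilds the same two stages and prints them as JSON. It is an illustration rather than part of the sample app: it uses System.Text.Json.Nodes from the base class library instead of MongoDB.Bson, the three-element query vector is hypothetical, and the path and k values are the ones from the appsettings.json shown earlier.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical 3-dimensional query vector; a real query embedding has 1536 values.
float[] queryEmbedding = { 0.5f, 0.25f, -0.125f };

var vector = new JsonArray();
foreach (float f in queryEmbedding)
{
    JsonNode item = (double)f; // widen to double, as the listing does with BsonDouble
    vector.Add(item);
}

// Stage 1: vector similarity search against the embedding field.
var searchStage = new JsonObject
{
    ["$search"] = new JsonObject
    {
        ["cosmosSearch"] = new JsonObject
        {
            ["vector"] = vector,
            ["path"] = "text_embedding_ada_002",
            ["k"] = 5
        }
    }
};

// Stage 2: return the score alongside the full matching document.
var projectStage = new JsonObject
{
    ["$project"] = new JsonObject
    {
        ["score"] = new JsonObject { ["$meta"] = "searchScore" },
        ["document"] = "$$ROOT"
    }
};

var pipeline = new JsonArray(searchStage, projectStage);
Console.WriteLine(pipeline.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```

The printed JSON mirrors the structure of the pipeline in the listing, which can help when comparing it with queries written in other tools.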

Explore the Azure DocumentDB service

The MongoDbService manages interactions with Azure DocumentDB to handle tasks like loading data, vector index creation, index listing, and bulk inserts for hotel vector search.

using Azure.Identity;
using CosmosDbVectorSamples.Models;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using MongoDB.Bson;
using MongoDB.Driver;
using Newtonsoft.Json;

namespace CosmosDbVectorSamples.Services;

/// <summary>
/// Service for MongoDB operations including data insertion, index management, and vector index creation.
/// Supports Azure Cosmos DB for MongoDB with passwordless authentication.
/// </summary>
public class MongoDbService : IDisposable
{
    private readonly ILogger<MongoDbService> _logger;
    private readonly AppConfiguration _config;
    private readonly MongoClient _client;

    public MongoDbService(ILogger<MongoDbService> logger, IConfiguration configuration)
    {
        _logger = logger;
        _config = new AppConfiguration();
        configuration.Bind(_config);
        
        // Validate configuration
        if (string.IsNullOrEmpty(_config.MongoDB.ClusterName))
            throw new InvalidOperationException("MongoDB connection not configured. Please provide ClusterName.");
            
        // Configure MongoDB connection for Azure Cosmos DB with OIDC authentication
        var connectionString = $"mongodb+srv://{_config.MongoDB.ClusterName}.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=MONGODB-OIDC&retrywrites=false&maxIdleTimeMS=120000";
        var settings = MongoClientSettings.FromUrl(MongoUrl.Create(connectionString));
        settings.UseTls = true;
        settings.RetryWrites = false;
        settings.MaxConnectionIdleTime = TimeSpan.FromMinutes(2);
        settings.Credential = MongoCredential.CreateOidcCredential(new AzureIdentityTokenHandler(new DefaultAzureCredential(), _config.MongoDB.TenantId));
        settings.Freeze();
        
        _client = new MongoClient(settings);
        _logger.LogInformation("MongoDB client initialized with passwordless authentication");
    }

    /// <summary>Gets a database instance by name</summary>
    public IMongoDatabase GetDatabase(string databaseName) => _client.GetDatabase(databaseName);
    
    /// <summary>Gets a collection instance from the specified database</summary>
    public IMongoCollection<T> GetCollection<T>(string databaseName, string collectionName) => 
        _client.GetDatabase(databaseName).GetCollection<T>(collectionName);

    /// <summary>
    /// Creates a vector search index for Cosmos DB MongoDB, with support for IVF, HNSW, and DiskANN algorithms
    /// </summary>
    public async Task<BsonDocument> CreateVectorIndexAsync(string databaseName, string collectionName, string indexName, string embeddedField, BsonDocument cosmosSearchOptions)
    {
        var database = _client.GetDatabase(databaseName);
        var collection = database.GetCollection<BsonDocument>(collectionName);
        
        // Check if index already exists to avoid duplication
        var indexList = await (await collection.Indexes.ListAsync()).ToListAsync();
        if (indexList.Any(index => index.TryGetValue("name", out var nameValue) && nameValue.AsString == indexName))
        {
            _logger.LogInformation($"Vector index '{indexName}' already exists, skipping creation");
            return new BsonDocument { ["ok"] = 1 };
        }
        
        // Create the specified vector index type
        _logger.LogInformation($"Creating vector index '{indexName}' on field '{embeddedField}'");
        return await database.RunCommandAsync<BsonDocument>(new BsonDocument
        {
            ["createIndexes"] = collectionName,
            ["indexes"] = new BsonArray 
            { 
                new BsonDocument 
                { 
                    ["name"] = indexName, 
                    ["key"] = new BsonDocument { [embeddedField] = "cosmosSearch" }, 
                    ["cosmosSearchOptions"] = cosmosSearchOptions 
                } 
            }
        });
    }

    /// <summary>
    /// Displays all indexes across all user databases, excluding system databases
    /// </summary>
    public async Task ShowAllIndexesAsync()
    {
        try
        {
            // Get user databases (exclude system databases)
            var databases = (await (await _client.ListDatabaseNamesAsync()).ToListAsync())
                .Where(name => !new[] { "admin", "config", "local" }.Contains(name)).ToList();
                
            if (!databases.Any()) 
            { 
                _logger.LogInformation("No user databases found or access denied"); 
                return; 
            }

            foreach (var dbName in databases)
            {
                var database = _client.GetDatabase(dbName);
                var collections = await (await database.ListCollectionNamesAsync()).ToListAsync();
                
                if (!collections.Any()) 
                { 
                    _logger.LogInformation($"Database '{dbName}': No collections found"); 
                    continue; 
                }
                
                _logger.LogInformation($"\n📂 DATABASE: {dbName} ({collections.Count} collections)");
                
                // Display indexes for each collection
                foreach (var collName in collections)
                {
                    try
                    {
                        var indexList = await (await database.GetCollection<BsonDocument>(collName).Indexes.ListAsync()).ToListAsync();
                        _logger.LogInformation($"\n  🗃️ COLLECTION: {collName} ({indexList.Count} indexes)");
                        indexList.ForEach(index => _logger.LogInformation($"    Index: {index.ToJson()}"));
                    }
                    catch (Exception ex) 
                    { 
                        _logger.LogError(ex, $"Failed to list indexes for collection '{collName}'"); 
                    }
                }
            }
        }
        catch (Exception ex) 
        { 
            _logger.LogError(ex, "Failed to retrieve database indexes"); 
            throw; 
        }
    }

    /// <summary>
    /// Loads data from file into collection if the collection is empty
    /// </summary>
    /// <param name="collection">Target collection to load data into</param>
    /// <param name="dataFilePath">Path to the JSON data file containing vector embeddings</param>
    /// <returns>Number of documents loaded, or 0 if collection already had data</returns>
    public async Task<int> LoadDataIfNeededAsync<T>(IMongoCollection<T> collection, string dataFilePath) where T : class
    {
        var existingDocCount = await collection.CountDocumentsAsync(Builders<T>.Filter.Empty);

        // Skip loading if collection already has data
        if (existingDocCount > 0)
        {
            _logger.LogInformation("Collection already contains data, skipping load operation");
            return 0;
        }

        // Load and validate data file
        _logger.LogInformation("Collection is empty, loading data from file");
        if (!File.Exists(dataFilePath))
            throw new FileNotFoundException($"Vector data file not found: {dataFilePath}");

        var jsonContent = await File.ReadAllTextAsync(dataFilePath);
        var data = JsonConvert.DeserializeObject<List<T>>(jsonContent) ?? new List<T>();
        
        if (data.Count == 0)
            throw new InvalidOperationException("No data found in the vector data file");

        // Insert data using existing method
        var summary = await InsertDataAsync(collection, data);
        _logger.LogInformation($"Successfully loaded {summary.Inserted} documents into collection");
        
        return summary.Inserted;
    }

    /// <summary>
    /// Inserts data into MongoDB collection, converts JSON embeddings to float arrays, and creates standard indexes
    /// </summary>
    public async Task<InsertSummary> InsertDataAsync<T>(IMongoCollection<T> collection, IEnumerable<T> data)
    {
        var dataList = data.ToList();
        _logger.LogInformation($"Processing {dataList.Count} items for insertion");

        // Convert JSON array embeddings to float arrays for vector search compatibility
        foreach (var hotel in dataList.OfType<HotelData>().Where(h => h.ExtraElements != null))
            foreach (var kvp in hotel.ExtraElements.ToList().Where(k => k.Value is Newtonsoft.Json.Linq.JArray))
                hotel.ExtraElements[kvp.Key] = ((Newtonsoft.Json.Linq.JArray)kvp.Value).Select(token => (float)token).ToArray();

        int inserted = 0, failed = 0;
        try
        {
            // Use unordered insert for better performance
            await collection.InsertManyAsync(dataList, new InsertManyOptions { IsOrdered = false });
            inserted = dataList.Count;
            _logger.LogInformation($"Successfully inserted {inserted} items");
        }
        catch (Exception ex)
        {
            failed = dataList.Count;
            _logger.LogError(ex, $"Batch insert failed for {dataList.Count} items");
        }

        // Create standard indexes for common query fields
        var indexFields = new[] { "HotelId", "Category", "Description", "Description_fr" };
        foreach (var field in indexFields)
            await collection.Indexes.CreateOneAsync(new CreateIndexModel<T>(Builders<T>.IndexKeys.Ascending(field)));

        return new InsertSummary { Total = dataList.Count, Inserted = inserted, Failed = failed };
    }

    /// <summary>Disposes the MongoDB client and its resources</summary>
    public void Dispose() => _client?.Cluster?.Dispose();
}

In the preceding code, the MongoDbService performs the following tasks:

Reads configuration and builds a passwordless client with Azure credentials
Provides database or collection references on demand
Creates a vector search index only if it doesn't already exist
Lists all non-system databases, their collections, and each collection's indexes
Inserts sample data if the collection is empty and adds supporting indexes
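
The appsettings.json file defines a LoadBatchSize value, while the InsertDataAsync listing above inserts all documents in one call. For larger datasets, one possible approach is to split the data into batches before inserting. The following is a minimal sketch of that idea using Enumerable.Chunk (available in .NET 6 and later); the integer IDs stand in for hotel documents, the batch size of 20 is hypothetical, and how the sample app actually applies LoadBatchSize is an assumption not confirmed by the code shown here.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the 50 hotel documents in the sample data file.
int[] documentIds = Enumerable.Range(1, 50).ToArray();

// Hypothetical batch size for illustration; the sample appsettings.json uses 100.
const int loadBatchSize = 20;

int batchNumber = 0;
int processed = 0;
foreach (int[] batch in documentIds.Chunk(loadBatchSize))
{
    batchNumber++;
    processed += batch.Length;

    // In a real service, this is where the batch would be passed to
    // collection.InsertManyAsync(batch, new InsertManyOptions { IsOrdered = false }).
    Console.WriteLine($"Batch {batchNumber}: {batch.Length} documents");
}

Console.WriteLine($"Processed {processed} documents in {batchNumber} batches");
```

Smaller batches keep individual requests short and make it possible to report partial progress if one batch fails.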

View and manage data in Visual Studio Code

Install the DocumentDB extension and C# extension in Visual Studio Code.
Connect to your Azure DocumentDB cluster using the DocumentDB extension.
View the data and indexes in the Hotels database.

Clean up resources

Delete the resource group, Azure DocumentDB cluster, and Azure OpenAI resource when you no longer need them to avoid unnecessary costs.

Last updated on 2025-11-18
