Catatan
Akses ke halaman ini memerlukan otorisasi. Anda dapat mencoba masuk atau mengubah direktori.
Akses ke halaman ini memerlukan otorisasi. Anda dapat mencoba mengubah direktori.
Pelajari cara menggunakan pencarian vektor di Azure DocumentDB dengan driver .NET MongoDB untuk menyimpan dan mengkueri data vektor secara efisien.
Panduan untuk memulai cepat ini menyediakan tur terpandu mengenai teknik-teknik kunci pencarian vektor menggunakan aplikasi sampel .NET di GitHub.
Aplikasi ini menggunakan sampel himpunan data hotel dalam file JSON dengan vektor yang telah dihitung sebelumnya dari text-embedding-ada-002 model, meskipun Anda juga dapat membuat vektor sendiri. Data hotel mencakup nama hotel, lokasi, deskripsi, dan penyematan vektor.
Prasyarat
Langganan Azure
- Jika Anda tidak memiliki langganan Azure, buat akun gratis
Kluster Azure DocumentDB yang sudah ada
- Jika Anda tidak memiliki kluster, buat kluster baru
Firewall dikonfigurasi untuk mengizinkan akses ke alamat IP klien Anda
-
text-embedding-ada-002model disebarkan
Gunakan lingkungan Bash di Azure Cloud Shell. Untuk informasi selengkapnya, lihat Mulai menggunakan Azure Cloud Shell.
Jika Anda lebih suka menjalankan perintah referensi CLI secara lokal, instal Azure CLI. Jika Anda menjalankan Windows atau macOS, pertimbangkan untuk menjalankan Azure CLI dalam kontainer Docker. Untuk informasi lebih lanjut, lihat Cara menjalankan Azure CLI di kontainer Docker.
Jika Anda menggunakan instalasi lokal, masuk ke Azure CLI dengan menggunakan perintah az login. Untuk menyelesaikan proses autentikasi, ikuti langkah-langkah yang ditampilkan di terminal Anda. Untuk opsi masuk lainnya, lihat Mengautentikasi ke Azure menggunakan Azure CLI.
Saat diminta, instal ekstensi Azure CLI saat pertama kali digunakan. Untuk informasi selengkapnya tentang ekstensi, lihat Menggunakan dan mengelola ekstensi dengan Azure CLI.
Jalankan az version untuk menemukan versi dan pustaka dependen yang terinstal. Untuk meng-upgrade ke versi terbaru, jalankan az upgrade.
.NET 8.0 SDK atau yang lebih baru
Dependensi aplikasi
Aplikasi ini menggunakan paket NuGet berikut:
-
Azure.Identity: Pustaka Identitas Azure untuk autentikasi tanpa kata sandi dengan ID Microsoft Entra -
Azure.AI.OpenAI: Pustaka klien Azure OpenAI untuk berkomunikasi dengan model AI dan membuat penyematan vektor -
Microsoft.Extensions.Configuration: Manajemen konfigurasi untuk pengaturan aplikasi -
MongoDB.Driver: Driver Resmi MongoDB .NET untuk konektivitas dan operasi database -
Newtonsoft.Json: Pustaka serialisasi dan deserialisasi JSON populer
Mengonfigurasi dan menjalankan aplikasi
Selesaikan langkah-langkah berikut untuk mengonfigurasi aplikasi dengan nilai Anda sendiri dan menjalankan pencarian terhadap kluster Azure DocumentDB Anda.
Mengonfigurasi aplikasi
Perbarui nilai placeholder appsettings.json dengan nilai Anda sendiri.
{
"AzureOpenAI": {
"TenantId": "<your-tenant-id>",
"EmbeddingModel": "text-embedding-3-small",
"ApiVersion": "2023-05-15",
"Endpoint": "https://<your-openai-service-name>.openai.azure.com/"
},
"DataFiles": {
"WithoutVectors": "data/Hotels.json",
"WithVectors": "data/Hotels_Vector.json"
},
"Embedding": {
"FieldToEmbed": "Description",
"EmbeddedField": "DescriptionVector",
"Dimensions": 1536,
"BatchSize": 16
},
"MongoDB": {
"TenantId": "<your-tenant-id>",
"ClusterName": "<your-cluster-name>",
"LoadBatchSize": 100
},
"VectorSearch": {
"Query": "quintessential lodging near running trails, eateries, retail",
"DatabaseName": "Hotels",
"TopK": 5
}
}
Mengautentikasi ke Azure
Aplikasi sampel menggunakan autentikasi tanpa kata sandi melalui DefaultAzureCredential dan ID Microsoft Entra.
Masuk ke Azure menggunakan alat yang didukung seperti Azure CLI atau Azure PowerShell sebelum Anda menjalankan aplikasi sehingga dapat mengakses sumber daya Azure dengan aman.
Nota
Pastikan identitas masuk Anda memiliki peran sarana data yang diperlukan di akun Azure DocumentDB dan sumber daya Azure OpenAI.
az login
Membangun dan menjalankan proyek
Aplikasi sampel mengisi data sampel vektorisasi dalam koleksi MongoDB dan memungkinkan Anda menjalankan berbagai jenis kueri pencarian.
dotnet runGunakan perintah untuk memulai aplikasi:dotnet runAplikasi mencetak menu bagi Anda untuk memilih database dan opsi pencarian:
=== DocumentDB Vector Samples Menu === Please enter your choice (0-5): 1. Create embeddings for data 2. Show all database indexes 3. Run IVF vector search 4. Run HNSW vector search 5. Run DiskANN vector search 0. ExitKetik
5dan tekan enter.Setelah aplikasi mengisi database dan menjalankan pencarian, Anda akan melihat lima hotel teratas yang cocok dengan kueri pencarian vektor yang dipilih dan skor kesamaannya.
Pengelogan dan output aplikasi menunjukkan:
- Pembuatan koleksi dan status penyisipan data
- Konfirmasi pembuatan indeks vektor
- Hasil pencarian dengan nama hotel, lokasi, dan skor kesamaan
Output contoh (dipersingkat untuk kejelasan)
MongoDB client initialized with passwordless authentication Starting DiskANN vector search workflow Collection is empty, loading data from file Successfully loaded 50 documents into collection Creating vector index 'vectorIndex_diskann' Vector index 'vectorIndex_diskann' is ready for DiskANN search Executing DiskANN vector search for top 5 results Search Results (5 found using DiskANN): 1. Royal Cottage Resort (Similarity: 0.4991) 2. Country Comfort Inn (Similarity: 0.4786) 3. Nordick's Valley Motel (Similarity: 0.4635) 4. Economy Universe Motel (Similarity: 0.4461) 5. Roach Motel (Similarity: 0.4388)
Menjelajahi kode aplikasi
Bagian berikut memberikan detail tentang layanan dan kode terpenting di aplikasi sampel. Kunjungi repositori GitHub untuk menjelajahi kode aplikasi lengkap.
Menjelajahi layanan pencarian
Ini VectorSearchService mengatur pencarian kesamaan vektor end-to-end menggunakan teknik pencarian IVF, HNSW, dan DiskANN dengan penyematan Azure OpenAI.
using Azure.AI.OpenAI;
using Azure.Identity;
using DocumentDBVectorSamples.Models;
using Microsoft.Extensions.Logging;
using MongoDB.Bson;
using MongoDB.Driver;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Reflection;
namespace DocumentDBVectorSamples.Services.VectorSearch;
/// <summary>
/// Service for performing vector similarity searches using different algorithms (IVF, HNSW, DiskANN).
/// Handles data loading, vector index creation, query embedding generation, and search execution.
/// </summary>
public class VectorSearchService
{
private readonly ILogger<VectorSearchService> _logger;
private readonly AzureOpenAIClient _openAIClient;
private readonly MongoDbService _mongoService;
private readonly AppConfiguration _config;
public VectorSearchService(ILogger<VectorSearchService> logger, MongoDbService mongoService, AppConfiguration config)
{
_logger = logger;
_mongoService = mongoService;
_config = config;
// Initialize Azure OpenAI client with passwordless authentication
var credentialOptions = new DefaultAzureCredentialOptions();
if (!string.IsNullOrEmpty(_config.AzureOpenAI.TenantId))
{
credentialOptions.TenantId = _config.AzureOpenAI.TenantId;
}
var credential = new DefaultAzureCredential(credentialOptions);
_openAIClient = new AzureOpenAIClient(new Uri(_config.AzureOpenAI.Endpoint), credential);
}
/// <summary>
/// Executes a complete vector search workflow: data setup, index creation, query embedding, and search
/// </summary>
/// <param name="indexType">The vector search algorithm to use (IVF, HNSW, or DiskANN)</param>
public async Task RunSearchAsync(VectorIndexType indexType)
{
try
{
_logger.LogInformation($"Starting {indexType} vector search workflow");
// Setup collection
var collectionSuffix = indexType switch
{
VectorIndexType.IVF => "ivf",
VectorIndexType.HNSW => "hnsw",
VectorIndexType.DiskANN => "diskann",
_ => throw new ArgumentException($"Unknown index type: {indexType}")
};
var collectionName = $"hotels_{collectionSuffix}";
var indexName = $"vectorIndex_{collectionSuffix}";
var collection = _mongoService.GetCollection<HotelData>(_config.VectorSearch.DatabaseName, collectionName);
// Load data from file if collection is empty
var assemblyLocation = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) ?? string.Empty;
var dataFilePath = Path.Combine(assemblyLocation, _config.DataFiles.WithVectors);
await _mongoService.LoadDataIfNeededAsync(collection, dataFilePath);
// Create the vector index with algorithm-specific search options
var searchOptions = indexType switch
{
VectorIndexType.IVF => CreateIVFSearchOptions(_config.Embedding.Dimensions),
VectorIndexType.HNSW => CreateHNSWSearchOptions(_config.Embedding.Dimensions),
VectorIndexType.DiskANN => CreateDiskANNSearchOptions(_config.Embedding.Dimensions),
_ => throw new ArgumentException($"Unknown index type: {indexType}")
};
await _mongoService.CreateVectorIndexAsync(
_config.VectorSearch.DatabaseName, collectionName, indexName,
_config.Embedding.EmbeddedField, searchOptions);
_logger.LogInformation($"Vector index '{indexName}' is ready for {indexType} search");
await Task.Delay(5000); // Allow index to be fully initialized
// Create embedding for the query
var embeddingClient = _openAIClient.GetEmbeddingClient(_config.AzureOpenAI.EmbeddingModel);
var queryEmbedding = (await embeddingClient.GenerateEmbeddingAsync(_config.VectorSearch.Query)).Value.ToFloats().ToArray();
_logger.LogInformation($"Generated query embedding with {queryEmbedding.Length} dimensions");
// Build MongoDB aggregation pipeline for vector search
var searchPipeline = new BsonDocument[]
{
// Vector similarity search using cosmosSearch
new BsonDocument("$search", new BsonDocument
{
["cosmosSearch"] = new BsonDocument
{
["vector"] = new BsonArray(queryEmbedding.Select(f => new BsonDouble(f))),
["path"] = _config.Embedding.EmbeddedField, // Field containing embeddings
["k"] = _config.VectorSearch.TopK // Number of results to return
}
}),
// Project results with similarity scores
new BsonDocument("$project", new BsonDocument
{
["score"] = new BsonDocument("$meta", "searchScore"),
["document"] = "$$ROOT"
})
};
// Execute and process the search
_logger.LogInformation($"Executing {indexType} vector search for top {_config.VectorSearch.TopK} results");
var searchResults = (await collection.AggregateAsync<BsonDocument>(searchPipeline)).ToList()
.Select(result => new SearchResult
{
Document = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<HotelData>(result["document"].AsBsonDocument),
Score = result["score"].AsDouble
}).ToList();
// Print the results
if (searchResults?.Count == 0)
{
_logger.LogInformation("ā No search results found. Check query terms and data availability.");
}
else
{
_logger.LogInformation($"\nā
Search Results ({searchResults!.Count} found using {indexType}):");
for (int i = 0; i < searchResults.Count; i++)
{
var result = searchResults[i];
var hotelName = result.Document?.HotelName ?? "Unknown Hotel";
_logger.LogInformation($" {i + 1}. {hotelName} (Similarity: {result.Score:F4})");
}
}
}
catch (Exception ex)
{
_logger.LogError(ex, $"{indexType} vector search failed");
throw;
}
}
/// <summary>
/// Creates IVF (Inverted File) search options - good for large datasets with fast approximate search
/// </summary>
private BsonDocument CreateIVFSearchOptions(int dimensions) => new BsonDocument
{
["kind"] = "vector-ivf",
["similarity"] = "COS",
["dimensions"] = dimensions,
["numLists"] = 1
};
/// <summary>
/// Creates HNSW (Hierarchical Navigable Small World) search options - best accuracy/speed balance
/// </summary>
private BsonDocument CreateHNSWSearchOptions(int dimensions) => new BsonDocument
{
["kind"] = "vector-hnsw",
["similarity"] = "COS",
["dimensions"] = dimensions,
["m"] = 16,
["efConstruction"] = 64
};
/// <summary>
/// Creates DiskANN search options - optimized for very large datasets stored on disk
/// </summary>
private BsonDocument CreateDiskANNSearchOptions(int dimensions) => new BsonDocument
{
["kind"] = "vector-diskann",
["similarity"] = "COS",
["dimensions"] = dimensions
};
}
Dalam kode sebelumnya, VectorSearchService melakukan tugas berikut:
- Menentukan nama koleksi dan indeks berdasarkan algoritma yang diminta
- Membuat atau mendapatkan koleksi MongoDB dan memuat data JSON jika kosong
- Membangun opsi indeks khusus algoritma (IVF / HNSW / DiskANN) dan memastikan indeks vektor ada
- Menghasilkan embedding untuk kueri yang dikonfigurasi melalui Azure OpenAI
- Membangun dan menjalankan alur pencarian agregasi
- Mendeserialisasi dan mencetak hasil
Menjelajahi layanan Azure DocumentDB
MongoDbService mengelola interaksi dengan Azure DocumentDB untuk menangani tugas seperti memuat data, pembuatan indeks vektor, pendaftaran indeks, dan sisipan massal untuk pencarian vektor hotel.
using Azure.Identity;
using DocumentDBVectorSamples.Models;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using MongoDB.Bson;
using MongoDB.Driver;
using Newtonsoft.Json;
namespace DocumentDBVectorSamples.Services;
/// <summary>
/// Service for MongoDB operations including data insertion, index management, and vector index creation.
/// Supports Azure DocumentDB with passwordless authentication.
/// </summary>
public class MongoDbService
{
private readonly ILogger<MongoDbService> _logger;
private readonly AppConfiguration _config;
private readonly MongoClient _client;
public MongoDbService(ILogger<MongoDbService> logger, IConfiguration configuration)
{
_logger = logger;
_config = new AppConfiguration();
configuration.Bind(_config);
// Validate configuration
if (string.IsNullOrEmpty(_config.MongoDB.ClusterName))
throw new InvalidOperationException("MongoDB connection not configured. Please provide ConnectionString or ClusterName.");
// Configure MongoDB connection for Azure DocumentDB with OIDC authentication
var connectionString = $"mongodb+srv://{_config.MongoDB.ClusterName}.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=MONGODB-OIDC&retrywrites=false&maxIdleTimeMS=120000";
var settings = MongoClientSettings.FromUrl(MongoUrl.Create(connectionString));
settings.UseTls = true;
settings.RetryWrites = false;
settings.MaxConnectionIdleTime = TimeSpan.FromMinutes(2);
settings.Credential = MongoCredential.CreateOidcCredential(new AzureIdentityTokenHandler(new DefaultAzureCredential(), _config.MongoDB.TenantId));
settings.Freeze();
_client = new MongoClient(settings);
_logger.LogInformation("MongoDB client initialized with passwordless authentication");
}
/// <summary>Gets a database instance by name</summary>
public IMongoDatabase GetDatabase(string databaseName) => _client.GetDatabase(databaseName);
/// <summary>Gets a collection instance from the specified database</summary>
public IMongoCollection<T> GetCollection<T>(string databaseName, string collectionName) =>
_client.GetDatabase(databaseName).GetCollection<T>(collectionName);
/// <summary>Drops a collection from the specified database</summary>
public async Task DropCollectionAsync(string databaseName, string collectionName)
{
_logger.LogInformation($"Dropping collection '{collectionName}' from database '{databaseName}'");
await _client.GetDatabase(databaseName).DropCollectionAsync(collectionName);
_logger.LogInformation($"Collection '{collectionName}' dropped successfully");
}
/// <summary>
/// Creates a vector search index for DocumentDB, with support for IVF, HNSW, and DiskANN algorithms
/// </summary>
public async Task<BsonDocument> CreateVectorIndexAsync(string databaseName, string collectionName, string indexName, string embeddedField, BsonDocument cosmosSearchOptions)
{
var database = _client.GetDatabase(databaseName);
var collection = database.GetCollection<BsonDocument>(collectionName);
// Check if index already exists to avoid duplication
var indexList = await (await collection.Indexes.ListAsync()).ToListAsync();
if (indexList.Any(index => index.TryGetValue("name", out var nameValue) && nameValue.AsString == indexName))
{
_logger.LogInformation($"Vector index '{indexName}' already exists, skipping creation");
return new BsonDocument { ["ok"] = 1 };
}
// Create the specified vector index type
_logger.LogInformation($"Creating vector index '{indexName}' on field '{embeddedField}'");
return await database.RunCommandAsync<BsonDocument>(new BsonDocument
{
["createIndexes"] = collectionName,
["indexes"] = new BsonArray
{
new BsonDocument
{
["name"] = indexName,
["key"] = new BsonDocument { [embeddedField] = "cosmosSearch" },
["cosmosSearchOptions"] = cosmosSearchOptions
}
}
});
}
/// <summary>
/// Displays all indexes across all user databases, excluding system databases
/// </summary>
public async Task ShowAllIndexesAsync()
{
try
{
// Get user databases (exclude system databases)
var databases = (await (await _client.ListDatabaseNamesAsync()).ToListAsync())
.Where(name => !new[] { "admin", "config", "local" }.Contains(name)).ToList();
if (!databases.Any())
{
_logger.LogInformation("No user databases found or access denied");
return;
}
foreach (var dbName in databases)
{
var database = _client.GetDatabase(dbName);
var collections = await (await database.ListCollectionNamesAsync()).ToListAsync();
if (!collections.Any())
{
_logger.LogInformation($"Database '{dbName}': No collections found");
continue;
}
_logger.LogInformation($"\nš DATABASE: {dbName} ({collections.Count} collections)");
// Display indexes for each collection
foreach (var collName in collections)
{
try
{
var indexList = await (await database.GetCollection<BsonDocument>(collName).Indexes.ListAsync()).ToListAsync();
_logger.LogInformation($"\n šļø COLLECTION: {collName} ({indexList.Count} indexes)");
indexList.ForEach(index => _logger.LogInformation($" Index: {index.ToJson()}"));
}
catch (Exception ex)
{
_logger.LogError(ex, $"Failed to list indexes for collection '{collName}'");
}
}
}
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to retrieve database indexes");
throw;
}
}
/// <summary>
/// Loads data from file into collection if the collection is empty
/// </summary>
/// <param name="collection">Target collection to load data into</param>
/// <param name="dataFilePath">Path to the JSON data file containing vector embeddings</param>
/// <returns>Number of documents loaded, or 0 if collection already had data</returns>
public async Task<int> LoadDataIfNeededAsync<T>(IMongoCollection<T> collection, string dataFilePath) where T : class
{
var existingDocCount = await collection.CountDocumentsAsync(Builders<T>.Filter.Empty);
// Skip loading if collection already has data
if (existingDocCount > 0)
{
_logger.LogInformation("Collection already contains data, skipping load operation");
return 0;
}
// Load and validate data file
_logger.LogInformation("Collection is empty, loading data from file");
if (!File.Exists(dataFilePath))
throw new FileNotFoundException($"Vector data file not found: {dataFilePath}");
var jsonContent = await File.ReadAllTextAsync(dataFilePath);
var data = JsonConvert.DeserializeObject<List<T>>(jsonContent) ?? new List<T>();
if (data.Count == 0)
throw new InvalidOperationException("No data found in the vector data file");
// Insert data using existing method
var summary = await InsertDataAsync(collection, data);
_logger.LogInformation($"Successfully loaded {summary.Inserted} documents into collection");
return summary.Inserted;
}
/// <summary>
/// Inserts data into MongoDB collection, converts JSON embeddings to float arrays, and creates standard indexes
/// </summary>
public async Task<InsertSummary> InsertDataAsync<T>(IMongoCollection<T> collection, IEnumerable<T> data)
{
var dataList = data.ToList();
_logger.LogInformation($"Processing {dataList.Count} items for insertion");
// Convert JSON array embeddings to float arrays for vector search compatibility
foreach (var hotel in dataList.OfType<HotelData>().Where(h => h.ExtraElements != null))
foreach (var kvp in hotel.ExtraElements.ToList().Where(k => k.Value is Newtonsoft.Json.Linq.JArray))
hotel.ExtraElements[kvp.Key] = ((Newtonsoft.Json.Linq.JArray)kvp.Value).Select(token => (float)token).ToArray();
int inserted = 0, failed = 0;
try
{
// Use unordered insert for better performance
await collection.InsertManyAsync(dataList, new InsertManyOptions { IsOrdered = false });
inserted = dataList.Count;
_logger.LogInformation($"Successfully inserted {inserted} items");
}
catch (Exception ex)
{
failed = dataList.Count;
_logger.LogError(ex, $"Batch insert failed for {dataList.Count} items");
}
// Create standard indexes for common query fields
var indexFields = new[] { "HotelId", "Category", "Description", "Description_fr" };
foreach (var field in indexFields)
await collection.Indexes.CreateOneAsync(new CreateIndexModel<T>(Builders<T>.IndexKeys.Ascending(field)));
return new InsertSummary { Total = dataList.Count, Inserted = inserted, Failed = failed };
}
/// <summary>Disposes the MongoDB client and its resources</summary>
public void Dispose() => _client?.Cluster?.Dispose();
}
Dalam kode sebelumnya, MongoDbService melakukan tugas berikut:
- Membaca konfigurasi dan membangun klien tanpa kata sandi dengan kredensial Azure
- Menyediakan referensi database atau koleksi sesuai permintaan
- Membuat indeks pencarian vektor hanya jika belum ada
- Mencantumkan semua database non-sistem, koleksinya, dan indeks setiap koleksi
- Menyisipkan data sampel jika koleksi kosong dan menambahkan indeks pendukung
Menampilkan dan mengelola data di Visual Studio Code
Instal ekstensi DocumentDB dan ekstensi C# di Visual Studio Code.
Sambungkan ke akun Azure DocumentDB Anda menggunakan ekstensi DocumentDB.
Lihat data dan indeks di database Hotel.
Membersihkan sumber daya
Hapus grup sumber daya, kluster Azure DocumentDB, dan sumber daya Azure OpenAI saat Anda tidak lagi membutuhkannya untuk menghindari biaya yang tidak perlu.