Use vector search in Azure Cosmos DB with the Java client library. Store and query vector data efficiently in your applications.
This quickstart uses a sample hotel dataset in a JSON file with vectors from the text-embedding-3-small model. The dataset includes hotel names, locations, descriptions, and vector embeddings.
Find the sample code with resource provisioning on GitHub.
Prerequisites
An Azure subscription
- If you don't have an Azure subscription, create a free account
An existing Azure Cosmos DB resource with data plane access
- If you don't have a resource, create a new resource
- Firewall configured to allow access to your client IP address
- Role-based access control (RBAC) roles assigned:
  - Cosmos DB Built-in Data Contributor (data plane)
    - Role ID: 00000000-0000-0000-0000-000000000002
An Azure OpenAI resource
- Custom domain configured
- Role-based access control (RBAC) role assigned:
  - Cognitive Services OpenAI User
    - Role ID: 5e0bd9bd-7b93-4f28-af87-19fc36ad61bd
- text-embedding-3-small model deployed
Create data file with vectors
Create a new data directory for the hotels data file:
mkdir data
Download the raw data file with vectors to your data directory:
curl -o data/HotelsData_toCosmosDB_Vector.json https://raw.githubusercontent.com/Azure-Samples/cosmos-db-vector-samples/refs/heads/main/data/HotelsData_toCosmosDB_Vector.json
Create a Java project
Create a new sibling directory for your project, at the same level as the data directory, and open it in Visual Studio Code:
mkdir vector-search-quickstart
cd vector-search-quickstart
code .
Create a pom.xml file in the project root with the Maven configuration:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.cosmos.samples</groupId>
  <artifactId>nosql-vector-search-sample</artifactId>
  <version>1.0.0</version>
  <name>Azure Cosmos DB NoSQL Vector Search - Java</name>
  <properties>
    <maven.compiler.release>21</maven.compiler.release>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <!-- Azure Cosmos DB SDK (NoSQL API) -->
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-cosmos</artifactId>
      <version>4.69.0</version>
    </dependency>
    <!-- Azure Identity (DefaultAzureCredential) -->
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-identity</artifactId>
      <version>1.18.1</version>
    </dependency>
    <!-- Azure OpenAI (embeddings) -->
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-ai-openai</artifactId>
      <version>1.0.0-beta.16</version>
    </dependency>
    <!-- Jackson (JSON serialization) -->
    <dependency>
      <groupId>tools.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>3.0.3</version>
    </dependency>
    <!-- Suppress noisy SDK logging -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-nop</artifactId>
      <version>2.0.17</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.5.0</version>
        <configuration>
          <mainClass>com.example.cosmos.vectorsearch.VectorSearch</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
Key dependencies:
- azure-identity - Azure authentication library for passwordless (managed identity) connections
- azure-cosmos - Azure Cosmos DB client library for database operations
- azure-ai-openai - Azure OpenAI SDK for generating embeddings
- slf4j-nop - Suppresses noisy SDK logging at runtime
Create the source directory structure:
mkdir -p src/main/java/com/example/cosmos/vectorsearch
Create a .env file in your project root for environment variables:
# Azure OpenAI Embedding Settings
AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
AZURE_OPENAI_EMBEDDING_API_VERSION=2024-08-01-preview
AZURE_OPENAI_EMBEDDING_ENDPOINT=

# Cosmos DB configuration
AZURE_COSMOSDB_ENDPOINT=

# Data file
DATA_FILE_WITH_VECTORS=../data/HotelsData_toCosmosDB_Vector.json
FIELD_TO_EMBED=Description
EMBEDDED_FIELD=DescriptionVector
EMBEDDING_DIMENSIONS=1536

# Vector search algorithm: diskann or quantizedflat
VECTOR_ALGORITHM=diskann
Replace the placeholder values in the .env file with your own information:
- AZURE_OPENAI_EMBEDDING_ENDPOINT: Your Azure OpenAI resource endpoint URL
- AZURE_COSMOSDB_ENDPOINT: Your Azure Cosmos DB endpoint URL
Note
The Java sample uses System.getenv() to read environment variables. You must export these variables in your shell session or use azd env get-values to set them. The .env file serves as a reference template; it is not loaded automatically by the application.
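Because the application never reads the .env file itself, it can help to check which template keys are still missing from your shell before you run it. The following is a minimal sketch of such a check, not part of the sample: the EnvTemplate class and its parse and unsetKeys methods are hypothetical names, and the sketch assumes the template contains only comments, blank lines, and KEY=VALUE lines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the sample): inspects a .env template. */
final class EnvTemplate {

    private EnvTemplate() {
    }

    /** Parses KEY=VALUE lines, skipping blank lines and lines that start with '#'. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int separator = line.indexOf('=');
            if (separator <= 0) {
                continue; // not a KEY=VALUE line
            }
            values.put(line.substring(0, separator).trim(),
                    line.substring(separator + 1).trim());
        }
        return values;
    }

    /** Returns the template keys that have no non-blank value in the given environment. */
    static List<String> unsetKeys(Map<String, String> template, Map<String, String> environment) {
        List<String> unset = new ArrayList<>();
        for (String key : template.keySet()) {
            String value = environment.get(key);
            if (value == null || value.isBlank()) {
                unset.add(key);
            }
        }
        return unset;
    }
}
```

One way to use it is to call EnvTemplate.unsetKeys(EnvTemplate.parse(Files.readAllLines(Path.of(".env"))), System.getenv()) and export any key it reports. Keys that the sample treats as optional, such as VECTOR_ALGORITHM, can be left unset because the code falls back to a default.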
Understand the document schema
Before building the application, understand how vectors are stored in Azure Cosmos DB documents. Each hotel document contains:
- Standard fields: HotelId, HotelName, Description, Category, etc.
- Vector field: DescriptionVector, an array of 1536 floating-point numbers representing the semantic meaning of the hotel description
Here's a simplified example of a hotel document structure:
{
  "HotelId": "1",
  "HotelName": "Stay-Kay City Hotel",
  "Description": "This classic hotel is fully-refurbished...",
  "Rating": 3.6,
  "DescriptionVector": [
    -0.04886505,
    -0.02030743,
    0.01763356,
    ...
    // 1536 dimensions total
  ]
}
Key points about storing embeddings:
- Vector arrays are stored as standard JSON arrays in your documents
- Vector policy defines the path (/DescriptionVector), data type (float32), dimensions (1536), and distance function (cosine)
- Indexing policy creates a vector index on the vector field for efficient similarity search
- The vector field should be excluded from standard indexing to optimize insertion performance
These policies are defined in the Bicep templates for this sample project, which are described in the Distance metrics section. For more information on vector policies and indexing, see Vector search in Azure Cosmos DB.
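Since the vector policy fixes the number of dimensions, a quick client-side check that each document's vector has the expected length can catch data problems before you insert anything. This is a minimal sketch under that assumption, not part of the sample: the VectorShape class and its hasExpectedDimensions method are hypothetical names, and the document is assumed to be a Map as produced by a generic JSON parser.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the sample): checks a document's vector field. */
final class VectorShape {

    private VectorShape() {
    }

    /**
     * Returns true when the named field is a list of exactly `expected` numeric values.
     * A missing field, a non-list value, or a non-numeric element returns false.
     */
    static boolean hasExpectedDimensions(Map<String, Object> document, String vectorField, int expected) {
        Object value = document.get(vectorField);
        if (!(value instanceof List)) {
            return false;
        }
        List<?> vector = (List<?>) value;
        if (vector.size() != expected) {
            return false;
        }
        for (Object component : vector) {
            if (!(component instanceof Number)) {
                return false;
            }
        }
        return true;
    }
}
```

For the hotel data in this quickstart, the call would be VectorShape.hasExpectedDimensions(doc, "DescriptionVector", 1536).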
Create code files for vector search
Create two Java source files in the src/main/java/com/example/cosmos/vectorsearch directory:
touch src/main/java/com/example/cosmos/vectorsearch/VectorSearch.java
touch src/main/java/com/example/cosmos/vectorsearch/Utils.java
Create code for vector search
Paste the following code into the VectorSearch.java file.
package com.example.cosmos.vectorsearch;

import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.models.EmbeddingItem;
import com.azure.ai.openai.models.EmbeddingsOptions;
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosQueryRequestOptions;
import com.azure.cosmos.models.SqlParameter;
import com.azure.cosmos.models.SqlQuerySpec;

import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Azure Cosmos DB NoSQL vector search sample (Java port of nosql-vector-search-typescript).
 *
 * Demonstrates:
 * - Passwordless authentication with DefaultAzureCredential
 * - Bulk insert of hotel data with pre-computed embeddings
 * - Vector similarity search using VectorDistance() SQL function
 * - DiskANN and QuantizedFlat algorithm selection via environment variable
 */
public final class VectorSearch {

    private static final String SAMPLE_QUERY =
            "quintessential lodging near running trails, eateries, retail";

    private static final Set<String> VALID_ALGORITHMS = Set.of("diskann", "quantizedflat");

    private static final Map<String, String> ALGORITHM_CONTAINERS = Map.of(
            "diskann", "hotels_diskann",
            "quantizedflat", "hotels_quantizedflat"
    );

    private static final Map<String, String> ALGORITHM_DISPLAY = Map.of(
            "diskann", "DiskANN",
            "quantizedflat", "QuantizedFlat"
    );

    public static void main(String[] args) {
        try {
            new VectorSearch().run();
        } catch (Exception e) {
            System.err.println("App failed: " + e.getMessage());
            e.printStackTrace();
            System.exit(1);
        }
        System.exit(0);
    }

    private void run() throws Exception {
        // ── Configuration ───────────────────────────────────────────────
        var algorithm = Utils.envOrDefault("VECTOR_ALGORITHM", "diskann").trim().toLowerCase();
        var dbName = Utils.envOrDefault("AZURE_COSMOSDB_DATABASENAME", "Hotels");
        var dataFile = Utils.requireEnv("DATA_FILE_WITH_VECTORS");
        var embeddedField = Utils.requireEnv("EMBEDDED_FIELD");
        var deployment = Utils.requireEnv("AZURE_OPENAI_EMBEDDING_MODEL");
        var distanceFunction = Utils.envOrDefault("VECTOR_DISTANCE_FUNCTION", "cosine");

        if (!VALID_ALGORITHMS.contains(algorithm)) {
            throw new IllegalArgumentException(
                    "Invalid algorithm '" + algorithm + "'. Must be one of: " +
                            String.join(", ", VALID_ALGORITHMS));
        }

        var containerName = ALGORITHM_CONTAINERS.get(algorithm);
        var algorithmDisplay = ALGORITHM_DISPLAY.get(algorithm);

        // ── Clients ─────────────────────────────────────────────────────
        OpenAIClient aiClient = Utils.createOpenAIClient();
        CosmosClient dbClient = Utils.createCosmosClient();

        try {
            var database = dbClient.getDatabase(dbName);
            System.out.println("Connected to database: " + dbName);

            CosmosContainer container = database.getContainer(containerName);
            System.out.println("Connected to container: " + containerName);
            System.out.println("\n[Algorithm] Vector Search Algorithm: " + algorithmDisplay);
            System.out.println("[Distance] Distance Function: " + distanceFunction);

            // Verify container exists
            container.read();

            // ── Load & Insert Data ──────────────────────────────────────
            var dataPath = Path.of(dataFile);
            var data = Utils.readJsonFile(dataPath);
            Utils.insertData(container, data);

            // ── Generate Query Embedding ────────────────────────────────
            var embeddingOptions = new EmbeddingsOptions(List.of(SAMPLE_QUERY));
            var embeddingResult = aiClient.getEmbeddings(deployment, embeddingOptions);
            List<Float> embedding = embeddingResult.getData().get(0).getEmbedding();

            // Convert Float list to List<Double> for Cosmos DB parameter binding
            var embeddingDoubles = new ArrayList<Double>(embedding.size());
            for (var f : embedding) {
                embeddingDoubles.add(f.doubleValue());
            }

            // ── Build & Execute Vector Search Query ─────────────────────
            var safeField = Utils.validateFieldName(embeddedField);
            var queryText = "SELECT TOP 5 c.HotelName, c.Description, c.Rating, " +
                    "VectorDistance(c." + safeField + ", @embedding) AS SimilarityScore " +
                    "FROM c " +
                    "ORDER BY VectorDistance(c." + safeField + ", @embedding)";

            System.out.println("\n--- Executing Vector Search Query ---");
            System.out.println("Query: " + queryText);
            System.out.println("Parameters: @embedding (vector with " + embeddingDoubles.size() + " dimensions)");
            System.out.println("--------------------------------------\n");

            var sqlQuery = new SqlQuerySpec(
                    queryText,
                    List.of(new SqlParameter("@embedding", embeddingDoubles))
            );
            var queryOptions = new CosmosQueryRequestOptions();

            @SuppressWarnings("unchecked")
            var resultPages = container.queryItems(sqlQuery, queryOptions, Map.class);

            var results = new ArrayList<Map<String, Object>>();
            var requestCharge = 0.0;
            for (var page : resultPages.iterableByPage()) {
                requestCharge += page.getRequestCharge();
                for (var item : page.getResults()) {
                    @SuppressWarnings("unchecked")
                    var typedItem = (Map<String, Object>) item;
                    results.add(typedItem);
                }
            }

            Utils.printSearchResults(results, requestCharge);
        } finally {
            dbClient.close();
        }
    }
}
This code:
- Configures either a DiskANN or quantizedFlat vector algorithm from the VECTOR_ALGORITHM environment variable.
- Connects to Azure OpenAI and Azure Cosmos DB using passwordless authentication.
- Loads pre-vectorized hotel data from a JSON file.
- Inserts data into the appropriate container.
- Generates an embedding for a natural-language query (quintessential lodging near running trails, eateries, retail).
- Executes a VectorDistance SQL query to retrieve the top 5 most semantically similar hotels ranked by similarity score.
- Handles errors for missing clients, invalid algorithm selection, and non-existent containers/databases.
Understand the code: Generate embeddings with Azure OpenAI
The code creates embeddings for query text:
EmbeddingsOptions options = new EmbeddingsOptions(
    List.of(queryText) // List of input strings to embed (here, the single query text)
);
Embeddings embeddings = openAIClient.getEmbeddings(model, options);
List<Float> queryVector = embeddings.getData().get(0).getEmbedding();
This Azure OpenAI SDK call converts text like "quintessential lodging near running trails" into a 1536-dimension vector that captures its semantic meaning. For more details on generating embeddings, see Azure OpenAI embeddings documentation.
Understand the code: Store vectors in Azure Cosmos DB
All documents with vector arrays are inserted at scale using the executeBulkOperations method in Utils.insertData(). Each document is mapped to a bulk create operation using the PartitionKeyBuilder with each document's partition key value. The utility tracks inserted, skipped, and failed counts along with total RU consumption.
This inserts hotel documents including their pre-generated DescriptionVector arrays into the container. You can safely pass in all the document data, and the client library handles the batch processing and retries for you.
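The inserted, skipped, and failed counts come from the HTTP status code of each bulk operation response. The following is a minimal, SDK-free sketch of that classification rule as it appears in Utils.insertData: any 2xx status counts as inserted, 409 (conflict, an item with that id already exists) counts as skipped, and anything else counts as failed. The BulkTally class, its Outcome enum, and its methods are hypothetical names used only for illustration.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the sample): tallies bulk operation status codes. */
final class BulkTally {

    enum Outcome { INSERTED, SKIPPED, FAILED }

    private BulkTally() {
    }

    /** Maps one HTTP status code to an outcome, using the same rule as Utils.insertData. */
    static Outcome classify(int statusCode) {
        if (statusCode >= 200 && statusCode < 300) {
            return Outcome.INSERTED;
        }
        if (statusCode == 409) {
            return Outcome.SKIPPED; // conflict: the item already exists
        }
        return Outcome.FAILED;
    }

    /** Counts outcomes for a list of status codes; every outcome is present in the result. */
    static Map<Outcome, Integer> tally(List<Integer> statusCodes) {
        Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
        for (Outcome outcome : Outcome.values()) {
            counts.put(outcome, 0);
        }
        for (int code : statusCodes) {
            counts.merge(classify(code), 1, Integer::sum);
        }
        return counts;
    }
}
```

Keeping the classification separate from the SDK call makes the rule easy to unit test without a live container.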
Understand the code: Run vector similarity search
The code performs a vector search using the VectorDistance function:
String queryText = String.format(
    "SELECT TOP 5 c.HotelName, c.Description, c.Rating, " +
    "VectorDistance(c.%s, @embedding) AS SimilarityScore " +
    "FROM c ORDER BY VectorDistance(c.%s, @embedding)",
    embeddedField, embeddedField
);
SqlQuerySpec querySpec = new SqlQuerySpec(queryText,
    new SqlParameter("@embedding", queryVector));
CosmosPagedIterable<ObjectNode> results = container.queryItems(
    querySpec, new CosmosQueryRequestOptions(), ObjectNode.class);
This code builds a SQL query that uses the VectorDistance function to compare the query's embedding vector (@embedding) against each document's stored vector field (DescriptionVector). It returns the top 5 hotels with their name, description, rating, and similarity score, ordered from most similar to least similar. The query embedding, which comes from a prior Azure OpenAI embeddings call, is passed as a parameter to avoid injection. The field name can't be passed as a parameter, so the full sample checks it with Utils.validateFieldName before concatenating it into the query text.
What this query returns:
- Top 5 most similar hotels based on vector distance
- Hotel properties: HotelName, Description, Rating
- SimilarityScore: A numeric value indicating how similar each hotel is to your query
- Results ordered from most similar to least similar
For more information on the VectorDistance function, see VectorDistance documentation.
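Query parameters such as @embedding carry values, not property names, so the vector field name has to be placed into the query text directly. The following is a minimal sketch of building that text safely, assuming the same identifier rule that Utils.validateFieldName applies. The VectorQueryText class and its build method are hypothetical names, and the configurable TOP value is an addition for illustration only.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the sample): builds the vector search query text. */
final class VectorQueryText {

    // Same identifier rule as Utils.validateFieldName in the sample.
    private static final Pattern SAFE_FIELD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private VectorQueryText() {
    }

    /** Returns the query text for the given vector field; the embedding stays a parameter. */
    static String build(String vectorField, int top) {
        if (vectorField == null || !SAFE_FIELD.matcher(vectorField).matches()) {
            throw new IllegalArgumentException("Invalid field name: " + vectorField);
        }
        if (top <= 0) {
            throw new IllegalArgumentException("top must be positive: " + top);
        }
        return "SELECT TOP " + top + " c.HotelName, c.Description, c.Rating, "
                + "VectorDistance(c." + vectorField + ", @embedding) AS SimilarityScore "
                + "FROM c ORDER BY VectorDistance(c." + vectorField + ", @embedding)";
    }
}
```

Calling VectorQueryText.build("DescriptionVector", 5) produces the same query text that the sample prints before it runs the search, while a value such as "x) OR 1=1" is rejected before it reaches the database.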
Create utility functions
Paste the following code into Utils.java:
package com.example.cosmos.vectorsearch;

import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.OpenAIClient;
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosBulkOperations;
import com.azure.cosmos.models.CosmosItemOperation;
import com.azure.cosmos.models.PartitionKey;
import com.azure.cosmos.models.PartitionKeyBuilder;
import com.azure.identity.DefaultAzureCredentialBuilder;
import tools.jackson.databind.json.JsonMapper;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Shared utilities for Azure Cosmos DB NoSQL vector search sample.
 * Provides authentication, bulk insert, field validation, and result formatting.
 */
public final class Utils {

    private static final JsonMapper JSON_MAPPER = JsonMapper.builder().build();

    private Utils() {
        // utility class
    }

    // ── Authentication ──────────────────────────────────────────────────

    /**
     * Create an Azure OpenAI client using DefaultAzureCredential (passwordless).
     */
    public static OpenAIClient createOpenAIClient() {
        var endpoint = requireEnv("AZURE_OPENAI_EMBEDDING_ENDPOINT");
        var credential = new DefaultAzureCredentialBuilder().build();
        return new OpenAIClientBuilder()
                .endpoint(endpoint)
                .credential(credential)
                .buildClient();
    }

    /**
     * Create a Cosmos DB client using DefaultAzureCredential (passwordless).
     */
    public static CosmosClient createCosmosClient() {
        var endpoint = requireEnv("AZURE_COSMOSDB_ENDPOINT");
        var credential = new DefaultAzureCredentialBuilder().build();
        return new CosmosClientBuilder()
                .endpoint(endpoint)
                .credential(credential)
                .buildClient();
    }

    // ── Data Loading ────────────────────────────────────────────────────

    /**
     * Read a JSON array file and return its contents as a list of maps.
     */
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> readJsonFile(Path filePath) throws IOException {
        System.out.println("Reading JSON file from " + filePath);
        var bytes = Files.readAllBytes(filePath);
        return JSON_MAPPER.readValue(bytes, List.class);
    }

    // ── Bulk Insert ─────────────────────────────────────────────────────

    /**
     * Insert documents into a Cosmos DB container using bulk operations.
     * Skips insert if the container already contains data.
     *
     * @return summary with counts and RU charge
     */
    public static BulkInsertResult insertData(CosmosContainer container,
                                              List<Map<String, Object>> data) {
        // Check existing document count
        var existingCount = getDocumentCount(container);
        if (existingCount > 0) {
            System.out.println("Container already has " + existingCount + " documents. Skipping insert.");
            return new BulkInsertResult(0, 0, 0, (int) existingCount, 0.0);
        }

        System.out.println("Inserting " + data.size() + " items using bulk operations...");

        // Build bulk create operations
        var operations = new ArrayList<CosmosItemOperation>();
        for (var item : data) {
            // Cosmos DB requires an "id" field — map HotelId to id
            var doc = new java.util.HashMap<>(item);
            var hotelId = String.valueOf(doc.get("HotelId"));
            doc.put("id", hotelId);
            operations.add(CosmosBulkOperations.getCreateItemOperation(doc,
                    new PartitionKeyBuilder().add(hotelId).build()));
        }

        var inserted = 0;
        var failed = 0;
        var skipped = 0;
        var totalRUs = 0.0;
        var startTime = System.currentTimeMillis();

        System.out.println("Starting bulk insert (" + operations.size() + " items)...");
        var responses = container.executeBulkOperations(operations);
        for (var response : responses) {
            var statusCode = response.getResponse().getStatusCode();
            totalRUs += response.getResponse().getRequestCharge();
            if (statusCode >= 200 && statusCode < 300) {
                inserted++;
            } else if (statusCode == 409) {
                skipped++;
            } else {
                failed++;
            }
        }

        var durationSec = (System.currentTimeMillis() - startTime) / 1000.0;
        System.out.printf("Bulk insert completed in %.2fs%n", durationSec);
        System.out.printf("%nInsert Request Charge: %.2f RUs%n%n", totalRUs);
        return new BulkInsertResult(data.size(), inserted, failed, skipped, totalRUs);
    }

    private static long getDocumentCount(CosmosContainer container) {
        var result = container.queryItems(
                "SELECT VALUE COUNT(1) FROM c",
                new com.azure.cosmos.models.CosmosQueryRequestOptions(),
                Long.class
        );
        for (var count : result) {
            return count;
        }
        return 0;
    }

    // ── Field Name Validation ───────────────────────────────────────────

    /**
     * Validates a field name to prevent NoSQL injection when building queries
     * with string interpolation.
     *
     * @param fieldName the field name to validate
     * @return the validated field name
     * @throws IllegalArgumentException if the field name contains unsafe characters
     */
    public static String validateFieldName(String fieldName) {
        if (!fieldName.matches("^[A-Za-z_][A-Za-z0-9_]*$")) {
            throw new IllegalArgumentException(
                    "Invalid field name: \"" + fieldName + "\". " +
                            "Field names must start with a letter or underscore " +
                            "and contain only letters, numbers, and underscores.");
        }
        return fieldName;
    }

    // ── Output Formatting ───────────────────────────────────────────────

    /**
     * Print search results in a consistent tabular format.
     */
    public static void printSearchResults(List<Map<String, Object>> results, double requestCharge) {
        System.out.println("\n--- Search Results ---");
        if (results == null || results.isEmpty()) {
            System.out.println("No results found.");
            return;
        }
        for (var i = 0; i < results.size(); i++) {
            var r = results.get(i);
            var name = r.get("HotelName");
            var score = r.get("SimilarityScore");
            System.out.printf("%d. %s, Score: %.4f%n", i + 1, name, ((Number) score).doubleValue());
        }
        System.out.printf("%nVector Search Request Charge: %.2f RUs%n%n", requestCharge);
    }

    // ── Environment Helpers ─────────────────────────────────────────────

    public static String requireEnv(String key) {
        var value = System.getenv(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Required environment variable not set: " + key);
        }
        return value;
    }

    public static String envOrDefault(String key, String defaultValue) {
        var value = System.getenv(key);
        return (value != null && !value.isBlank()) ? value : defaultValue;
    }

    // ── Result Record ───────────────────────────────────────────────────

    public record BulkInsertResult(int total, int inserted, int failed, int skipped, double requestCharge) {}
}
This utility class provides these key functions:
- createOpenAIClient / createCosmosClient: Create clients for Azure OpenAI and Azure Cosmos DB using passwordless authentication via DefaultAzureCredential. Enable RBAC on both resources and sign in to Azure CLI.
- insertData: Inserts data in batches into an Azure Cosmos DB container using bulk operations and tracks inserted, skipped, and failed counts along with total RU consumption. It skips the insert when the container already contains documents.
- printSearchResults: Prints the results of a vector search, including the score and hotel name.
- validateFieldName: Validates that a field name starts with a letter or underscore and contains only letters, digits, and underscores, which prevents injection when the name is concatenated into the query text.
Authenticate with Azure CLI
Sign in to Azure CLI before you run the application so the app can access Azure resources securely.
az login
The code uses your local developer authentication to access Azure Cosmos DB and Azure OpenAI with createOpenAIClient and createCosmosClient from Utils.java. These functions rely on DefaultAzureCredential from azure-identity, which walks an ordered chain of credential providers and resolves to Azure CLI credentials for local development. Learn more about how to Authenticate Java apps to Azure services using the Azure Identity library.
Build and run the application
Build and run the application with Maven:
Linux/macOS:
VECTOR_ALGORITHM=diskann mvn compile exec:java
Windows:
set VECTOR_ALGORITHM=diskann && mvn compile exec:java
The app logging and output show:
- The connected database and container, with the selected vector algorithm and distance function
- Data insertion status
- Search results with hotel names and similarity scores
Connected to database: Hotels
Connected to container: hotels_diskann
[Algorithm] Vector Search Algorithm: DiskANN
[Distance] Distance Function: cosine
Reading JSON file from ../data/HotelsData_toCosmosDB_Vector.json
Inserting 50 items using bulk operations...
Starting bulk insert (50 items)...
Bulk insert completed in 3.41s
Insert Request Charge: 6805.25 RUs
--- Executing Vector Search Query ---
Query: SELECT TOP 5 c.HotelName, c.Description, c.Rating, VectorDistance(c.DescriptionVector, @embedding) AS SimilarityScore FROM c ORDER BY VectorDistance(c.DescriptionVector, @embedding)
Parameters: @embedding (vector with 1536 dimensions)
--------------------------------------
--- Search Results ---
1. Royal Cottage Resort, Score: 0.4991
2. Country Comfort Inn, Score: 0.4786
3. Nordick's Valley Motel, Score: 0.4635
4. Economy Universe Motel, Score: 0.4461
5. Roach Motel, Score: 0.4388
Vector Search Request Charge: 5.33 RUs
Distance metrics
Azure Cosmos DB supports three distance functions for vector similarity:
| Distance Function | Score Range | Interpretation | Best For |
|---|---|---|---|
| Cosine (default) | -1.0 to 1.0 | Higher scores (closer to 1.0) indicate greater similarity | General text similarity, Azure OpenAI embeddings (used in this quickstart) |
| Euclidean (L2) | 0.0 to ∞ | Lower = more similar | Spatial data, when magnitude matters |
| Dot Product | -∞ to +∞ | Higher = more similar | When vector magnitudes are normalized |
The distance function is set in the vector embedding policy when the container is created. In this sample, the infrastructure templates in the sample repository set it as part of the container definition:
{
  name: 'hotels_diskann'
  partitionKeyPaths: [
    '/HotelId'
  ]
  indexingPolicy: {
    indexingMode: 'consistent'
    automatic: true
    includedPaths: [
      {
        path: '/*'
      }
    ]
    excludedPaths: [
      {
        path: '/_etag/?'
      }
      {
        path: '/DescriptionVector/*'
      }
    ]
    vectorIndexes: [
      {
        path: '/DescriptionVector'
        type: 'diskANN'
      }
    ]
  }
  vectorEmbeddingPolicy: {
    vectorEmbeddings: [
      {
        path: '/DescriptionVector'
        dataType: 'float32'
        dimensions: 1536
        distanceFunction: 'cosine'
      }
    ]
  }
}
This Bicep code defines an Azure Cosmos DB container configuration for storing hotel documents with vector search capabilities.
| Property | Description |
|---|---|
| partitionKeyPaths | Partitions documents by HotelId for distributed storage. |
| indexingPolicy | Configures automatic indexing on all document properties (/*) except the system _etag field and the DescriptionVector array to optimize write performance. Vector fields don't need standard indexing because they use a specialized vectorIndexes configuration instead. |
| vectorIndexes | Creates either a DiskANN or quantizedFlat index on the /DescriptionVector path for efficient similarity searches. |
| vectorEmbeddingPolicy | Defines the vector field's characteristics: float32 data type with 1536 dimensions (matching the text-embedding-3-small model output) and cosine as the distance function to measure similarity between vectors during queries. |
Interpret similarity scores
In the example output using cosine similarity:
- 0.4991 (Royal Cottage Resort) - Highest similarity, the best match for "quintessential lodging near running trails, eateries, retail"
- 0.4388 (Roach Motel) - Lower similarity, still relevant but a weaker match
- Scores closer to 1.0 indicate stronger semantic similarity
- Scores near 0 indicate little similarity
Important
- Absolute score values depend on your embedding model and data
- Focus on relative ranking rather than absolute thresholds
- Azure OpenAI embeddings work best with cosine similarity
For detailed information on distance functions, see What are distance functions?
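To build intuition for how the three distance functions differ, it can help to compute them by hand on small vectors. The following is a minimal local sketch using the standard mathematical definitions. It is an illustration only, not how Azure Cosmos DB evaluates VectorDistance internally, and the DistanceDemo class and its method names are hypothetical.

```java
/** Hypothetical helper (not part of the sample): standard vector similarity measures. */
final class DistanceDemo {

    private DistanceDemo() {
    }

    /** Dot product: sum of element-wise products. Higher means more similar. */
    static double dot(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean (L2) distance: straight-line distance. Lower means more similar. */
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity: dot product divided by the product of the vector lengths. */
    static double cosine(double[] a, double[] b) {
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot(a, b) / norms;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }
}
```

With these definitions, two vectors that point in the same direction have a cosine similarity of 1 regardless of their lengths, while their Euclidean distance and dot product both change when either vector is scaled. That is one reason to compare relative rankings rather than absolute scores across distance functions.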
View and manage data in Visual Studio Code
Select the Cosmos DB extension in Visual Studio Code to connect to your Azure Cosmos DB account.
View the data and indexes in the Hotels database.
Clean up resources
When you no longer need the API for NoSQL account, you can delete the corresponding resource group.