DocumentDB Vector Search for Java

02/12/2026

This project demonstrates vector search capabilities using Azure DocumentDB with Java. It includes implementations of three different vector index types: DiskANN, HNSW, and IVF.

Overview

Vector search enables semantic similarity searching by converting text into high-dimensional vector representations (embeddings) and finding the most similar vectors in the database. This project shows how to:

Generate embeddings using Azure OpenAI
Store vectors in DocumentDB
Create and use different types of vector indexes
Perform similarity searches with various algorithms
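
To make "finding the most similar vectors" concrete, here is a minimal sketch of exact (brute-force) cosine similarity search in plain Java. It is not how the DocumentDB indexes work internally; it only illustrates the ranking that the approximate indexes try to reproduce quickly. The class name, document ids, and 3-dimensional vectors are made up for illustration (real text-embedding-3-small embeddings have 1536 dimensions by default).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CosineSearchSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every stored vector against the query and returns the ids of the k best matches.
    static List<String> topK(Map<String, double[]> docs, double[] query, int k) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosine(docs.get(id), query)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        // Hypothetical toy "embeddings"
        Map<String, double[]> docs = new LinkedHashMap<>();
        docs.put("doc-a", new double[] {1.0, 0.0, 0.0});
        docs.put("doc-b", new double[] {0.0, 1.0, 0.0});
        docs.put("doc-c", new double[] {0.9, 0.1, 0.0});
        System.out.println(topK(docs, new double[] {1.0, 0.0, 0.0}, 2)); // prints [doc-a, doc-c]
    }
}
```

Because this scans every vector, its cost grows linearly with the collection size, which is the reason vector indexes such as DiskANN, HNSW, and IVF exist.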

Prerequisites

Before running this project, you need:

Azure Resources

Azure subscription with appropriate permissions
Azure Developer CLI (azd) installed
Azure CLI (az) installed, used to sign in before running the samples

Development Environment

Java 21 or higher
Maven 3.6 or higher
Git (for cloning the repository)
Visual Studio Code (recommended) or another Java IDE

Setup Instructions

Clone and Setup Project

# Clone this repository
git clone https://github.com/Azure-Samples/documentdb-samples

Deploy Azure Resources

This project uses Azure Developer CLI (azd) to deploy all required Azure resources from the existing infrastructure-as-code files.

Install Azure Developer CLI

If you haven't already, install the Azure Developer CLI:

Windows:

winget install microsoft.azd

macOS:

brew tap azure/azd && brew install azd

Linux:

curl -fsSL https://aka.ms/install-azd.sh | bash

Deploy Resources

Navigate to the root of the repository and run:

# Login to Azure
azd auth login

# Provision Azure resources
azd up

During provisioning, you'll be prompted for:

Environment name: A unique name for your deployment (e.g., "my-vector-search")
Azure subscription: Select your Azure subscription
Location: Choose from eastus2 or swedencentral (required for OpenAI models)

The azd up command will:

Create a resource group
Deploy Azure OpenAI with the text-embedding-3-small model
Deploy an Azure DocumentDB (MongoDB vCore) cluster
Create a managed identity for secure access
Configure all necessary permissions and networking
Generate a .env file with all connection information at the repository root

Compile the Project

# Move to the Java vector search project
cd ai/vector-search-java

# Compile the project
mvn clean compile

Load Environment Variables

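After deployment completes, load the environment variables from the generated .env file. While set -a is active, every variable the shell defines is marked for export, so the values sourced from .env reach child processes such as the Maven JVM; set +a turns that behavior off again: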

# From the ai/vector-search-java directory
set -a && source ../../.env && set +a

You can verify the environment variables are set:

echo $MONGO_CLUSTER_NAME
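
The same check can be done from inside the JVM, which is where the samples actually read the values. The following is a minimal sketch, assuming only the MONGO_CLUSTER_NAME variable named above; the class and method names are hypothetical, and the full list of variables in the generated .env file is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EnvCheckSketch {

    // Returns the names from 'required' that are absent or blank in 'env'.
    static List<String> findMissing(Map<String, String> env, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // System.getenv() only sees variables that the parent shell exported.
        List<String> missing = findMissing(System.getenv(), List.of("MONGO_CLUSTER_NAME"));
        if (missing.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Missing variables: " + missing);
        }
    }
}
```

If a variable shows up with echo but is reported missing here, it was most likely set without being exported.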

Usage

The project includes several Java classes, each demonstrating a different vector index type. Before running them, sign in with the Azure CLI:

az login

DiskANN Vector Search

Run DiskANN (Disk-based Approximate Nearest Neighbor) search:

mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.DiskAnn"

DiskANN is optimized for:

Large datasets that don't fit in memory
Efficient disk-based storage
Good balance of speed and accuracy
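
For orientation, here is a hedged sketch of what a DiskANN index definition can look like. It builds the createIndexes command as plain Java maps and prints it as JSON, so it runs without any driver. The option names (kind, dimensions, similarity, maxDegree, lBuild) follow the documented cosmosSearch index options as I understand them; the collection name, field name, index name, and parameter values are assumptions for illustration and may differ from what the DiskAnn sample class actually uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class DiskAnnIndexSketch {

    // Builds a createIndexes command for a DiskANN vector index (illustrative values).
    static Map<String, Object> createIndexCommand(String collection, String vectorField, int dimensions) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("kind", "vector-diskann");
        options.put("dimensions", dimensions);
        options.put("similarity", "COS"); // cosine similarity
        options.put("maxDegree", 32);     // assumed value: edges per graph node
        options.put("lBuild", 50);        // assumed value: candidate list size at build time

        Map<String, Object> key = new LinkedHashMap<>();
        key.put(vectorField, "cosmosSearch");

        Map<String, Object> index = new LinkedHashMap<>();
        index.put("name", "vectorIndex_diskann"); // hypothetical index name
        index.put("key", key);
        index.put("cosmosSearchOptions", options);

        Map<String, Object> command = new LinkedHashMap<>();
        command.put("createIndexes", collection);
        command.put("indexes", List.of(index));
        return command;
    }

    // Minimal JSON rendering for maps, lists, strings, and numbers (no escaping; sketch only).
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(", ", "{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                joiner.add("\"" + entry.getKey() + "\": " + toJson(entry.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof List) {
            StringJoiner joiner = new StringJoiner(", ", "[", "]");
            for (Object item : (List<?>) value) {
                joiner.add(toJson(item));
            }
            return joiner.toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // "documents" and "embedding" are hypothetical names; 1536 matches text-embedding-3-small.
        System.out.println(toJson(createIndexCommand("documents", "embedding", 1536)));
    }
}
```

With the MongoDB Java driver, an equivalent command document would typically be passed to MongoDatabase.runCommand. Higher maxDegree and lBuild values generally trade longer index builds for better accuracy.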

HNSW Vector Search

Run HNSW (Hierarchical Navigable Small World) search:

mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.HNSW"

HNSW provides:

Excellent search performance
High recall rates
Hierarchical graph structure
Good for real-time applications
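
"Recall" here means the fraction of the true nearest neighbors that an approximate index actually returns. A minimal sketch of how it could be measured, assuming you already have the exact top-k ids (for example from a brute-force scan) and the ids returned by the index; the class name and ids are hypothetical:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecallSketch {

    // recall = |approximate ∩ exact| / |exact|
    static double recall(List<String> exact, List<String> approximate) {
        if (exact.isEmpty()) {
            return 1.0; // nothing to find, so nothing was missed
        }
        Set<String> hits = new HashSet<>(approximate);
        hits.retainAll(new HashSet<>(exact));
        return (double) hits.size() / exact.size();
    }

    public static void main(String[] args) {
        List<String> exact = List.of("a", "b", "c", "d");
        List<String> approximate = List.of("a", "b", "c", "x");
        System.out.println(recall(exact, approximate)); // prints 0.75
    }
}
```

A recall of 0.75 at k = 4 means the index found three of the four true nearest neighbors.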

IVF Vector Search

Run IVF (Inverted File) search:

mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.IVF"

IVF features:

Clusters vectors by similarity
Fast search through cluster centroids
Configurable accuracy vs speed trade-offs
Efficient for large vector datasets
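
The accuracy versus speed trade-off can be seen in a toy version of the idea. This is a simplified sketch, not the DocumentDB implementation: the centroids are fixed by hand rather than learned by clustering, distances are squared Euclidean, and all names and values are made up. Each vector is stored in the list of its nearest centroid, and a search scans only the nProbes lists whose centroids are closest to the query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IvfSketch {
    private final double[][] centroids;
    private final List<List<double[]>> lists = new ArrayList<>();

    IvfSketch(double[][] centroids) {
        this.centroids = centroids;
        for (int i = 0; i < centroids.length; i++) {
            lists.add(new ArrayList<>());
        }
    }

    static double dist2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Centroid indexes ordered from nearest to farthest from v.
    private List<Integer> nearestCentroids(double[] v) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> dist2(centroids[i], v)));
        return order;
    }

    // Stores the vector in the list that belongs to its nearest centroid.
    void add(double[] v) {
        lists.get(nearestCentroids(v).get(0)).add(v);
    }

    // Scans only the nProbes nearest lists; returns the closest vector found, or null.
    double[] search(double[] query, int nProbes) {
        double[] best = null;
        List<Integer> order = nearestCentroids(query);
        for (int p = 0; p < Math.min(nProbes, order.size()); p++) {
            for (double[] candidate : lists.get(order.get(p))) {
                if (best == null || dist2(candidate, query) < dist2(best, query)) {
                    best = candidate;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        IvfSketch index = new IvfSketch(new double[][] {{0, 0}, {10, 10}});
        index.add(new double[] {1, 1});     // lands in the list for centroid (0, 0)
        index.add(new double[] {5.5, 5.5}); // lands in the list for centroid (10, 10)
        double[] query = {4.9, 4.9};        // slightly closer to centroid (0, 0)
        System.out.println(index.search(query, 1)[0]); // prints 1.0 (true neighbor missed)
        System.out.println(index.search(query, 2)[0]); // prints 5.5 (true neighbor found)
    }
}
```

Probing one list is fastest but misses the true neighbor sitting just across the cluster boundary; probing both lists finds it at the cost of scanning more vectors.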

Support

If you encounter issues:

Verify Java 21+ is installed: java -version
Verify Maven is installed: mvn -version
Ensure Azure CLI is logged in: az login
Verify environment variables are exported: echo $MONGO_CLUSTER_NAME
Check Azure service status and quotas
