DocumentDB Vector Search for Java
This project demonstrates vector search capabilities using Azure DocumentDB with Java. It includes implementations of three different vector index types: DiskANN, HNSW, and IVF.
Overview
Vector search enables semantic similarity searching by converting text into high-dimensional vector representations (embeddings) and finding the most similar vectors in the database. This project shows how to:
- Generate embeddings using Azure OpenAI
- Store vectors in DocumentDB
- Create and use different types of vector indexes
- Perform similarity searches with various algorithms
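To make "finding the most similar vectors" concrete, here is a minimal sketch of cosine similarity and a brute-force top-k ranking in plain Java. It is not taken from the sample code: the class name `SimilaritySketch` and its methods are hypothetical, and a real vector index avoids scanning every stored vector the way this does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of the sample project.
final class SimilaritySketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen in this sketch for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the positions of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(double[] query, double[][] stored, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < stored.length; i++) {
            order.add(i);
        }
        // Sort by similarity, highest first.
        order.sort(Comparator.comparingDouble((Integer i) -> cosine(query, stored[i])).reversed());
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

Real embeddings have hundreds or thousands of dimensions rather than the two or three used in a toy test, which is why the approximate indexes described later matter.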
Prerequisites
Before running this project, you need:
Azure Resources
- Azure subscription with appropriate permissions
- Azure Developer CLI (azd) installed
- Azure CLI (az) installed (used for az login in the Usage section)
Development Environment
- Java 21 or higher
- Maven 3.6 or higher
- Git (for cloning the repository)
- Visual Studio Code (recommended) or another Java IDE
Setup Instructions
Clone and Set Up the Project
# Clone this repository
git clone https://github.com/Azure-Samples/documentdb-samples
Deploy Azure Resources
This project uses Azure Developer CLI (azd) to deploy all required Azure resources from the existing infrastructure-as-code files.
Install Azure Developer CLI
If you haven't already, install the Azure Developer CLI:
Windows:
winget install microsoft.azd
macOS:
brew tap azure/azd && brew install azd
Linux:
curl -fsSL https://aka.ms/install-azd.sh | bash
Deploy Resources
Navigate to the root of the repository and run:
# Login to Azure
azd auth login
# Provision Azure resources
azd up
During provisioning, you'll be prompted for:
- Environment name: A unique name for your deployment (e.g., "my-vector-search")
- Azure subscription: Select your Azure subscription
- Location: Choose from eastus2 or swedencentral (required for OpenAI models)
The azd up command will:
- Create a resource group
- Deploy Azure OpenAI with text-embedding-3-small model
- Deploy Azure DocumentDB (MongoDB vCore) cluster
- Create a managed identity for secure access
- Configure all necessary permissions and networking
- Generate a .env file with all connection information at the repository root
Compile the Project
# Move to Java vector search project
cd ai/vector-search-java
# Compile the project
mvn clean compile
Load Environment Variables
After deployment completes, load the environment variables from the generated .env file. While set -a is active, every variable the shell defines is marked for export, so child processes such as the Maven JVM can read it; set +a turns that behavior off again:
# From the ai/vector-search-java directory
set -a && source ../../.env && set +a
You can verify the environment variables are set:
echo $MONGO_CLUSTER_NAME
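The same check can be done from Java, which fails fast with a clear message instead of a connection error later. This is a minimal sketch under assumptions: `EnvCheck` is a hypothetical class, and `MONGO_CLUSTER_NAME` is the only variable name taken from this README. Add the other names from your generated .env file as needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the sample project.
final class EnvCheck {

    /** Returns the names from `required` that are absent or blank in `env`. */
    static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Only MONGO_CLUSTER_NAME is named in this README; extend the list
        // with the other variables found in your .env file.
        List<String> required = Arrays.asList("MONGO_CLUSTER_NAME");
        List<String> absent = missing(System.getenv(), required);
        if (!absent.isEmpty()) {
            System.err.println("Missing environment variables: " + absent);
            System.exit(1);
        }
        System.out.println("All required environment variables are set.");
    }
}
```

Passing the environment in as a `Map` keeps the check testable without touching the real process environment.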
Usage
The project includes several Java classes that demonstrate different aspects of vector search.
Sign in to Azure for passwordless connection
az login
DiskANN Vector Search
Run DiskANN (Disk-based Approximate Nearest Neighbor) search:
mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.DiskAnn"
DiskANN is optimized for:
- Large datasets that don't fit in memory
- Efficient disk-based storage
- Good balance of speed and accuracy
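For orientation, a DiskANN index is usually declared through a createIndexes command. The sketch below only builds that command as a JSON string and sends nothing to the database. Treat the details as assumptions: the option names (`cosmosSearch`, `cosmosSearchOptions`, `vector-diskann`, `maxDegree`, `lBuild`) follow the vector search documentation for the MongoDB vCore service as best understood here, the tuning values are illustrative, and the class, index, collection, and field names are hypothetical. Check the `DiskAnn` class in this project for what the sample actually does.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the sample project.
final class DiskAnnIndexSketch {

    /**
     * Builds a createIndexes command as JSON text.
     * Option names are assumptions based on the service documentation.
     */
    static String createIndexCommand(String collection, String vectorField, int dimensions) {
        return String.format(Locale.ROOT,
            "{ \"createIndexes\": \"%s\", \"indexes\": [ {"
          + " \"name\": \"vectorIndex_diskann\","          // hypothetical index name
          + " \"key\": { \"%s\": \"cosmosSearch\" },"
          + " \"cosmosSearchOptions\": {"
          + " \"kind\": \"vector-diskann\","
          + " \"dimensions\": %d,"                          // must match the embedding size
          + " \"similarity\": \"COS\","                     // cosine similarity
          + " \"maxDegree\": 32,"                           // illustrative value
          + " \"lBuild\": 64 } } ] }",                      // illustrative value
            collection, vectorField, dimensions);
    }
}
```

A call such as `createIndexCommand("hotels", "contentVector", 1536)` uses 1536 because that is the default output size of text-embedding-3-small; the collection and field names in that call are placeholders.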
HNSW Vector Search
Run HNSW (Hierarchical Navigable Small World) search:
mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.HNSW"
HNSW provides:
- Excellent search performance
- High recall rates
- Hierarchical graph structure
- Good for real-time applications
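The graph idea behind HNSW can be illustrated with a single-layer greedy walk: start at an entry point and keep moving to whichever neighbor is closer to the query until none is. This is a simplified sketch, not HNSW itself. The real algorithm stacks several such layers and keeps a candidate list on the bottom layer, and the class name `GreedyGraphSearch` and the hand-built neighbor lists are hypothetical.

```java
// Hypothetical helper for illustration only; not part of the sample project.
final class GreedyGraphSearch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Greedy walk over a neighbor graph. neighbors[i] lists the nodes linked to node i.
     * Returns the node where no neighbor is closer to the query (a local optimum).
     */
    static int search(double[][] vectors, int[][] neighbors, int entryPoint, double[] query) {
        int current = entryPoint;
        double currentDistance = squaredDistance(vectors[current], query);
        while (true) {
            int next = current;
            double nextDistance = currentDistance;
            for (int candidate : neighbors[current]) {
                double d = squaredDistance(vectors[candidate], query);
                if (d < nextDistance) {
                    next = candidate;
                    nextDistance = d;
                }
            }
            if (next == current) {
                return current; // no neighbor improves on the current node
            }
            current = next;
            currentDistance = nextDistance;
        }
    }
}
```

Because the walk only inspects neighbors of the nodes it visits, it touches a small fraction of the data, which is where the search speed comes from. It can also stop at a local optimum, which is why results are approximate.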
IVF Vector Search
Run IVF (Inverted File) search:
mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.IVF"
IVF features:
- Clusters vectors by similarity
- Fast search through cluster centroids
- Configurable accuracy vs speed trade-offs
- Efficient for large vector datasets
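The clustering and the accuracy versus speed trade-off can be shown with a toy inverted-file search in plain Java. This is a sketch under assumptions, not the service's implementation: centroids are supplied by hand rather than learned with k-means, and `IvfSketch` and the `nProbe` parameter name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of the sample project.
final class IvfSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Centroid indexes ordered from nearest to farthest from v. */
    static List<Integer> centroidsByDistance(double[][] centroids, double[] v) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer c) -> squaredDistance(centroids[c], v)));
        return order;
    }

    /** Builds one inverted list per centroid; each vector joins its nearest centroid's list. */
    static List<List<Integer>> buildLists(double[][] centroids, double[][] vectors) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int c = 0; c < centroids.length; c++) {
            lists.add(new ArrayList<>());
        }
        for (int id = 0; id < vectors.length; id++) {
            lists.get(centroidsByDistance(centroids, vectors[id]).get(0)).add(id);
        }
        return lists;
    }

    /** Scans only the lists of the nProbe nearest centroids; returns the best vector id or -1. */
    static int search(double[][] centroids, List<List<Integer>> lists,
                      double[][] vectors, double[] query, int nProbe) {
        List<Integer> order = centroidsByDistance(centroids, query);
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int p = 0; p < Math.min(nProbe, order.size()); p++) {
            for (int id : lists.get(order.get(p))) {
                double d = squaredDistance(vectors[id], query);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = id;
                }
            }
        }
        return best;
    }
}
```

Probing fewer lists is faster but can miss a true nearest neighbor that sits just across a cluster boundary; probing more lists recovers it at the cost of scanning more vectors.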
Further Resources
- Azure Developer CLI Documentation
- Azure DocumentDB Documentation
- Azure OpenAI Service Documentation
- Vector Search in DocumentDB
- MongoDB Java Driver Documentation
- Azure SDK for Java Documentation
Support
If you encounter issues:
- Verify Java 21+ is installed: java -version
- Verify Maven is installed: mvn -version
- Ensure Azure CLI is logged in: az login
- Verify environment variables are exported: echo $MONGO_CLUSTER_NAME
- Check Azure service status and quotas