Edit

Develop AI apps with LangChain for Azure HorizonDB (Preview)

LangChain is a large language model (LLM) orchestration framework that simplifies building generative AI applications. Azure HorizonDB integrates with LangChain as a vector store, enabling you to store documents, create indices, and perform similarity search (cosine, L2, inner product) directly in PostgreSQL.

Azure HorizonDB supports vector embeddings via pgvector with DiskANN indexing for high-performance search at scale. For details, see Retrieval foundations: vector, full-text, and hybrid search in Azure HorizonDB (Preview).

Authentication

Azure HorizonDB supports password-based and Microsoft Entra authentication.

Microsoft Entra authentication allows you to use Microsoft Entra ID to authenticate to your PostgreSQL server. Microsoft Entra ID eliminates the need to manage separate usernames and passwords for your database users. It allows you to use the same security mechanisms that you use for other Azure services.

In this article, you can use either authentication method.

Setup

Azure HorizonDB uses the open-source LangChain Postgres support to connect to Azure HorizonDB. First, download the partner package:

%pip install -qU langchain-azure-postgresql
%pip install -qU langchain-openai
%pip install -qU azure-identity

Enable pgvector on Azure HorizonDB

See Implement vector search in Azure HorizonDB using the pgvector extension (Preview).

Set up credentials

You need to get your Azure HorizonDB connection details and add them as environment variables.

Set the USE_ENTRA_AUTH flag to True if you want to use Microsoft Entra authentication. If you're using Microsoft Entra authentication, you need to supply the only host and database names. If you're using password authentication, you also need to set the username and password.

import getpass
import os

USE_ENTRA_AUTH = True

# Supply the connection details for the database
os.environ["DBHOST"] = "<server-name>"
os.environ["DBNAME"] = "<database-name>"
os.environ["SSLMODE"] = "require"

if not USE_ENTRA_AUTH:
    # If you're using a username and password, supply them here
    os.environ["DBUSER"] = "<username>"
    os.environ["DBPASSWORD"] = getpass.getpass("Database Password:")

Set up Azure OpenAI embeddings

os.environ["AZURE_OPENAI_ENDPOINT"] = "<azure-openai-endpoint>"
os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass("Azure OpenAI API Key:")
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]

from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    azure_deployment="text-embedding-3-small",
)

Initialization

Use Microsoft Entra authentication

The following sections demonstrate how to set up LangChain to use Microsoft Entra authentication. The class AzurePGConnectionPool in the LangChain Azure Postgres package retrieves tokens for the Azure HorizonDB service by using DefaultAzureCredential from the azure.identity library.

The connection can be passed into the connection parameter of the AzurePGVectorStore LangChain vector store.

Sign in to Azure

To sign in to Azure, ensure that you have the Azure CLI installed. Run the following command in your terminal:

az login

After you sign in, the following code fetches the token:

from langchain_azure_postgresql.common import (
    BasicAuth,
    AzurePGConnectionPool,
    ConnectionInfo,
)
from langchain_azure_postgresql.langchain import AzurePGVectorStore
entra_connection_pool = AzurePGConnectionPool(
        azure_conn_info=ConnectionInfo(
            host=os.environ["DBHOST"],
            dbname=os.environ["DBNAME"]
        )
    )

Use password authentication

If you're not using Microsoft Entra authentication, the BasicAuth class allows the use of username and password:

basic_auth_connection_pool = AzurePGConnectionPool(
    azure_conn_info=ConnectionInfo(
        host=os.environ["DBHOST"],
        dbname=os.environ["DBNAME"],
        credentials=BasicAuth(
            username=os.environ["DBUSER"],
            password=os.environ["DBPASSWORD"],
        )
    )
)

Create the vector store

from langchain_core.documents import Document
from langchain_azure_postgresql.langchain import AzurePGVectorStore

collection_name = "my_docs"

# The connection is either using Entra ID or Basic Auth
connection = entra_connection_pool if USE_ENTRA_AUTH else basic_auth_connection_pool

vector_store = AzurePGVectorStore(
    embeddings=embeddings,
    table_name=collection_name,
    connection=connection,
)

Management of the vector store

Add items to the vector store

Adding documents by ID overwrites any existing documents that match that ID.

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"doc_id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"doc_id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"doc_id": 3, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"doc_id": 4, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"doc_id": 5, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"doc_id": 6, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"doc_id": 7, "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"doc_id": 8, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"doc_id": 9, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"doc_id": 10, "location": "community center", "topic": "classes"},
    ),
]

uuids = vector_store.add_documents(docs)
uuids

Update items in the vector store

updated_docs = [
    Document(
        page_content="Updated - cooking class for beginners is offered at the community center",
        metadata={"doc_id": 10, "location": "community center", "topic": "classes"},
        id=uuids[-1],
    )
]
vector_store.add_documents(docs, ids=[uuids[-1]], on_conflict_update=True)

See items from the vector store

vector_store.get_by_ids([str(uuids[-1])])

Delete items from the vector store

vector_store.delete(ids=[uuids[-1]])

Queries to the vector store

After you create your vector store and add the relevant documents, you can query the vector store in your chain or agent.

Filter support

The vector store supports a set of filters that can be applied against the metadata fields of the documents via the FilterCondition, OrFilter, and AndFilter in the LangChain Azure PostgreSQL package:

Operator Meaning/Category
= Equality (==)
!= Inequality (!=)
< Less than (<)
<= Less than or equal (<=)
> Greater than (>)
>= Greater than or equal (>=)
in Special cased (in)
not in Special cased (not in)
is null Special cased (is null)
is not null Special cased (isn't null)
between Special cased (between)
not between Special cased (not between)
like Text (like)
ilike Text (case-insensitive like)
AND Logical (and)
OR Logical (or)

Direct query

You can perform a direct similarity search as follows:

from langchain_azure_postgresql import FilterCondition, AndFilter

results = vector_store.similarity_search(
    "kitty",
    k=10,
    filter=FilterCondition(
        column="(metadata->>'doc_id')::int",
        operator="in",
        value=[1, 5, 2, 9],
    ),
)

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
    * there are cats in the pond [{'doc_id': 1, 'topic': 'animals', 'location': 'pond'}]
    * ducks are also found in the pond [{'doc_id': 2, 'topic': 'animals', 'location': 'pond'}]
    * the new art exhibit is fascinating [{'doc_id': 5, 'topic': 'art', 'location': 'museum'}]
    * the library hosts a weekly story time for kids [{'doc_id': 9, 'topic': 'reading', 'location': 'library'}]

If you provide a dictionary with multiple fields but no operators, the top level is interpreted as a logical AND filter:

results = vector_store.similarity_search(
    "ducks",
    k=10,
    filter=AndFilter(
        AND=[
            FilterCondition(
                column="(metadata->>'doc_id')::int",
                operator="in",
                value=[1, 5, 2, 9],
            ),
            FilterCondition(
                column="metadata->>'location'",
                operator="in",
                value=["pond", "market"],
            ),
        ]
    ),
)

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
    * ducks are also found in the pond [{'topic': 'animals', 'doc_id': 2, 'location': 'pond'}]
    * there are cats in the pond [{'topic': 'animals', 'doc_id': 1, 'location': 'pond'}]

If you want to execute a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.528338] there are cats in the pond [{'doc_id': 1, 'topic': 'animals', 'location': 'pond'}]

If you want to use max marginal relevance search on your vector store:

results = vector_store.max_marginal_relevance_search(
    "query about cats",
    k=10,
    lambda_mult=0.5,
    filter=FilterCondition(
        column="(metadata->>'doc_id')::int",
        operator="in",
        value=[1, 2, 5, 9],
    ),
)

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
    * there are cats in the pond [{'doc_id': 1, 'topic': 'animals', 'location': 'pond'}]
    * ducks are also found in the pond [{'doc_id': 2, 'topic': 'animals', 'location': 'pond'}]
    * the new art exhibit is fascinating [{'doc_id': 5, 'topic': 'art', 'location': 'museum'}]
    * the library hosts a weekly story time for kids [{'doc_id': 9, 'topic': 'reading', 'location': 'library'}]

For a full list of the searches that you can execute on an AzurePGVectorStore vector store, refer to the API reference.

Transformation into a retriever

You can also transform the vector store into a retriever for easier usage in your chains:

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")
[Document(id='00000000-0000-0000-0000-000000000000', metadata={'doc_id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]