Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Use langchain-azure-ai and Foundry Memory to add long-term memory to your
applications. In this article, you create a memory-backed chain,
store user preferences, recall them in a new session, and run direct memory
queries.
This pattern works for both LangChain and LangGraph applications. The core idea is to keep short-term chat history in your runtime and use Foundry Memory as the long-term store for user-level context.
Foundry Memory focuses on long-term memory. Keep short-term turn-by-turn state in LangChain or LangGraph runtime state.
Prerequisites
- An Azure subscription. Create one for free.
- A Foundry project.
- A deployed Azure OpenAI chat model for app responses.
- A deployed chat model and embedding model for the memory store.
- Python 3.10 or later.
- Azure CLI signed in (
az login) soDefaultAzureCredentialcan authenticate.
Configure your environment
Install the required packages for this tutorial. Use langchain-azure-ai for
LangChain and LangGraph integration, azure-ai-projects for memory store
management, and azure-identity for authentication.
pip install -U "langchain-azure-ai[v2]" azure-ai-projects azure-identity
Set your environment variables that we use in this tutorial:
export AZURE_AI_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"
export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/openai/v1"
export AZURE_OPENAI_DEPLOYMENT="gpt-4.1"
export MEMORY_STORE_CHAT_MODEL_DEPLOYMENT_NAME="gpt-4.1"
export MEMORY_STORE_EMBEDDING_MODEL_DEPLOYMENT_NAME="text-embedding-3-large"
Understand the memory model
Foundry Memory stores and retrieves two long-term memory types:
- User profile memory: stable user facts and preferences, such as preferred name or dietary constraints.
- Chat summary memory: distilled summaries of prior discussion topics.
Memory uses the idea of "scope" to partition information so it can be stored and retrieved consistently. Scopes are like identifiers or keys to organize information.
- You can use user IDs as the stable identity for long-term memory. Keep it the same across sessions for the same user.
- You can use session IDs as the short-term conversation identity. Change it per chat session.
- You can use resource IDs as the stable identifier for long-term memory across multiple users.
This separation lets your app remember user preferences across sessions without mixing unrelated conversations.
Create the memory store
Before getting started, you need to create a memory store. For this operation, use
the Microsoft Foundry projects SDK azure-ai-projects.
import os
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
MemoryStoreDefaultDefinition,
MemoryStoreDefaultOptions,
)
from azure.core.exceptions import ResourceNotFoundError
from azure.identity import DefaultAzureCredential
endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
credential = DefaultAzureCredential()
client = AIProjectClient(endpoint=endpoint, credential=credential)
store_name = "lc-integration-test-store"
try:
store = client.memory_stores.get(store_name)
print(f"✓ Memory store '{store_name}' already exists")
except ResourceNotFoundError:
definition = MemoryStoreDefaultDefinition(
chat_model=os.environ["MEMORY_STORE_CHAT_MODEL_DEPLOYMENT_NAME"],
embedding_model=os.environ[
"MEMORY_STORE_EMBEDDING_MODEL_DEPLOYMENT_NAME"
],
options=MemoryStoreDefaultOptions(
user_profile_enabled=True,
chat_summary_enabled=True,
),
)
store = client.memory_stores.create(
name=store_name,
description="Long-term memory store",
definition=definition,
)
print(f"✓ Memory store '{store_name}' created successfully")
✓ Memory store 'lc-integration-test-store' created successfully
What this snippet does: Connects to your Foundry project, gets or creates the memory store, and enables user profile plus chat summary extraction.
Using memory in LangGraph and LangChain
Foundry Memory integrates in LangGraph and LangChain by introducing two objects:
- The class
langchain_azure_ai.chat_message_histories.AzureAIMemoryChatMessageHistorycreates a memory-backed chat history. - The class
langchain_azure_ai.retrievers.AzureAIMemoryRetrieverallows retrieval of memories from the chat message history.
In general, you can use the following practical retrieval strategies with them:
- Retrieve user profile memory early in a conversation to personalize responses.
- Retrieve chat summary memory based on the current turn to recover relevant prior context.
Example: Add a session-aware memory layer
In this example, we build a single runnable in LangChain that retrieves relevant long-term memory, injects it into the prompt, and executes the model with short-term chat history and long-term memory together.
Let's see how to implement it:
Create the chat message history
This example uses a stable user_id as the memory scope. Use session_id for per-session
conversation context.
from langchain_azure_ai.chat_message_histories import AzureAIMemoryChatMessageHistory
from langchain_azure_ai.retrievers import AzureAIMemoryRetriever
from langchain_core.chat_history import InMemoryChatMessageHistory
session_histories: dict[tuple[str, str], AzureAIMemoryChatMessageHistory] = {}
def get_session_history(
user_id: str,
session_id: str,
) -> AzureAIMemoryChatMessageHistory:
cache_key = (user_id, session_id)
if cache_key not in session_histories:
session_histories[cache_key] = AzureAIMemoryChatMessageHistory(
client=client,
store_name=store_name,
scope=user_id,
session_id=session_id,
base_history_factory=InMemoryChatMessageHistory(),
update_delay=0,
)
return session_histories[cache_key]
def get_foundry_retriever(
user_id: str,
session_id: str,
) -> AzureAIMemoryRetriever:
return get_session_history(user_id, session_id).get_retriever(k=5)
What this snippet does: Creates a memory-backed history and retriever per
(user_id, session_id) pair and caches them so retrieval state survives across
turns in the same session. For this walkthrough, update_delay=0 makes memory updates immediately visible.
In production, use the default delay unless you specifically need instant
extraction. session_histories is used to avoid having to recreate the objects constantly.
Compose the runnable with memory retrieval
Let's create a runnable to implement the loop:
from typing import Any
import os
from azure.identity import DefaultAzureCredential
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import ConfigurableFieldSpec, RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel
llm = AzureAIChatCompletionsModel(
endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
credential=DefaultAzureCredential(),
model=os.environ["AZURE_OPENAI_DEPLOYMENT"],
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are helpful and concise. Use prior memories when relevant.",
),
MessagesPlaceholder("history"),
("system", "Memories:\n{memories}"),
("human", "{question}"),
]
)
def chain_for_session(user_id: str, session_id: str) -> RunnableWithMessageHistory:
retriever = get_foundry_retriever(user_id, session_id)
def format_memories(x: dict[str, Any]) -> str:
docs = retriever.invoke(x["question"])
return (
"\n".join([doc.page_content for doc in docs])
if docs
else "No relevant memories found."
)
chain = RunnablePassthrough.assign(memories=format_memories) | prompt | llm
return RunnableWithMessageHistory(
chain,
get_session_history=get_session_history,
input_messages_key="question",
history_messages_key="history",
history_factory_config=[
ConfigurableFieldSpec(
id="user_id",
annotation=str,
name="User ID",
description="Unique identifier for the user.",
default="",
is_shared=True,
),
ConfigurableFieldSpec(
id="session_id",
annotation=str,
name="Session ID",
description="Unique identifier for the session.",
default="",
is_shared=True,
),
],
)
What this snippet does: Builds a runnable that injects retrieved memories
into the prompt, then wraps it with RunnableWithMessageHistory so chat history
and long-term memory work together.
This pattern keeps your prompt deterministic: every turn explicitly includes
retrieved memory in the Memories section.
Run a practical cross-session scenario
This scenario shows the full value of long-term memory:
- In session A, the user shares preferences.
- In session B, the app recalls those preferences automatically.
import time
user_id = "user_001"
session_id = "session_2026_02_10_001"
chain = chain_for_session(user_id, session_id)
print("\n=== Turn 1 (Session A) ===")
r1 = chain.invoke(
{"question": "Hi! Call me JT. I prefer dark roast coffee and budget trips."},
config={"configurable": {"user_id": user_id, "session_id": session_id}},
)
print("ASSISTANT:", r1.content)
print("\n=== Turn 2 (Session A) ===")
r2 = chain.invoke(
{
"question": "Also, I usually drink green tea in the afternoon "
"and I like staying in hostels.",
},
config={"configurable": {"user_id": user_id, "session_id": session_id}},
)
print("ASSISTANT:", r2.content)
time.sleep(60)
session_id_b = "session_2026_02_10_002"
chain_b = chain_for_session(user_id, session_id_b)
print("\n=== Turn 3 (Session B) ===")
r3 = chain_b.invoke(
{"question": "Remind me of my coffee preference and travel style."},
config={"configurable": {"user_id": user_id, "session_id": session_id_b}},
)
print("ASSISTANT:", r3.content)
print("\n=== Turn 4 (Session B) ===")
r4 = chain_b.invoke(
{
"question": "What do I usually drink in the afternoon, "
"and where do I like to stay?",
},
config={"configurable": {"user_id": user_id, "session_id": session_id_b}},
)
print("ASSISTANT:", r4.content)
=== Turn 1 (Session A) ===
ASSISTANT: Nice to meet you, JT. I noted that you prefer dark roast coffee and budget trips.
=== Turn 2 (Session A) ===
ASSISTANT: Got it. I also noted that you usually drink green tea in the afternoon and prefer hostels.
=== Turn 3 (Session B) ===
ASSISTANT: Your coffee preference is dark roast, and your travel style is budget trips.
=== Turn 4 (Session B) ===
ASSISTANT: You usually drink green tea in the afternoon, and you like staying in hostels.
What this snippet does: Seeds user preferences in session A, starts session B for the same user, and shows that the app can recall prior preferences across sessions.
Example: Query memory directly for non-chat use cases
Use an ad-hoc retriever when you want direct memory reads outside the conversation pipeline, for example in personalization middleware or profile inspection tools.
adhoc = AzureAIMemoryRetriever(
client=client,
store_name=store_name,
scope=user_id,
k=5,
)
docs = adhoc.invoke("What are my drinking preferences?")
for i, doc in enumerate(docs, start=1):
print(f"MEMORY {i}:", doc.page_content)
MEMORY 1: Prefers dark roast coffee.
MEMORY 2: Prefers budget trips.
MEMORY 3: Usually drinks green tea in the afternoon.
MEMORY 4: Likes staying in hostels.
What this snippet does: Runs a direct memory search for the current scope. All memories
are retrieved (capped by k) but sorted by relevance.
Use this pattern when you need direct memory reads for features such as profile cards, personalization middleware, or workflow routing.
Example: Use memory in graphs
LangGraph uses the same conceptual pattern:
- Keep
user_idstable for long-term memory. - Use
thread_id(or equivalent) for short-term thread context. - Retrieve memory before calling the model node.
If you already have a StateGraph, inject retrieval in your model node and
append memory text to your model input. Another typical strategy is to use
a pre-model hook.
from langgraph.graph import MessagesState
def call_model_with_foundry_memory(state: MessagesState, config: dict):
user_id = config["configurable"]["user_id"]
session_id = config["configurable"]["thread_id"]
query = state["messages"][-1].content
retriever = get_foundry_retriever(user_id, session_id)
docs = retriever.invoke(query)
memory_text = "\n".join(d.page_content for d in docs) if docs else ""
response = llm.invoke(
[
{"role": "system", "content": "Use prior memories when relevant."},
{"role": "system", "content": f"Memories:\n{memory_text}"},
*state["messages"],
]
)
return {"messages": [response]}
What this snippet does: Shows a LangGraph node pattern that retrieves Foundry memory for the current turn and injects it into model input.
For broader LangGraph memory concepts, see:
Understand preview limits and operational guidance
Before moving to production, validate these constraints:
- Memory is in preview and behavior can change.
- Memory requires compatible chat and embedding deployments.
- Quotas apply per store and per scope, including search and update request rates.
Also plan defensive controls for memory poisoning or prompt-injection attempts. Validate untrusted inputs before they influence stored memory.
Clean up resources
After running samples, delete the scope to avoid test data leaking into future runs.
result = client.memory_stores.delete_scope(name=store_name, scope=user_id)
print(
f"Deleted {getattr(result, 'deleted_count', 'all')} memories "
f"for scope '{user_id}'."
)
Deleted 4 memories for scope 'user_001'.