Use the langchain_azure_ai.agents.hosting package to expose a compiled
LangGraph graph through the protocols for Microsoft Foundry
hosted agents. The hosting
package lets you keep your LangChain and LangGraph agent logic in code while
Foundry manages the hosted runtime, sessions, scale, identity, and protocol
endpoints.
In this article, you create a minimal LangGraph agent, expose it through either the Responses or Invocations protocol, test it over HTTP, and deploy it to Foundry with the Azure Developer CLI or the Foundry Toolkit Visual Studio Code extension.
Prerequisites
- An Azure subscription. Create one for free.
- A Foundry project.
- A deployed chat model, such as gpt-4.1 or gpt-5-mini.
- Python 3.10 or later.
- The Azure CLI, signed in with az login so that DefaultAzureCredential can authenticate.
Install the package
Install langchain-azure-ai 1.2.4 or later with the hosting extra:
pip install -U "langchain-azure-ai[hosting]>=1.2.4" azure-identity
The hosting extra installs the Foundry protocol libraries that the host servers use:
- azure-ai-agentserver-responses for the OpenAI-compatible /responses endpoint.
- azure-ai-agentserver-invocations for the generic /invocations endpoint.
Choose a hosting protocol
Hosted agents can expose one or more protocols. Start with Responses for most conversational agents.
| Protocol | Host class | Endpoint | Use when |
|---|---|---|---|
| Responses | ResponsesHostServer | /responses | You want OpenAI-compatible chat, streaming, response history, and conversation threading. |
| Invocations | InvocationsHostServer | /invocations | You want a custom JSON shape, a webhook-style endpoint, or non-conversational processing. |
For background on protocol behavior and sessions, see Hosted agents and Manage Hosted agent sessions.
Configure environment variables
Set the project endpoint and model deployment name for local development:
export FOUNDRY_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"
export AZURE_AI_MODEL_DEPLOYMENT_NAME="gpt-4.1"
When the same code runs as a Hosted agent in Foundry, the platform injects
FOUNDRY_PROJECT_ENDPOINT. If you use azd ai agent init with a sample
manifest, the generated project also uses AZURE_AI_MODEL_DEPLOYMENT_NAME for
the selected model deployment.
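To fail fast on a misconfigured environment, you can read and check both variables before you build the model client. The following sketch is illustrative and isn't part of langchain-azure-ai: the load_settings helper is hypothetical, and the endpoint check assumes the https://<resource>.services.ai.azure.com/api/projects/<project> shape shown above. It applies the same gpt-4.1 fallback that build_chat_model uses later in this article.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse


def load_settings(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (project_endpoint, deployment_name) read from environment variables.

    Hypothetical helper for illustration; not part of langchain-azure-ai.
    """
    endpoint = env.get("FOUNDRY_PROJECT_ENDPOINT", "").rstrip("/")
    if not endpoint:
        raise RuntimeError("FOUNDRY_PROJECT_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    # Assumed shape: https://<resource>.services.ai.azure.com/api/projects/<project>
    if parsed.scheme != "https" or not parsed.path.startswith("/api/projects/"):
        raise ValueError(f"Unexpected project endpoint format: {endpoint}")
    # Same default deployment name that build_chat_model() falls back to.
    deployment = env.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4.1")
    return endpoint, deployment
```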
Responses protocol
Use the Responses protocol when you want an OpenAI-compatible chat endpoint with streaming, response history, and conversation threading.
Create a Responses host
Create a file named main.py with a minimal LangGraph agent that uses a
Foundry model. This pattern matches the basic Responses sample in the
langchain-azure-ai source repository.
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from langchain_azure_ai.agents.hosting import ResponsesHostServer
_AZURE_AI_SCOPE = "https://ai.azure.com/.default"
def build_chat_model() -> ChatOpenAI:
    project_endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"].rstrip("/")
    deployment = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4.1")
    credential = DefaultAzureCredential()
    project = AIProjectClient(endpoint=project_endpoint, credential=credential)
    openai_client = project.get_openai_client()
    token_provider = get_bearer_token_provider(credential, _AZURE_AI_SCOPE)
    return ChatOpenAI(
        model=deployment,
        base_url=str(openai_client.base_url),
        api_key=token_provider,
    )
def main() -> None:
    graph = create_agent(build_chat_model(), tools=[])
    port = int(os.environ.get("PORT", "8088"))
    ResponsesHostServer(graph).run(port=port)
if __name__ == "__main__":
    main()
What this snippet does: Creates a LangGraph agent with LangChain's
create_agent, connects it to the Foundry project's OpenAI-compatible model
endpoint, and passes the compiled graph to ResponsesHostServer. The host
starts an HTTP server and exposes the graph through POST /responses. By
default, the server binds to port 8088, or to the value of the PORT
environment variable when one is set.
Run the app locally:
python main.py
Test the Responses endpoint
Send a non-streaming Responses request to the local server.
Bash:
curl -sS -H "Content-Type: application/json" \
-X POST http://localhost:8088/responses \
-d '{"input":"Give me one practical tip for testing hosted agents.","stream":false}'
PowerShell:
$body = @{
input = "Give me one practical tip for testing hosted agents."
stream = $false
} | ConvertTo-Json
Invoke-RestMethod `
-Uri http://localhost:8088/responses `
-Method Post `
-Body $body `
-ContentType "application/json"
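If you'd rather test from Python, the following standard-library sketch sends the same request and extracts the assistant text from the reply. It assumes the host returns the OpenAI Responses object shape, where assistant text appears in output items of type message with output_text content parts. The create_response and output_text helper names are hypothetical.

```python
import json
import urllib.request


def create_response(text: str, base_url: str = "http://localhost:8088") -> dict:
    """POST a non-streaming Responses request to the local host and return the parsed JSON."""
    body = json.dumps({"input": text, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/responses",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def output_text(response: dict) -> str:
    """Concatenate every output_text part found in the response's message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as function calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```

While the local server is running, print(output_text(create_response("Give me one practical tip for testing hosted agents."))) prints the agent's reply.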
For streaming responses, set stream to true. The host emits Responses API
server-sent events, such as response.created, response.output_text.delta,
and response.completed.
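A client consumes the stream by reading server-sent events line by line and keeping only the text deltas. This sketch assumes each response.output_text.delta event carries its text in a delta field, as in the OpenAI Responses streaming format; collect_output_text is a hypothetical helper name.

```python
import json
from collections.abc import Iterable


def collect_output_text(lines: Iterable[str]) -> str:
    """Join the delta text of response.output_text.delta events from an SSE stream."""
    text = []
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            try:
                payload = json.loads(data)
            except json.JSONDecodeError:
                continue  # ignore non-JSON data lines
            # Fall back to the payload's "type" field when no event line was sent.
            kind = event or payload.get("type")
            if kind == "response.output_text.delta":
                text.append(payload.get("delta", ""))
        elif line == "":
            event = None  # a blank line ends one SSE event
    return "".join(text)
```

To use it against the local server, send the request with stream set to true and pass the decoded lines, for example (line.decode("utf-8") for line in reply), where reply is the object returned by urllib.request.urlopen.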
Conversations
ResponsesHostServer supports two conversation-state patterns. The pattern it
uses depends on whether your compiled graph has a LangGraph checkpointer.
| Graph configuration | Conversation source | What the host sends to the graph on later turns |
|---|---|---|
| Graph without a checkpointer | Responses history from the protocol runtime | Prior response history plus the current request input |
| Graph compiled with a checkpointer | LangGraph checkpoint state keyed by the conversation or response thread | Current request input only |
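The table can be read as a simple rule. The following model is a conceptual illustration only, not the host's implementation; the Turn class and graph_input_messages function are hypothetical, and messages are plain strings for brevity.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    history: list[str]  # earlier turns stored by the Responses runtime
    current_input: str  # input from the current request


def graph_input_messages(turn: Turn, has_checkpointer: bool) -> list[str]:
    """Conceptual model of which messages reach the graph on a later turn."""
    if has_checkpointer:
        # Checkpointed graph: the host sends only the current request input.
        return [turn.current_input]
    # No checkpointer: prior response history plus the current request input.
    return [*turn.history, turn.current_input]
```

Sending only the current input to a checkpointed graph avoids replaying turns that the checkpoint presumably already holds.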
Use a checkpointer when your graph needs LangGraph runtime state, interrupts, or node-local state across turns. For local testing, you can use an in-memory checkpointer:
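from langchain.agents import create_agent
from langgraph.checkpoint.memory import MemorySaver
# In-memory checkpointer for local testing only; state is lost when the process exits.
graph = create_agent(
    build_chat_model(),
    tools=[],
    checkpointer=MemorySaver(),
)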
For production Hosted agents, use a durable checkpointer instead of an in-memory checkpointer so graph state survives container restarts.
Clients continue a Responses conversation by passing previous_response_id or
a conversation ID. For local testing, chain the previous response ID in the
next request:
POST http://localhost:8088/responses
Content-Type: application/json
{
"input": "Can you make that more concise?",
"previous_response_id": "<previous-response-id>",
"stream": false
}
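In a Python client, you can derive the follow-up body from the previous reply. This sketch assumes the earlier reply is the parsed Responses JSON object and that its id field holds the response ID; follow_up_body is a hypothetical helper.

```python
import json


def follow_up_body(previous_response: dict, text: str, stream: bool = False) -> bytes:
    """Build the JSON body for the next turn, chained to an earlier Responses result."""
    payload = {
        "input": text,
        # The "id" of the earlier response becomes previous_response_id.
        "previous_response_id": previous_response["id"],
        "stream": stream,
    }
    return json.dumps(payload).encode("utf-8")
```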
When the agent runs in Foundry, the same pattern works through the Hosted agent
Responses endpoint. If later turns also need the same hosted sandbox filesystem,
include agent_session_id or use a conversation ID. For details, see
Manage Hosted agent sessions.
Human-in-the-loop
If your graph uses LangGraph interrupt() calls, ResponsesHostServer surfaces
pending interrupts through standard Responses API output items:
- A function_call item named __hosted_agent_adapter_interrupt__.
- An mcp_approval_request item with server_label set to langgraph.
Clients can resume the graph by sending either a function_call_output item
whose call_id matches the interrupt ID or an mcp_approval_response item
whose approval_request_id matches the interrupt ID. Use
function_call_output when you need to send a rich LangGraph Command payload
with resume, update, or goto fields. Use mcp_approval_response for a
simple approve or reject flow.
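The following sketch shows the client side of both resume paths. It assumes the interrupt ID is exposed as call_id on the function_call item and as id on the mcp_approval_request item (their usual Responses API field names), and that the Command payload travels as a JSON string in output. The helper names are hypothetical. Send the returned item in the input array of the next Responses request, presumably together with previous_response_id.

```python
import json

INTERRUPT_TOOL = "__hosted_agent_adapter_interrupt__"


def pending_interrupt_ids(response: dict) -> list[str]:
    """Return interrupt IDs surfaced as function_call or mcp_approval_request items."""
    ids = []
    for item in response.get("output", []):
        if item.get("type") == "function_call" and item.get("name") == INTERRUPT_TOOL:
            ids.append(item["call_id"])
        elif item.get("type") == "mcp_approval_request" and item.get("server_label") == "langgraph":
            ids.append(item["id"])
    return ids


def resume_with_command(interrupt_id: str, command: dict) -> dict:
    """function_call_output item carrying a Command-style payload (resume, update, goto)."""
    return {
        "type": "function_call_output",
        "call_id": interrupt_id,
        "output": json.dumps(command),  # assumption: payload is sent as a JSON string
    }


def resume_with_approval(interrupt_id: str, approve: bool) -> dict:
    """mcp_approval_response item for a simple approve or reject decision."""
    return {
        "type": "mcp_approval_response",
        "approval_request_id": interrupt_id,
        "approve": approve,
    }
```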
Invocations protocol
Use InvocationsHostServer when your callers can't use the Responses API
request shape or when your scenario isn't a chat conversation. The default
Invocations host accepts a message string and an optional stream flag.
Create an Invocations host
Use the same model-building function from the Responses example, but start
InvocationsHostServer instead of ResponsesHostServer.
import os
from langchain.agents import create_agent
from langgraph.checkpoint.memory import MemorySaver
from langchain_azure_ai.agents.hosting import InvocationsHostServer
def main() -> None:
    # build_chat_model() is the same helper shown in the Responses example.
    graph = create_agent(
        build_chat_model(),
        tools=[],
        checkpointer=MemorySaver(),
    )
    port = int(os.environ.get("PORT", "8088"))
    InvocationsHostServer(graph).run(port=port)
if __name__ == "__main__":
    main()
What this snippet does: Hosts the LangGraph agent through POST /invocations. The MemorySaver checkpointer gives local multi-turn continuity
for a given session ID. For production, use a durable checkpointer so state
survives container restarts.
Test the Invocations endpoint
Send a non-streaming request:
curl -i -X POST http://localhost:8088/invocations \
-H "Content-Type: application/json" \
-d '{"message":"My name is Alice.","stream":false}'
Non-streaming requests return JSON in this shape:
{
"response": "Assistant text"
}
For multi-turn conversations, reuse the x-agent-session-id response header as
the agent_session_id query parameter on the next request:
curl -X POST "http://localhost:8088/invocations?agent_session_id=<session-id>" \
-H "Content-Type: application/json" \
-d '{"message":"What is my name?"}'
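To script this handshake, read the session header from one reply and build the URL for the next call. In this sketch, next_invocation_url is a hypothetical helper; it compares header names case-insensitively because HTTP header names are case-insensitive.

```python
from collections.abc import Mapping
from urllib.parse import urlencode


def next_invocation_url(base_url: str, response_headers: Mapping[str, str]) -> str:
    """Route the next /invocations call to the session returned by the previous call."""
    # Normalize header names before looking up x-agent-session-id.
    headers = {name.lower(): value for name, value in response_headers.items()}
    session_id = headers.get("x-agent-session-id")
    url = f"{base_url.rstrip('/')}/invocations"
    if session_id:
        url += "?" + urlencode({"agent_session_id": session_id})
    return url
```

With urllib.request, pass dict(reply.headers.items()) as response_headers.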
Streaming requests return text/event-stream events with token payloads:
curl -N -X POST http://localhost:8088/invocations \
-H "Content-Type: application/json" \
-d '{"message":"Count to 5.","stream":true}'
The stream contains token events followed by a terminal done event:
data: {"token": "..."}
event: done
data: {}
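A client can rebuild the full reply by joining the token payloads until it sees the done event. This sketch assumes token events arrive as unnamed data lines, as in the sample above; collect_tokens is a hypothetical helper name.

```python
import json
from collections.abc import Iterable


def collect_tokens(lines: Iterable[str]) -> str:
    """Join token payloads from an /invocations stream until the done event."""
    tokens = []
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            if event == "done":
                break  # terminal event; its data payload is {}
            payload = json.loads(line[len("data:"):].strip())
            tokens.append(payload.get("token", ""))
        elif not line:
            event = None  # a blank line ends one SSE event
    return "".join(tokens)
```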
Customize the request schema
To customize the request body, subclass InvocationsHostServer and override
parse_request. You can also override build_input to map the parsed data to a
custom graph state.
from langchain.agents import create_agent
from starlette.requests import Request
from langchain_azure_ai.agents.hosting import InvocationsHostServer
class TicketHostServer(InvocationsHostServer):
    async def parse_request(self, request: Request) -> tuple[str, bool]:
        # Flatten the custom ticket payload into one user message plus the stream flag.
        data = await request.json()
        ticket_id = data["ticket_id"]
        description = data["description"]
        stream = bool(data.get("stream", False))
        return f"Summarize ticket {ticket_id}: {description}", stream
if __name__ == "__main__":
    # build_chat_model() is the helper from the Responses example.
    graph = create_agent(build_chat_model(), tools=[])
    TicketHostServer(graph).run()
What this snippet does: Accepts a custom ticket payload and converts it to a
single user message before the host invokes the graph. For more complex graph
state, override build_input instead of flattening the request to text.
Deploy
You can deploy with the Azure Developer CLI or the Foundry Toolkit Visual Studio Code extension. The Azure Developer CLI flow uses sample manifests and Docker; the extension flow provides a guided deployment experience in Visual Studio Code.
Hosted agent deployment requires the Foundry Project Manager role on the project. For details, see Deploy a Hosted agent.
Deploy with Azure Developer CLI
The langchain-azure-ai source repository includes Hosted agent samples that
can be run and deployed with the Azure Developer CLI. The flow uses each
sample's agent.manifest.yaml, agent.yaml, Dockerfile, and main.py.
Install the AI agent extension and sign in before you initialize a sample:
azd ext install azure.ai.agents
azd auth login
Docker must be running locally because azd ai agent run builds the container
image declared in the sample's Dockerfile. For command details, see the
Azure Developer CLI reference.
Initialize from a sample manifest
Create a new folder and initialize it from a sample manifest. Replace the manifest URL with the sample you want to use.
mkdir my-langchain-agent
cd my-langchain-agent
azd ai agent init -m https://github.com/langchain-ai/langchain-azure/blob/main/samples/hosting/langgraph-hosted-agents/responses/01_basic/agent.manifest.yaml
Follow the prompts from azd ai agent init. If you don't already have a
Foundry project and model deployment, the initialization flow can guide you
through creating them.
Run the container locally
Run the agent host locally through azd:
azd ai agent run
The host serves on http://127.0.0.1:8088. In another terminal, invoke the
local protocol endpoint directly:
curl -X POST http://127.0.0.1:8088/responses \
-H "Content-Type: application/json" \
-d '{"input": "Hello!"}'
PowerShell equivalent:
(Invoke-WebRequest -Uri http://127.0.0.1:8088/responses `
-Method POST -ContentType 'application/json' `
-Body '{"input": "Hello!"}').Content
You can also invoke the local agent through azd:
azd ai agent invoke --local "Hello!"
Deploy to Foundry
If the initialized project uses a new Foundry project and model deployment, provision the Azure resources first:
azd provision
Deploy the agent:
azd deploy
The deployment packages the agent into a container image, pushes it to the provisioned container registry, and rolls it out to the Foundry Hosted agent runtime.
The Foundry hosting infrastructure injects runtime environment variables into the agent, including:
- FOUNDRY_PROJECT_ENDPOINT: The endpoint URL for the Foundry project where the agent is deployed.
- AZURE_AI_MODEL_DEPLOYMENT_NAME: The model deployment name selected during azd ai agent init.
- APPLICATIONINSIGHTS_CONNECTION_STRING: The connection string for the project's Application Insights instance.
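When you diagnose a deployed agent, it can help to log at startup which of these variables are present. The sketch below reports only true or false per variable so the Application Insights connection string never reaches your logs. The runtime_variable_report name is hypothetical.

```python
import os
from collections.abc import Mapping

INJECTED_VARIABLES = (
    "FOUNDRY_PROJECT_ENDPOINT",
    "AZURE_AI_MODEL_DEPLOYMENT_NAME",
    "APPLICATIONINSIGHTS_CONNECTION_STRING",
)


def runtime_variable_report(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which platform-injected variables are set, without exposing their values."""
    return {name: bool(env.get(name)) for name in INJECTED_VARIABLES}
```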
For complete deployment concepts, permissions, and management details, see Deploy a Hosted agent and Manage Hosted agent lifecycle.
Deploy with Foundry Toolkit Visual Studio Code extension
For extension-based deployment, see Quickstart: Deploy your first hosted agent.
Troubleshooting
Use this checklist to diagnose common issues while developing Hosted agents
with langchain_azure_ai.agents.hosting.
Graph schema validation fails
The default hosts expect a compiled LangGraph graph whose state has a
messages field, such as MessagesState. If your graph uses a custom state
schema, subclass the host and override build_input. For Responses, override
handle_create when you need full control over request parsing, graph
execution, and emitted Responses events.
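Before you hand a custom graph to the default hosts, you can check locally whether its state schema declares a messages field. This is a hypothetical diagnostic helper, not part of the hosting package, and it inspects only type annotations; it doesn't verify the reducer that MessagesState attaches to messages.

```python
from typing import get_type_hints


def has_messages_field(state_schema: type) -> bool:
    """Return True if the state schema declares a `messages` field, as MessagesState does."""
    try:
        hints = get_type_hints(state_schema)
    except NameError:
        # Unresolvable forward references: fall back to raw annotations along the MRO.
        hints = {}
        for klass in getattr(state_schema, "__mro__", (state_schema,)):
            hints.update(getattr(klass, "__annotations__", {}))
    return "messages" in hints
```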
Conversation state doesn't continue
For the Responses protocol, pass previous_response_id or a conversation ID
on later turns. If your graph uses a checkpointer, make sure the checkpointer is
configured and durable for the environment where the agent runs.
For the Invocations protocol, the platform doesn't store conversation history.
Use an agent_session_id query parameter to route later calls to the same
hosted sandbox and use your own state store or LangGraph checkpointer for
conversation state.
The model can't be reached in the hosted container
Confirm that the Hosted agent version includes AZURE_AI_MODEL_DEPLOYMENT_NAME,
and that the agent identity has permission to call the Foundry project. The
platform sets FOUNDRY_PROJECT_ENDPOINT; your code should read that variable
when running in Foundry.