LangChain is a developer ecosystem that makes it easier to build reasoning applications. It includes multiple components, and most of them can be used independently, so you can pick and choose the pieces you need.
You can use models deployed to Microsoft Foundry with LangChain in two ways:
- Use the model provider's API: Some models, such as OpenAI, Cohere, or Mistral, offer their own APIs and LangChain extensions. These extensions might include model-specific capabilities and are suitable if you need them. Install the extension for your chosen model, such as langchain-openai or langchain-cohere.
- Use the Azure AI Model Inference API: All models deployed in Microsoft Foundry support the Model Inference API, which offers a common set of capabilities across most models in the catalog. Because the API is consistent, switching models is as simple as changing the deployment; no code changes are required. For LangChain, also install the langchain-azure-ai integration.
Important
If you're currently using an Azure AI Inference beta SDK with Microsoft Foundry Models or Azure OpenAI service, we strongly recommend that you transition to the generally available OpenAI/v1 API, which uses an OpenAI stable SDK.
For more information on how to migrate to the OpenAI/v1 API by using an SDK in your programming language of choice, see Migrate from Azure AI Inference SDK to OpenAI SDK.
This tutorial shows how to use the langchain-azure-ai package with LangChain.
Prerequisites
To run this tutorial, you need:
- An Azure account with an active subscription. If you don't have one, create a free Azure account, which includes a free trial subscription.
- Required role:
- Owner or Contributor on the Foundry resource or AI Hub to deploy models
- Azure AI User to use the model in a Foundry project
- Azure AI Developer to use the model in a hub-based project
- A model deployment that supports the Model Inference API. This article uses Mistral-Large-2411 in code examples, but this model is deprecated. Deploy a more recent Mistral model such as Mistral-Large-3 or Mistral-Nemo from the Foundry model catalog instead, and substitute your model name in the code examples.
- Python 3.9 or later installed, including pip.
- LangChain installed. You can install it by using the following command:
pip install langchain
- The Foundry integration installed:
pip install -U langchain-azure-ai
Configure the environment
To use LLMs deployed in Microsoft Foundry portal, you need the endpoint and credentials to connect to it. Follow these steps to get the information you need from the model you want to use:
Tip
Because you can customize the left pane in the Microsoft Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.
Sign in to Microsoft Foundry. Make sure the New Foundry toggle is off. These steps refer to Foundry (classic).
Open the project where the model is deployed, if it isn't already open.
Go to Models + endpoints and select the model you deployed as indicated in the prerequisites.
Copy the endpoint URL and the key.
Tip
If your model was deployed with Microsoft Entra ID support, you don't need a key.
In this scenario, set the endpoint URL and key as environment variables. (If the endpoint you copied includes additional text after /models, remove it so the URL ends at /models as shown below.)
export AZURE_INFERENCE_ENDPOINT="https://<resource>.services.ai.azure.com/models"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"
If you're using the Foundry (new) portal, follow these steps instead:
Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on. These steps refer to Foundry (new).
Open the project where the model is deployed, if it isn't already open.
Copy the endpoint URL and the key.
In this scenario, set the endpoint URL and key as environment variables. (If the endpoint you copied includes additional text after /models, remove it so the URL ends at /models as shown below.)
export AZURE_INFERENCE_ENDPOINT="https://<resource>.services.ai.azure.com/models"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"
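Before creating any clients, you can sanity-check the configuration from Python. The following sketch reads the two variables and trims the endpoint so it ends at /models; check_inference_env is a hypothetical helper for illustration, not part of the langchain-azure-ai package:

```python
import os

def check_inference_env(env=os.environ):
    """Verify the inference variables are set and normalize the endpoint URL."""
    endpoint = env.get("AZURE_INFERENCE_ENDPOINT", "")
    credential = env.get("AZURE_INFERENCE_CREDENTIAL", "")
    if not endpoint or not credential:
        raise RuntimeError(
            "Set AZURE_INFERENCE_ENDPOINT and AZURE_INFERENCE_CREDENTIAL first."
        )
    # Trim anything after /models so the URL matches what the client expects.
    if "/models" in endpoint:
        endpoint = endpoint[: endpoint.index("/models") + len("/models")]
    return endpoint

print(check_inference_env({
    "AZURE_INFERENCE_ENDPOINT": "https://example.services.ai.azure.com/models/extra",
    "AZURE_INFERENCE_CREDENTIAL": "key",
}))
# prints https://example.services.ai.azure.com/models
```

Passing a dict explicitly, as above, makes the helper easy to test; by default it reads os.environ.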
Create a client to connect to the chat model by using the AzureAIChatCompletionsModel class.
import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
)
What this snippet does: Instantiates an AzureAIChatCompletionsModel client configured to connect to your deployed model using an API key for authentication. This client acts as an interface to the Model Inference API.
Caution
Breaking change: The model_name parameter is renamed to model in version 0.1.3.
If your endpoint supports Microsoft Entra ID, use the following code to create the client:
import os
from azure.identity import DefaultAzureCredential
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="Mistral-Large-2411",
)
Note
When using Microsoft Entra ID, make sure that the endpoint is deployed with that authentication method and that you have the required permissions to invoke it.
If you plan to use asynchronous calls, use the asynchronous version of the credentials:
import os
from azure.identity.aio import DefaultAzureCredential as DefaultAzureCredentialAsync
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredentialAsync(),
    model="Mistral-Large-2411",
)
If your endpoint serves a single model (for example, serverless API deployments), don't include the model parameter:
import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
)
Verify your setup
Test your client connection with a simple invocation:
response = model.invoke("Say hello")
print(response.content)
What this does: Calls the model with a simple prompt to verify authentication and connectivity. Expected output: A greeting message from the model (for example, "Hello! How can I assist you today?").
Use chat completion models
Use the model directly. ChatModels are instances of the LangChain Runnable interface, which provides a standard way to interact with them. To call the model, pass a list of messages to the invoke method.
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="Translate the following from English into Italian"),
    HumanMessage(content="hi!"),
]

model.invoke(messages)
What this snippet does: Demonstrates how to pass a list of HumanMessage and SystemMessage objects to the model's invoke() method to generate a response.
Compose operations as needed in chains. Use a prompt template to translate sentences:
from langchain_core.prompts import ChatPromptTemplate

system_template = "Translate the following into {language}:"
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)
The template takes two inputs: language and text. Now, create an output parser:
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
What this snippet does: Creates a StrOutputParser that converts the model's output into a string format, stripping any extra metadata.
Combine the template, model, and output parser by using the pipe (|) operator:
chain = prompt_template | model | parser
Invoke the chain by providing language and text values by using the invoke method:
chain.invoke({"language": "italian", "text": "hi"})
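The pipe operator works because each stage is a Runnable whose output feeds the next stage's input. LangChain's actual implementation is more involved, but the composition itself behaves like left-to-right function chaining; a minimal pure-Python sketch of that idea (Step, fake_model, and the toy prompt are illustrations, not LangChain classes):

```python
# Pure-Python sketch of left-to-right composition, mimicking how
# prompt_template | model | parser threads one stage's output into the next.
class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Combine two steps into one: run self, then feed the result to other.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

prompt = Step(lambda d: f"Translate into {d['language']}: {d['text']}")
fake_model = Step(lambda p: {"content": p.upper()})  # stand-in for a chat model
parser = Step(lambda msg: msg["content"])            # stand-in for StrOutputParser

chain = prompt | fake_model | parser
print(chain.invoke({"language": "italian", "text": "hi"}))
# prints TRANSLATE INTO ITALIAN: HI
```

Because every stage shares the same invoke contract, any stage can be swapped out without touching the rest of the chain, which is the property the real Runnable interface provides.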
Chain multiple LLMs together
Because models in Foundry expose a common Model Inference API, you can chain multiple LLM operations and choose the model best suited to each step.
In the following example, you create two model clients: a producer and a verifier. Because a Model Inference API endpoint can serve multiple models, pass the model parameter to select a large model for generation and a small model for verification. Producing content generally requires a larger model, while verification can use a smaller one.
import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

producer = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
)

verifier = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="mistral-small",
)
What this snippet does: Instantiates two separate AzureAIChatCompletionsModel clients: one using Mistral-Large-2411 for content generation and another using Mistral-Small for verification, demonstrating how to choose different models for different tasks.
Tip
Review the model card for each model to understand the best use cases.
The following example generates a poem written by an urban poet:
from langchain_core.prompts import PromptTemplate

producer_template = PromptTemplate(
    template=(
        "You are an urban poet. Your job is to come up with "
        "verses based on a given topic.\n"
        "Here is the topic you have been asked to generate a verse on:\n"
        "{topic}"
    ),
    input_variables=["topic"],
)

verifier_template = PromptTemplate(
    template=(
        "You are a verifier of poems. You are tasked with inspecting "
        "the verses of a poem. If they contain violence or abusive language, "
        "report it. Your response should be only one word, either True or False.\n"
        "Here are the lyrics submitted to you:\n"
        "{input}"
    ),
    input_variables=["input"],
)
What this snippet does: Creates two prompt templates: one for the producer model to generate creative content (a poem in this case) and one for the verifier model to review it.
Chain the pieces:
chain = producer_template | producer | parser | verifier_template | verifier | parser
What this snippet does: Chains the generated poem through the verifier model to validate or review the generated content, demonstrating a producer-verifier workflow.
The previous chain returns only the output of the verifier step. To access the intermediate result generated by the producer, use a RunnablePassthrough to output that intermediate step.
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
generate_poem = producer_template | producer | parser
verify_poem = verifier_template | verifier | parser
chain = generate_poem | RunnableParallel(
    poem=RunnablePassthrough(),
    verification=RunnablePassthrough() | verify_poem,
)
Invoke the chain by using the invoke method:
chain.invoke({"topic": "living in a foreign country"})
What this snippet does: Calls the complete producer-verifier chain with a topic input, returning both the generated content and the verification result. Expected output: A dictionary containing poem and verification keys with the generated poem and the verification response.
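The verifier is prompted to answer with a single word, but model output can still include whitespace, punctuation, or different casing. Before branching on the result, you might normalize it to a boolean. verifier_to_bool is a hypothetical helper, not part of the integration:

```python
def verifier_to_bool(text):
    """Map the verifier's one-word reply ('True'/'False') to a Python bool.

    Hypothetical helper; raises on anything else so malformed output
    isn't silently treated as a pass.
    """
    normalized = text.strip().strip(".").lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"Unexpected verifier output: {text!r}")

# Shape matches the chain's output: a dict with poem and verification keys.
chain_output = {"poem": "...", "verification": " False.\n"}
print(verifier_to_bool(chain_output["verification"]))  # prints False
```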
Use embedding models
Create an embeddings client in a similar way. Set the environment variables to point to an embeddings model:
export AZURE_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"
Create the client:
import os
from langchain_azure_ai.embeddings import AzureAIEmbeddingsModel

embed_model = AzureAIEmbeddingsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="text-embedding-3-large",
)
What this snippet does: Instantiates an embeddings client using AzureAIEmbeddingsModel to convert text into vector embeddings, which can be used for semantic search and similarity comparisons.
Use an in-memory vector store:
from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embed_model)
What this snippet does: Creates an in-memory vector store (InMemoryVectorStore) that stores embeddings for fast similarity search operations.
Add documents:
from langchain_core.documents import Document
document_1 = Document(id="1", page_content="foo", metadata={"baz": "bar"})
document_2 = Document(id="2", page_content="thud", metadata={"bar": "baz"})
documents = [document_1, document_2]
vector_store.add_documents(documents=documents)
What this snippet does: Converts documents into embeddings using the embeddings client and adds them to the vector store for later retrieval.
Search by similarity:
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
What this snippet does: Performs a semantic search against the vector store, returning documents most similar to the query based on embedding similarity. Expected output: List of relevant documents ranked by similarity score.
Use Azure OpenAI models
When you use Azure OpenAI models with the langchain-azure-ai package, use the following endpoint format:
import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

llm = AzureAIChatCompletionsModel(
    endpoint="https://<resource>.openai.azure.com/openai/v1",
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="gpt-4o",
)
What this snippet does: Instantiates a client configured specifically for Azure OpenAI models using the Azure OpenAI endpoint format. The endpoint parameter points to your Azure OpenAI resource, and the credential uses the API key stored in the environment variable.
Debugging and troubleshooting
If you need to debug your application and understand the requests sent to models in Foundry, use the integration's debug capabilities:
First, configure logging to the desired level:
import sys
import logging

# Acquire the logger for this client library. Use 'azure' to affect both
# the 'azure.core' and 'azure.ai.inference' libraries.
logger = logging.getLogger("azure")

# Set the desired logging level. logging.INFO or logging.DEBUG are good options.
logger.setLevel(logging.DEBUG)

# Direct logging output to stdout:
handler = logging.StreamHandler(stream=sys.stdout)
# Or direct logging output to a file:
# handler = logging.FileHandler(filename="sample.log")

# Optional: change the default logging format. Here we add a timestamp.
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")
handler.setFormatter(formatter)

logger.addHandler(handler)
What this snippet does: Sets up Python logging at the DEBUG level to capture detailed information about HTTP requests and responses between LangChain and the Model Inference API.
To see request payloads, pass logging_enable=True in client_kwargs when instantiating the client:
import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
    client_kwargs={"logging_enable": True},
)
What this snippet does: Creates a client with logging enabled to capture and display detailed request/response payloads, helpful for debugging API interactions.
Use the client as usual in your code.
Tracing
Use tracing in Foundry by creating a tracer. Logs are stored in Azure Application Insights and can be queried at any time using Azure Monitor or the Foundry portal. Each AI hub has an associated Azure Application Insights instance.
Get your instrumentation connection string
Tip
Because you can customize the left pane in the Microsoft Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.
You can configure your application to send telemetry to Azure Application Insights by using either of the following methods:
Use the connection string to Azure Application Insights directly:
Go to Foundry portal and select Tracing.
Select Manage data source. In this screen, you can see the instance that is associated with the project.
Copy the value at Connection string and set it to the following variable:
application_insights_connection_string = "instrumentation...."
Use the Microsoft Foundry SDK and the Foundry Project endpoint:
Ensure you have the package azure-ai-projects installed in your environment.
Go to Foundry portal.
Copy your Foundry project endpoint URL and set it in the following code:
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint="<your-foundry-project-endpoint-url>",
)

application_insights_connection_string = (
    project_client.telemetry.get_application_insights_connection_string()
)
Configure tracing for Foundry
The following code creates a tracer connected to the Azure Application Insights behind a Foundry project. The enable_content_recording parameter is set to True, which captures inputs and outputs across the application, including intermediate steps. This feature is helpful when debugging and building applications, but you might want to disable it in production environments. You can also control this feature by using the AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED environment variable:
from langchain_azure_ai.callbacks.tracers import AzureAIOpenTelemetryTracer
azure_tracer = AzureAIOpenTelemetryTracer(
connection_string=application_insights_connection_string,
enable_content_recording=True,
)
Pass the tracer through config in the invoke operation:
chain.invoke({"topic": "living in a foreign country"}, config={"callbacks": [azure_tracer]})
To configure the chain itself for tracing, use the .with_config() method:
chain = chain.with_config({"callbacks": [azure_tracer]})
Then use the invoke() method as usual:
chain.invoke({"topic": "living in a foreign country"})
View traces
To see traces:
Sign in to Microsoft Foundry. For Foundry (classic), make sure the New Foundry toggle is off; for Foundry (new), make sure it's on.
Go to the Tracing section.
Find the trace you created. It might take a few seconds to appear.
Learn more about how to visualize and manage traces.