Develop applications with LangChain and Microsoft Foundry

Note

This document covers both the Microsoft Foundry (classic) and Microsoft Foundry (new) portals. Steps that differ between the two portals state which one they refer to.

LangChain is a developer ecosystem that makes it easier to build reasoning applications. It includes multiple components, and most of them can be used independently, so you can pick and choose the pieces you need.

You can use models deployed to Microsoft Foundry with LangChain in two ways:

  • Use the model provider's API: Some model providers, such as OpenAI, Cohere, or Mistral, offer their own APIs and LangChain extensions. These extensions might include model-specific capabilities, so use them if you need those capabilities. Install the extension for your chosen provider, such as langchain-openai or langchain-cohere.

  • Use the Azure AI Model Inference API: All models deployed in Microsoft Foundry support the Model Inference API, which offers a common set of capabilities across most models in the catalog. Because the API is consistent, switching models is as simple as changing the deployment; no code changes are required. For LangChain, also install the langchain-azure-ai integration.

Important

If you're currently using an Azure AI Inference beta SDK with Microsoft Foundry Models or Azure OpenAI service, we strongly recommend that you transition to the generally available OpenAI/v1 API, which uses an OpenAI stable SDK.

For more information on how to migrate to the OpenAI/v1 API by using an SDK in your programming language of choice, see Migrate from Azure AI Inference SDK to OpenAI SDK.

This tutorial shows how to use the langchain-azure-ai package with LangChain.

Prerequisites

To run this tutorial, you need:

  • Required role:
    • Owner or Contributor on the Foundry resource or AI Hub to deploy models
    • Azure AI User to use the model in a Foundry project
    • Azure AI Developer to use the model in a hub-based project
  • Required role:
    • Owner or Contributor on the Foundry resource to deploy models
    • Azure AI User to use the model in a Foundry project
  • A model deployment that supports the Model Inference API. This article uses Mistral-Large-2411 in code examples, but this model is deprecated. Deploy a more recent Mistral model such as Mistral-Large-3 or Mistral-Nemo from the Foundry model catalog instead, and substitute your model name in the code examples.

  • Python 3.9 or later installed, including pip.

  • LangChain installed. You can install it by using the following command:

    pip install langchain
    
  • Install the Foundry integration:

    pip install -U langchain-azure-ai
    

Configure the environment

To use LLMs deployed in the Microsoft Foundry portal, you need the endpoint and credentials to connect to them. Follow these steps to get that information for the model you want to use:

Tip

Because you can customize the left pane in the Microsoft Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is off. These steps refer to Foundry (classic).

  2. Open the project where the model is deployed, if it isn't already open.

  3. Go to Models + endpoints and select the model you deployed as indicated in the prerequisites.

  4. Copy the endpoint URL and the key.

    Tip

    If your model was deployed with Microsoft Entra ID support, you don't need a key.

In this scenario, set the endpoint URL and key as environment variables. (If the endpoint you copied includes additional text after /models, remove it so the URL ends at /models as shown below.)

export AZURE_INFERENCE_ENDPOINT="https://<resource>.services.ai.azure.com/models"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"
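
The trimming rule described above (keep the URL only up to /models) can be sketched in a few lines. This is a minimal illustration, not part of the langchain-azure-ai package: the helper name normalize_inference_endpoint and the example host are assumptions made for this sketch.

```python
def normalize_inference_endpoint(url: str) -> str:
    """Return the URL truncated so that it ends at the '/models' segment.

    Hypothetical helper for illustration; it is not provided by any SDK.
    """
    marker = "/models"
    index = url.find(marker)
    if index == -1:
        raise ValueError(f"Expected '/models' in endpoint URL: {url!r}")
    # Keep everything up to and including '/models', drop the rest.
    return url[: index + len(marker)]


# Example with a placeholder host name.
copied = "https://example.services.ai.azure.com/models/chat/completions"
print(normalize_inference_endpoint(copied))
```

You could run the copied value through a helper like this before exporting AZURE_INFERENCE_ENDPOINT.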

If you use Foundry (new) instead, follow these steps to get the endpoint and credentials for the model you want to use:

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on. These steps refer to Foundry (new).

  2. Open the project where the model is deployed, if it isn't already open.

  3. Copy the endpoint URL and the key.

In this scenario, set the endpoint URL and key as environment variables. (If the endpoint you copied includes additional text after /models, remove it so the URL ends at /models as shown below.)

export AZURE_INFERENCE_ENDPOINT="https://<resource>.services.ai.azure.com/models"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"

Create a client to connect to the chat model by using the AzureAIChatCompletionsModel class.

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
)

What this snippet does: Instantiates an AzureAIChatCompletionsModel client configured to connect to your deployed model using an API key for authentication. This client acts as an interface to the Model Inference API.
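
Because the client reads both values from the environment, a missing variable surfaces as a bare KeyError. The following sketch shows one way to fail early with a clearer message. The helper name load_inference_settings is hypothetical and is not part of langchain-azure-ai; only the two variable names come from this article.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARIABLES = ("AZURE_INFERENCE_ENDPOINT", "AZURE_INFERENCE_CREDENTIAL")


def load_inference_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (endpoint, credential) or raise a readable error.

    Hypothetical helper for illustration only.
    """
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AZURE_INFERENCE_ENDPOINT"], env["AZURE_INFERENCE_CREDENTIAL"]


# Demonstration with an explicit mapping instead of the real environment.
endpoint, credential = load_inference_settings(
    {
        "AZURE_INFERENCE_ENDPOINT": "https://example.services.ai.azure.com/models",
        "AZURE_INFERENCE_CREDENTIAL": "placeholder-key",
    }
)
print(endpoint)
```

The returned values can then be passed as the endpoint and credential arguments of the client.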

Caution

Breaking change: The model_name parameter is renamed to model in version 0.1.3.

If your endpoint supports Microsoft Entra ID, use the following code to create the client:

import os
from azure.identity import DefaultAzureCredential
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="Mistral-Large-2411",
)

Note

When using Microsoft Entra ID, make sure that the endpoint is deployed with that authentication method and that you have the required permissions to invoke it.

If you plan to use asynchronous calls, use the asynchronous version of the credentials:

import os
from azure.identity.aio import (
    DefaultAzureCredential as DefaultAzureCredentialAsync,
)
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredentialAsync(),
    model="Mistral-Large-2411",
)

If your endpoint serves a single model (for example, serverless API deployments), don't include the model parameter:

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
)
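
If the same code has to work against both multi-model and single-model endpoints, one option is to build the constructor arguments conditionally. This is a sketch under that assumption; build_client_kwargs is a hypothetical helper name, not an API of the package.

```python
from typing import Any, Dict, Optional


def build_client_kwargs(
    endpoint: str, credential: Any, model: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments, adding 'model' only when one is given."""
    kwargs: Dict[str, Any] = {"endpoint": endpoint, "credential": credential}
    if model:
        # Multi-model endpoint: select the deployment by name.
        kwargs["model"] = model
    return kwargs


# Single-model endpoint: no 'model' key is produced.
print(build_client_kwargs("https://example.invalid/models", "placeholder-key"))
# Multi-model endpoint: the 'model' key is included.
print(build_client_kwargs("https://example.invalid/models", "placeholder-key", "my-deployment"))
```

The resulting dictionary would be unpacked into the constructor, for example AzureAIChatCompletionsModel(**kwargs).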

Verify your setup

Test your client connection with a simple invocation:

response = model.invoke("Say hello")
print(response.content)

What this does: Calls the model with a simple prompt to verify authentication and connectivity. Expected output: A greeting message from the model (for example, "Hello! How can I assist you today?").

Use chat completion models

Use the model directly. Chat models implement the LangChain Runnable interface, which provides a standard way to interact with them. To call the model, pass a list of messages to the invoke method.

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="Translate the following from English into Italian"),
    HumanMessage(content="hi!"),
]

model.invoke(messages)

What this snippet does: Demonstrates how to pass a list of HumanMessage and SystemMessage objects to the model's invoke() method to generate a response.

Compose operations as needed in chains. Use a prompt template to translate sentences:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

system_template = "Translate the following into {language}:"
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)

This prompt template takes language and text inputs. Now, create an output parser:

from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

What this snippet does: Creates a StrOutputParser that converts the model's output into a string format, stripping any extra metadata.

Combine the template, model, and output parser by using the pipe (|) operator:

chain = prompt_template | model | parser

Invoke the chain by passing the language and text values to the invoke method:

chain.invoke({"language": "italian", "text": "hi"})
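
To build intuition for what the pipe operator does, the following toy sketch composes three plain Python steps in the same way. It is not LangChain's implementation: the Step class and the fake model that uppercases its input are invented for this illustration and run without any network access.

```python
from typing import Any, Callable


class Step:
    """A toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` yields a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


toy_prompt = Step(lambda v: f"Translate the following into {v['language']}: {v['text']}")
toy_model = Step(lambda text: {"content": text.upper()})  # fake model, no API call
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"language": "italian", "text": "hi"}))
# prints: TRANSLATE THE FOLLOWING INTO ITALIAN: HI
```

Each stage receives the previous stage's output, which is why the template's input dictionary is the only value passed to invoke.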

Chain multiple LLMs together

Because models in Foundry expose a common Model Inference API, you can chain multiple LLM operations and choose the model best suited to each step.

In the following example, you create two model clients: one producer and one verifier. To make the distinction clear, use a multi-model endpoint such as the Model Inference API, and set the model parameter to a large model for generation and to a small model for verification. Producing content generally requires a larger model, while verification can use a smaller one.

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

producer = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
)

verifier = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="mistral-small",
)

What this snippet does: Instantiates two separate AzureAIChatCompletionsModel clients: one using Mistral-Large-2411 for content generation and another using Mistral-Small for verification, demonstrating how to choose different models for different tasks.

Tip

Review the model card for each model to understand the best use cases.

The following example generates a poem written by an urban poet:

from langchain_core.prompts import PromptTemplate

producer_template = PromptTemplate(
    template=(
        "You are an urban poet. Your job is to come up with "
        "verses based on a given topic.\n"
        "Here is the topic you have been asked to generate a verse on:\n"
        "{topic}"
    ),
    input_variables=["topic"],
)

verifier_template = PromptTemplate(
    template=(
        "You are a verifier of poems. You are tasked "
        "to inspect the verses of a poem. If they contain violent or abusive language, "
        "report it. Your response should be only one word, either True or False.\n"
        "Here are the lyrics submitted to you:\n"
        "{input}"
    ),
    input_variables=["input"],
)

What this snippet does: Defines two prompt templates: one that asks the producer model to write verses on a given topic, and one that asks the verifier model to report whether the verses contain violent or abusive language.

Chain the pieces:

chain = producer_template | producer | parser | verifier_template | verifier | parser

What this snippet does: Chains the generated poem through the verifier model to validate or review the generated content, demonstrating a producer-verifier workflow.

The previous chain returns only the output of the verifier step. To access the intermediate result generated by the producer, use a RunnablePassthrough to output that intermediate step.

from langchain_core.runnables import RunnablePassthrough, RunnableParallel

generate_poem = producer_template | producer | parser
verify_poem = verifier_template | verifier | parser

chain = generate_poem | RunnableParallel(poem=RunnablePassthrough(), verification=RunnablePassthrough() | verify_poem)

Invoke the chain by using the invoke method:

chain.invoke({"topic": "living in a foreign country"})

What this snippet does: Calls the complete producer-verifier chain with a topic input, returning both the generated content and the verification result. Expected output: A Python dictionary with poem and verification keys that hold the generated poem and the verifier's response.
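
The verifier prompt asks for a one-word answer, but models sometimes add whitespace or punctuation. The following sketch shows one way to turn the verification value into a Boolean before acting on it. The parse_verdict helper and the sample result dictionary are hypothetical and are not produced by a real model call.

```python
def parse_verdict(text: str) -> bool:
    """Convert a 'True'/'False' style answer into a bool.

    Raises ValueError when the answer is anything else, so unexpected
    model output is not silently treated as safe.
    """
    cleaned = text.strip().strip(".!\"'").lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    raise ValueError(f"Unexpected verifier answer: {text!r}")


# Hypothetical chain result, shaped like the dictionary described above.
sample_result = {"poem": "placeholder verses", "verification": " False.\n"}
flagged = parse_verdict(sample_result["verification"])
print(flagged)  # prints: False
```

Raising on unexpected answers is a design choice for this sketch; you might prefer to log and retry instead.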

Use embedding models

Create an embeddings client in a similar way. Set the environment variables to point to an embeddings model:

export AZURE_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"

Create the client:

import os
from langchain_azure_ai.embeddings import AzureAIEmbeddingsModel

embed_model = AzureAIEmbeddingsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="text-embedding-3-large",
)

What this snippet does: Instantiates an embeddings client using AzureAIEmbeddingsModel to convert text into vector embeddings, which can be used for semantic search and similarity comparisons.

Use an in-memory vector store:

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embed_model)

What this snippet does: Creates an in-memory vector store (InMemoryVectorStore) that stores embeddings for fast similarity search operations.

Add documents:

from langchain_core.documents import Document

document_1 = Document(id="1", page_content="foo", metadata={"baz": "bar"})
document_2 = Document(id="2", page_content="thud", metadata={"bar": "baz"})

documents = [document_1, document_2]
vector_store.add_documents(documents=documents)

What this snippet does: Converts documents into embeddings using the embeddings client and adds them to the vector store for later retrieval.

Search by similarity:

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

What this snippet does: Performs a semantic search against the vector store, returning the documents most similar to the query based on embedding similarity. Expected output: Because k=1, a single line for the closest document, which for this query should be the one whose content is thud.
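
To illustrate what "most similar" means here, the following sketch ranks two toy vectors by cosine similarity in plain Python. The vectors are made up for the example and are not real embeddings, and the vector store's internal implementation may differ from this sketch.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Return the names of the k vectors closest to the query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy two-dimensional "embeddings" keyed by document content.
toy_vectors = {"foo": [1.0, 0.0], "thud": [0.0, 1.0]}
print(top_k([0.1, 0.9], toy_vectors, k=1))  # prints: ['thud']
```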

Use Azure OpenAI models

When you use Azure OpenAI models with the langchain-azure-ai package, use the following endpoint format:

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

llm = AzureAIChatCompletionsModel(
    endpoint="https://<resource>.openai.azure.com/openai/v1",
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="gpt-4o"
)

What this snippet does: Instantiates a client configured specifically for Azure OpenAI models using the Azure OpenAI endpoint format. The endpoint parameter points to your Azure OpenAI resource, and the credential uses the API key stored in the environment variable.
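
If you keep only the resource name in configuration, the endpoint format above can be assembled with a small helper. This is a sketch: azure_openai_v1_endpoint is a hypothetical name, and the validation it performs is an assumption made for this example, not a rule taken from the service.

```python
def azure_openai_v1_endpoint(resource_name: str) -> str:
    """Build 'https://<resource>.openai.azure.com/openai/v1' from a resource name."""
    name = resource_name.strip()
    # Reject values that look like a full URL or host instead of a bare name.
    if not name or "/" in name or "." in name:
        raise ValueError(f"Expected a bare resource name, got: {resource_name!r}")
    return f"https://{name}.openai.azure.com/openai/v1"


print(azure_openai_v1_endpoint("my-resource"))
# prints: https://my-resource.openai.azure.com/openai/v1
```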

Debugging and troubleshooting

If you need to debug your application and understand the requests sent to models in Foundry, use the integration's debug capabilities.

First, configure logging to the desired level:

import sys
import logging

# Acquire the logger for this client library. Use 'azure' to affect both
# the 'azure.core' and 'azure.ai.inference' libraries.
logger = logging.getLogger("azure")

# Set the desired logging level. logging.INFO or logging.DEBUG are good options.
logger.setLevel(logging.DEBUG)

# Direct logging output to stdout:
handler = logging.StreamHandler(stream=sys.stdout)
# Or direct logging output to a file:
# handler = logging.FileHandler(filename="sample.log")
logger.addHandler(handler)

# Optional: change the default logging format. Here we add a timestamp.
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")
handler.setFormatter(formatter)

What this snippet does: Sets up Python logging at the DEBUG level to capture detailed information about HTTP requests and responses between LangChain and the Model Inference API.
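
DEBUG output can end up in shared log files, so you might want an extra safeguard that masks a known secret before a record is written. The following sketch uses only the standard logging module. The RedactSecretFilter class is hypothetical, and whether you need it depends on what your libraries already redact.

```python
import io
import logging


class RedactSecretFilter(logging.Filter):
    """Replace a known secret with '***' in every record a handler emits."""

    def __init__(self, secret: str) -> None:
        super().__init__()
        self._secret = secret

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        if self._secret and self._secret in message:
            # Store the redacted, fully formatted message on the record.
            record.msg = message.replace(self._secret, "***")
            record.args = ()
        return True  # never drop the record, only rewrite it


# Demonstration with an in-memory stream and a placeholder secret.
stream = io.StringIO()
demo_handler = logging.StreamHandler(stream)
demo_handler.addFilter(RedactSecretFilter("placeholder-key"))

demo_logger = logging.getLogger("redaction_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(demo_handler)

demo_logger.debug("api-key: %s", "placeholder-key")
print(stream.getvalue().strip())  # prints: api-key: ***
```

In the logging setup above, the same filter could be attached with handler.addFilter(...) before the handler is added to the azure logger.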

To see request payloads, pass logging_enable=True in client_kwargs when instantiating the client:

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
    client_kwargs={"logging_enable": True},
)

What this snippet does: Creates a client with logging enabled to capture and display detailed request/response payloads, helpful for debugging API interactions.

Use the client as usual in your code.

Tracing

Use tracing in Foundry by creating a tracer. Logs are stored in Azure Application Insights and can be queried at any time using Azure Monitor or the Foundry portal. Each AI hub has an associated Azure Application Insights instance.

Get your instrumentation connection string

Tip

Because you can customize the left pane in the Microsoft Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

You can configure your application to send telemetry to Azure Application Insights by using either of the following methods:

  1. Use the connection string to Azure Application Insights directly:

    1. Go to Foundry portal and select Tracing.

    2. Select Manage data source. In this screen, you can see the instance that is associated with the project.

    3. Copy the value at Connection string and set it to the following variable:

      import os
      
      application_insights_connection_string = "instrumentation...."
      
  2. Use the Microsoft Foundry SDK and the Foundry Project endpoint:

    1. Ensure you have the package azure-ai-projects installed in your environment.

    2. Go to Foundry portal.

    3. Copy your Foundry project endpoint URL and set it in the following code:

      from azure.ai.projects import AIProjectClient
      from azure.identity import DefaultAzureCredential
      
      project_client = AIProjectClient(
          credential=DefaultAzureCredential(),
          endpoint="<your-foundry-project-endpoint-url>",
      )
      
      application_insights_connection_string = project_client.telemetry.get_application_insights_connection_string()
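
Whichever method you use, a quick sanity check on the value can catch copy-and-paste mistakes before you create the tracer. The sketch below assumes the usual Application Insights layout of semicolon-separated key=value pairs that include an InstrumentationKey entry. The parse_connection_string helper and the sample value are hypothetical.

```python
from typing import Dict


def parse_connection_string(value: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dictionary and check the key entry."""
    parts: Dict[str, str] = {}
    for segment in value.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        key, separator, item = segment.partition("=")
        if not separator:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = item.strip()
    if "InstrumentationKey" not in parts:
        raise ValueError("Connection string has no InstrumentationKey entry")
    return parts


# Placeholder value for illustration; use your own connection string.
sample = "InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://example.invalid/"
print(sorted(parse_connection_string(sample)))
# prints: ['IngestionEndpoint', 'InstrumentationKey']
```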
      

Configure tracing for Foundry

The following code creates a tracer connected to the Azure Application Insights behind a Foundry project. The enable_content_recording parameter is set to True, which captures inputs and outputs across the application, including intermediate steps. This feature is helpful when debugging and building applications, but you might want to disable it in production environments. You can also control this feature by using the AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED environment variable:

from langchain_azure_ai.callbacks.tracers import AzureAIOpenTelemetryTracer

azure_tracer = AzureAIOpenTelemetryTracer(
    connection_string=application_insights_connection_string,
    enable_content_recording=True,
)
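
Rather than hard-coding True, you can derive the flag from the environment so that production deployments leave content recording off by default. This sketch reads the variable itself and passes the result explicitly. The content_recording_enabled helper and the set of values it accepts are assumptions of this example, and the library's own handling of the variable may differ.

```python
import os
from typing import Mapping

VARIABLE_NAME = "AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED"


def content_recording_enabled(
    env: Mapping[str, str] = os.environ, default: bool = False
) -> bool:
    """Return True only when the variable is set to an affirmative value."""
    raw = env.get(VARIABLE_NAME)
    if raw is None:
        return default
    # Accepted spellings are an assumption of this sketch.
    return raw.strip().lower() in {"1", "true", "yes"}


print(content_recording_enabled({VARIABLE_NAME: "true"}))  # prints: True
print(content_recording_enabled({}))  # prints: False
```

The result could be passed as enable_content_recording=content_recording_enabled() when creating the tracer.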

Pass the tracer through config in the invoke operation:

chain.invoke({"topic": "living in a foreign country"}, config={"callbacks": [azure_tracer]})

To configure the chain itself for tracing, use the .with_config() method:

chain = chain.with_config({"callbacks": [azure_tracer]})

Then use the invoke() method as usual:

chain.invoke({"topic": "living in a foreign country"})

View traces

To see traces:

  1. Sign in to Microsoft Foundry. Turn the New Foundry toggle off to use Foundry (classic), or on to use Foundry (new).

  2. Go to the Tracing section.

  3. Find the trace you created. It might take a few seconds to appear.

Learn more about how to visualize and manage traces.

Next steps